Channel: SQL Service Broker forum

Service Broker Activation Deadlocking


I have implemented a (slightly) modified version of conversation recycling using conversation timers and stored procedure activation, based on http://rusanu.com/2007/05/03/recycling-conversations/. However, deadlocks occasionally occur between the send procedure and the activated procedure, involving the conversation group and the conversations table. The main modification is that, instead of a SPID column in the table, I use an IdentifierType and an Identifier value to identify the conversation. I am only using the defaults (@@SPID), though, so I don't think that should matter in this case.

For the send side I have:

CREATE PROCEDURE [dbo].[usp_SendMessage]
	(
	@endpointCode VARCHAR(255) = NULL,
	@endpointGroup VARCHAR(255) = NULL,
	@xmlPayload XML=NULL,
	@binaryPayload VARBINARY(MAX)=NULL,
	@varcharPayload VARCHAR(MAX)=NULL,
	@identifier VARCHAR(50) = @@SPID,
	@identifierType VARCHAR(50) = '@@SPID'
	)
AS BEGIN

	SET NOCOUNT ON

	DECLARE @fromService SYSNAME,
		@toService SYSNAME,
		@onContract SYSNAME,
		@messageType SYSNAME,
		@conversationTimeout INT

	SELECT @fromService = FromService
		, @toService = ToService
		, @onContract = OnContract
		, @messageType = MessageType
		, @conversationTimeout = ConversationTimeout
	FROM dbo.ServiceBrokerEndpointConfig
		WHERE GroupCode = @endpointGroup

	IF @fromService IS NULL OR @toService IS NULL OR @onContract IS NULL OR @messageType IS NULL OR @conversationTimeout IS NULL
	BEGIN
		RAISERROR (
			N'Failed to get endpoint config for GroupCode ''%s''.'
			, 16, 1, @endpointGroup) WITH LOG;
		RETURN;
	END

	DECLARE @SBDialog UNIQUEIDENTIFIER
	DECLARE @Message XML
	DECLARE @counter INT
	DECLARE @error INT
	DECLARE @handle UNIQUEIDENTIFIER;

	DECLARE @NotNullCount INT = ((CASE WHEN @xmlPayload IS NULL THEN 0 ELSE 1 END)
									+ (CASE WHEN @binaryPayload IS NULL THEN 0 ELSE 1 END)+ (CASE WHEN @varcharPayload IS NULL THEN 0 ELSE 1 END))
	IF @NotNullCount > 1
	BEGIN
		RAISERROR (
				N'Failed to SEND because %i payload fields are filled in when no more than 1 is expected'
				, 16, 1, @NotNullCount) WITH LOG;
		RETURN;
	END

	SET @counter = 1

	WHILE (1=1)
	BEGIN

		SET @handle = NULL

		-- Seek an eligible conversation in [ServiceBrokerConversations]
		-- We will hold an UPDLOCK on the composite primary key
		SELECT @handle = Handle
			FROM [ServiceBrokerConversations] WITH (UPDLOCK)
			WHERE Identifier = @identifier
					AND IdentifierType = @identifierType
					AND FromService = @fromService
					AND ToService = @toService
					AND OnContract = @onContract;

		IF @handle IS NULL
		BEGIN

			-- Need to start a new conversation for the current @identifier
			BEGIN DIALOG CONVERSATION @handle
				FROM SERVICE @fromService
				TO SERVICE @toService
				ON CONTRACT @onContract
				WITH ENCRYPTION = OFF;

			-- Then the sender must listen on the
			-- send queue for the http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer message type and
			-- cleanup appropriately.
			IF @conversationTimeout IS NOT NULL
			BEGIN
				BEGIN CONVERSATION TIMER (@handle) TIMEOUT = @conversationTimeout;
			END

			INSERT INTO [ServiceBrokerConversations]
				(Identifier, IdentifierType, FromService, ToService, OnContract, Handle)
				VALUES
				(@identifier, @identifierType, @fromService, @toService, @onContract, @handle);

		END;

		IF @xmlPayload IS NOT NULL
		BEGIN
			-- Attempt to SEND on the associated conversation
			;SEND ON CONVERSATION @handle
				MESSAGE TYPE @messageType (@xmlPayload);
		END ELSE IF @binaryPayload IS NOT NULL
		BEGIN
			;SEND ON CONVERSATION @handle
				MESSAGE TYPE @messageType (@binaryPayload);
		END ELSE BEGIN
			;SEND ON CONVERSATION @handle
				MESSAGE TYPE @messageType (@varcharPayload);
		END

		SELECT @error = @@ERROR;

		IF @error = 0
		BEGIN
			-- Successful send, just exit the loop
			BREAK;
		END

		SELECT @counter = @counter+1;

		IF @counter > 10
		BEGIN
			-- We failed 10 times in a row, something must be broken
			RAISERROR (
				N'Failed to SEND on a conversation for more than 10 times. Error %i.'
				, 16, 1, @error) WITH LOG;
			BREAK;
		END

		-- Delete the associated conversation from the table and try again
		DELETE FROM [ServiceBrokerConversations]
			WHERE Handle = @handle;

		SET @handle = NULL;

	END

END


And for the activation on the initiator queue I have:

CREATE PROCEDURE [dbo].[usp_InitiatorQueueHandler]
AS
BEGIN

	SET NOCOUNT ON

	DECLARE @handle UNIQUEIDENTIFIER;
	DECLARE @messageTypeName SYSNAME;
	DECLARE @messageBody VARBINARY(MAX);

	WHILE (1=1)
	BEGIN

		BEGIN TRAN;

		;WAITFOR (RECEIVE TOP(1)
			@handle = conversation_handle,
			@messageTypeName = message_type_name,
			@messageBody = message_body
		FROM [InitiatorQueue]), TIMEOUT 5000;

		IF (@@ROWCOUNT = 0)
		BEGIN
			COMMIT TRAN;
			BREAK;
		END

		-- Call the base stored procedure to handle ending the conversation
		EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody

		COMMIT TRAN;

	END

END

GO

ALTER QUEUE [InitiatorQueue]
WITH ACTIVATION (
	STATUS=ON,
	PROCEDURE_NAME=dbo.usp_InitiatorQueueHandler,
    EXECUTE AS OWNER,
    MAX_QUEUE_READERS=10
)

GO

CREATE PROCEDURE [dbo].[usp_BrokerHandleInitiator]
	(
	@handle UNIQUEIDENTIFIER,
	@messageTypeName SYSNAME,
	@messageBody VARBINARY(MAX)
	)
AS
BEGIN

	SET NOCOUNT ON

	IF @handle IS NOT NULL
	BEGIN

		-- Delete the conversation row from the [ServiceBrokerConversations]
		-- table before sending the [EndOfStream] message. The order is
		-- important to avoid deadlocks.
		IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
				OR @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
		BEGIN
			DELETE FROM [ServiceBrokerConversations]
				WHERE [Handle] = @handle;
		END

		IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
		BEGIN
			;SEND ON CONVERSATION @handle
				MESSAGE TYPE [EndOfStream];

		END
		ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
		BEGIN
			END CONVERSATION @handle;
		END
		ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
		BEGIN
			END CONVERSATION @handle;

			-- We could send a notification or store the error in a table for further inspection
			DECLARE @error INT;
			DECLARE @description NVARCHAR(4000);
			WITH XMLNAMESPACES (N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb)

			SELECT

				@error = CAST(@messageBody AS XML).value(
					'(//ssb:Error/ssb:Code)[1]', 'INT'),
				@description = CAST(@messageBody AS XML).value(
					'(//ssb:Error/ssb:Description)[1]', 'NVARCHAR(4000)')

			-- Maybe log to audit log instead?
			RAISERROR(N'Received error Code:%i Description:"%s"',
				 16, 1, @error, @description) WITH LOG;

		END;
	END
END

The deadlock XML is:

<deadlock>
  <victim-list>
    <victimProcess id="process807dbd0c8" />
  </victim-list>
  <process-list>
    <process id="process807dbd0c8" taskpriority="0" logused="0" waitresource="METADATA: database_id = 21 CONVERSATION_GROUP($hash = 0xff26c7e1:0x478840de:0xd403bb)" waittime="2600" ownerId="8333217736" transactionname="GetDialogByHandle" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x87f251c90" lockMode="X" schedulerid="2" kpid="7220" status="suspended" spid="110" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-03-23T10:53:58.683" lastbatchcompleted="2015-03-23T10:53:58.683" lastattention="1900-01-01T00:00:00.683" clientapp=".Net SqlClient Data Provider" hostname="COLFOQA2" hostpid="1436" loginname="dev" isolationlevel="read committed (2)" xactid="8333217704" currentdb="21" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
      <executionStack>
        <frame procname="MYDB.dbo.usp_SendMessage" line="116" stmtstart="7540" stmtend="7696" sqlhandle="0x03001500aada77428391a0005da4000001000000000000000000000000000000000000000000000000000000">
SEND ON CONVERSATION @handle
				MESSAGE TYPE @messageType (@xmlPayload);
        </frame>
      </executionStack>
      <inputbuf>
Proc [Database Id = 21 Object Id = 1115151018]
      </inputbuf>
    </process>
    <process id="process869a5e558" taskpriority="0" logused="588" waitresource="KEY: 21:72057594039959552 (1f1ae6770d1b)" waittime="2600" ownerId="8333217730" transactionname="user_transaction" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x3e28456a8" lockMode="U" schedulerid="4" kpid="6720" status="background" spid="22" sbid="0" ecid="0" priority="0" trancount="2">
      <executionStack>
        <frame procname="MYDB.dbo.usp_BrokerHandleInitiator" line="28" stmtstart="1996" stmtend="2144" sqlhandle="0x03001500f704cd06e691a0005da4000001000000000000000000000000000000000000000000000000000000">
DELETE FROM [ServiceBrokerConversations]
				WHERE [Handle] = @handle;
        </frame>
        <frame procname="MYDB.dbo.usp_InitiatorQueueHandler" line="29" stmtstart="1014" stmtend="1172" sqlhandle="0x03001500316f56101694a0005da4000001000000000000000000000000000000000000000000000000000000">
EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody
        </frame>
      </executionStack>
      <inputbuf></inputbuf>
    </process>
  </process-list>
  <resource-list>
    <metadatalock subresource="CONVERSATION_GROUP" classid="$hash = 0xff26c7e1:0x478840de:0xd403bb" dbid="21" id="lock54fdb1800" mode="X">
      <owner-list>
        <owner id="process869a5e558" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process807dbd0c8" mode="X" requestType="wait" />
      </waiter-list>
    </metadatalock>
    <keylock hobtid="72057594039959552" dbid="21" objectname="MYDB.dbo.ServiceBrokerConversations" indexname="PK__ServiceB__877FDFD18DF079BD" id="lock6c65b1a00" mode="U" associatedObjectId="72057594039959552">
      <owner-list>
        <owner id="process807dbd0c8" mode="U" />
      </owner-list>
      <waiter-list>
        <waiter id="process869a5e558" mode="U" requestType="wait" />
      </waiter-list>
    </keylock>
  </resource-list>
</deadlock>

I have a clustered index on the fields I am SELECTing by and a UNIQUE index on the Handle (for the DELETE). When running the SELECT/DELETE statements against the table the query plan reports index seeks are being used:

CREATE TABLE [dbo].[ServiceBrokerConversations] (
    [Identifier]     VARCHAR (50)     NOT NULL,
    [IdentifierType] VARCHAR (50)     NOT NULL,
    [FromService]    [sysname]        NOT NULL,
    [ToService]      [sysname]        NOT NULL,
    [OnContract]     [sysname]        NOT NULL,
    [Handle]         UNIQUEIDENTIFIER NOT NULL,
    [CreateDate]     DATETIME2 (7)    NULL,
    PRIMARY KEY CLUSTERED ([Identifier] ASC, [IdentifierType] ASC, [FromService] ASC, [ToService] ASC, [OnContract] ASC) ON [PRIMARY],
    UNIQUE NONCLUSTERED ([Handle] ASC) ON [PRIMARY]
) ON [PRIMARY];

What appears to be happening is that the DELETE is somehow deadlocking with the SEND, but I am not sure how, since I am using them in the same order in both the send procedure and the activated procedure. Also, RCSI is enabled on the database where the deadlocks occur.

EDIT:

I think I have found the culprit with lock acquisition order:

- In the usp_SendMessage proc:

  • The SELECT locks the conversation record
  • The SEND locks the conversation group

- In the timer activated proc on the initiator queue:

  • The RECEIVE locks the conversation group
  • The DELETE locks the conversation record
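
One way to check the send-side half of this lock ordering is to look at the locks the session holds right after calling the procedure. The following is only a sketch for a test database: the GroupCode value 'MyGroup' is a placeholder I made up, and querying sys.dm_tran_locks requires VIEW SERVER STATE permission. The view lists which locks are held, not the order in which they were taken.

```sql
-- Run in a test session; the outer transaction keeps the locks alive
-- long enough to inspect them.
BEGIN TRAN;

EXEC dbo.usp_SendMessage
	@endpointGroup = 'MyGroup',        -- placeholder GroupCode, not from my real config
	@varcharPayload = 'lock test';

-- List the locks held by this session in the current database
SELECT resource_type,
	resource_subtype,
	resource_description,
	request_mode,
	request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
	AND resource_database_id = DB_ID();

-- Undo the test send and release the locks
ROLLBACK TRAN;
```

If the theory is right, I would expect the output to include a KEY lock in U mode on the ServiceBrokerConversations primary key and a METADATA lock on the conversation group, matching the two resources in the deadlock graph.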

Given that, I think there may be a few solutions:

  1. There is some subtle difference between my code and the code from the article that I am not noticing and that, when fixed, will resolve the deadlocking. I am hoping this is the case, since as far as I know others have used this pattern without issues.
  2. Or the deadlocking is inherent to the pattern the code is using, and I can either:
     • Deal with the deadlocking by lowering the deadlock priority of the activated stored procedure so that it becomes the victim, and implement retry logic.
     • Remove conversation timers and activation altogether and resort to some sort of job that expires conversations by polling, where I can control the lock ordering.
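
To make the first of those two options concrete, here is a rough sketch of what I have in mind. It is untested, and the procedure name usp_InitiatorQueueHandler_WithRetry is made up for illustration. It lowers the deadlock priority so the activated procedure is preferred as the victim, and treats error 1205 (deadlock victim) as a signal to roll back and RECEIVE again.

```sql
CREATE PROCEDURE [dbo].[usp_InitiatorQueueHandler_WithRetry]  -- hypothetical name
AS
BEGIN

	SET NOCOUNT ON;

	-- Prefer this session as the victim if it deadlocks with a sender
	SET DEADLOCK_PRIORITY LOW;

	DECLARE @handle UNIQUEIDENTIFIER;
	DECLARE @messageTypeName SYSNAME;
	DECLARE @messageBody VARBINARY(MAX);
	DECLARE @errorNumber INT;
	DECLARE @errorMessage NVARCHAR(2048);

	WHILE (1=1)
	BEGIN

		BEGIN TRY

			BEGIN TRAN;

			WAITFOR (RECEIVE TOP(1)
				@handle = conversation_handle,
				@messageTypeName = message_type_name,
				@messageBody = message_body
			FROM [InitiatorQueue]), TIMEOUT 5000;

			IF (@@ROWCOUNT = 0)
			BEGIN
				COMMIT TRAN;
				BREAK;
			END

			EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody;

			COMMIT TRAN;

		END TRY
		BEGIN CATCH

			SELECT @errorNumber = ERROR_NUMBER(), @errorMessage = ERROR_MESSAGE();

			-- Roll back so the received message goes back on the queue
			IF XACT_STATE() <> 0
				ROLLBACK TRAN;

			IF @errorNumber <> 1205
			BEGIN
				-- Not a deadlock: report it and stop instead of looping
				RAISERROR (N'Initiator queue handler failed. Error %i: %s',
					16, 1, @errorNumber, @errorMessage) WITH LOG;
				BREAK;
			END

			-- Error 1205: we were the deadlock victim, loop and RECEIVE again

		END CATCH

	END

END
```

Two caveats with this sketch. First, Service Broker's poison message handling disables a queue after five consecutive rollbacks of a RECEIVE, so repeated deadlocks on the same message could still take the queue offline. Second, inside the TRY block the severity 16 RAISERROR in usp_BrokerHandleInitiator would jump to the CATCH block and roll back the Error message, so that call would have to become informational (severity 10) or a write to a log table for this pattern to work.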

My ultimate goal is to eliminate any deadlocking on usp_SendMessage so that it "never" fails.

I appreciate any feedback!

Thanks




