I have implemented a (slightly) modified version of conversation recycling, using conversation timers and stored procedure activation, based on http://rusanu.com/2007/05/03/recycling-conversations/. However, deadlocks occasionally occur between the send procedure and the activated procedure, involving the conversation group and the conversations table. The main modification is that, instead of a single SPID column in the table, I use an IdentifierType and an Identifier value to identify the conversation. I am only using the defaults (@@SPID), though, so I don't think that should matter in this case.
For the send side I have:
    CREATE PROCEDURE [dbo].[usp_SendMessage]
    (
        @endpointCode VARCHAR(255) = NULL,
        @endpointGroup VARCHAR(255) = NULL,
        @xmlPayload XML = NULL,
        @binaryPayload VARBINARY(MAX) = NULL,
        @varcharPayload VARCHAR(MAX) = NULL,
        @identifier VARCHAR(50) = @@SPID,
        @identifierType VARCHAR(50) = '@@SPID'
    )
    AS
    BEGIN
        SET NOCOUNT ON

        DECLARE @fromService SYSNAME,
                @toService SYSNAME,
                @onContract SYSNAME,
                @messageType SYSNAME,
                @conversationTimeout INT

        SELECT @fromService = FromService
             , @toService = ToService
             , @onContract = OnContract
             , @messageType = MessageType
             , @conversationTimeout = ConversationTimeout
        FROM dbo.ServiceBrokerEndpointConfig
        WHERE GroupCode = @endpointGroup

        IF @fromService IS NULL OR @toService IS NULL OR @onContract IS NULL
           OR @messageType IS NULL OR @conversationTimeout IS NULL
        BEGIN
            RAISERROR (
                N'Failed to get endpoint config for GroupCode ''%s''.'
                , 16, 1, @endpointGroup) WITH LOG;
            RETURN;
        END

        DECLARE @SBDialog UNIQUEIDENTIFIER
        DECLARE @Message XML
        DECLARE @counter INT
        DECLARE @error INT
        DECLARE @handle UNIQUEIDENTIFIER;
        DECLARE @NotNullCount INT = ((CASE WHEN @xmlPayload IS NULL THEN 0 ELSE 1 END)
                                   + (CASE WHEN @binaryPayload IS NULL THEN 0 ELSE 1 END)
                                   + (CASE WHEN @varcharPayload IS NULL THEN 0 ELSE 1 END))

        IF @NotNullCount > 1
        BEGIN
            RAISERROR (
                N'Failed to SEND because %i payload fields are filled in when no more than 1 is expected'
                , 16, 1, @NotNullCount) WITH LOG;
            RETURN;
        END

        SET @counter = 1
        WHILE (1=1)
        BEGIN
            SET @handle = NULL

            -- Seek an eligible conversation in [ServiceBrokerConversations]
            -- We will hold an UPDLOCK on the composite primary key
            SELECT @handle = Handle
            FROM [ServiceBrokerConversations] WITH (UPDLOCK)
            WHERE Identifier = @identifier
              AND IdentifierType = @identifierType
              AND FromService = @fromService
              AND ToService = @toService
              AND OnContract = @onContract;

            IF @handle IS NULL
            BEGIN
                -- Need to start a new conversation for the current @Id
                BEGIN DIALOG CONVERSATION @handle
                    FROM SERVICE @fromService
                    TO SERVICE @toService
                    ON CONTRACT @onContract
                    WITH ENCRYPTION = OFF;

                -- Then the sender must listen on the
                -- send queue for the http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer message type and
                -- cleanup appropriately.
                IF @conversationTimeout IS NOT NULL
                BEGIN
                    BEGIN CONVERSATION TIMER (@handle) TIMEOUT = @conversationTimeout;
                END

                INSERT INTO [ServiceBrokerConversations]
                    (Identifier, IdentifierType, FromService, ToService, OnContract, Handle)
                VALUES
                    (@identifier, @identifierType, @fromService, @toService, @onContract, @handle);
            END;

            IF @xmlPayload IS NOT NULL
            BEGIN
                -- Attempt to SEND on the associated conversation
                ;SEND ON CONVERSATION @handle MESSAGE TYPE @messageType (@xmlPayload);
            END
            ELSE IF @binaryPayload IS NOT NULL
            BEGIN
                ;SEND ON CONVERSATION @handle MESSAGE TYPE @messageType (@binaryPayload);
            END
            ELSE
            BEGIN
                ;SEND ON CONVERSATION @handle MESSAGE TYPE @messageType (@varcharPayload);
            END

            SELECT @error = @@ERROR;
            IF @error = 0
            BEGIN
                -- Successful send, just exit the loop
                BREAK;
            END

            SELECT @counter = @counter+1;
            IF @counter > 10
            BEGIN
                -- We failed 10 times in a row, something must be broken
                RAISERROR (
                    N'Failed to SEND on a conversation for more than 10 times. Error %i.'
                    , 16, 1, @error) WITH LOG;
                BREAK;
            END

            -- Delete the associated conversation from the table and try again
            DELETE FROM [ServiceBrokerConversations] WHERE Handle = @handle;
            SET @handle = NULL;
        END
    END
And for the activation on the initiator queue I have:
    CREATE PROCEDURE [dbo].[usp_InitiatorQueueHandler]
    AS
    BEGIN
        SET NOCOUNT ON

        DECLARE @handle UNIQUEIDENTIFIER;
        DECLARE @messageTypeName SYSNAME;
        DECLARE @messageBody VARBINARY(MAX);

        WHILE (1=1)
        BEGIN
            BEGIN TRAN;

            ;WAITFOR (RECEIVE TOP(1)
                    @handle = conversation_handle,
                    @messageTypeName = message_type_name,
                    @messageBody = message_body
                FROM [InitiatorQueue]), TIMEOUT 5000;

            IF (@@ROWCOUNT = 0)
            BEGIN
                COMMIT TRAN;
                BREAK;
            END

            -- Call the base stored procedure to handle ending the conversation
            EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody

            COMMIT TRAN;
        END
    END
    GO

    ALTER QUEUE [InitiatorQueue]
    WITH ACTIVATION (
        STATUS=ON,
        PROCEDURE_NAME=dbo.usp_InitiatorQueueHandler,
        EXECUTE AS OWNER,
        MAX_QUEUE_READERS=10
    )
    GO

    CREATE PROCEDURE [dbo].[usp_BrokerHandleInitiator]
    (
        @handle UNIQUEIDENTIFIER,
        @messageTypeName SYSNAME,
        @messageBody VARBINARY(MAX)
    )
    AS
    BEGIN
        SET NOCOUNT ON

        IF @handle IS NOT NULL
        BEGIN
            -- Delete the message from the [ServiceBrokerConversations] table
            -- before sending the [EndOfStream] message. The order is
            -- important to avoid deadlocks.
            IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
               OR @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
            BEGIN
                DELETE FROM [ServiceBrokerConversations] WHERE [Handle] = @handle;
            END

            IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
            BEGIN
                ;SEND ON CONVERSATION @handle MESSAGE TYPE [EndOfStream];
            END
            ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
            BEGIN
                END CONVERSATION @handle;
            END
            ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
            BEGIN
                END CONVERSATION @handle;

                -- We could send a notification or store the error in a table for further inspection
                DECLARE @error INT;
                DECLARE @description NVARCHAR(4000);

                WITH XMLNAMESPACES (N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb)
                SELECT @error = CAST(@messageBody AS XML).value(
                           '(//ssb:Error/ssb:Code)[1]', 'INT'),
                       @description = CAST(@messageBody AS XML).value(
                           '(//ssb:Error/ssb:Description)[1]', 'NVARCHAR(4000)')

                -- Maybe log to audit log instead?
                RAISERROR(N'Received error Code:%i Description:"%s"', 16, 1, @error, @description) WITH LOG;
            END;
        END
    END
The deadlock XML is:
    <deadlock>
      <victim-list>
        <victimProcess id="process807dbd0c8" />
      </victim-list>
      <process-list>
        <process id="process807dbd0c8" taskpriority="0" logused="0" waitresource="METADATA: database_id = 21 CONVERSATION_GROUP($hash = 0xff26c7e1:0x478840de:0xd403bb)" waittime="2600" ownerId="8333217736" transactionname="GetDialogByHandle" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x87f251c90" lockMode="X" schedulerid="2" kpid="7220" status="suspended" spid="110" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-03-23T10:53:58.683" lastbatchcompleted="2015-03-23T10:53:58.683" lastattention="1900-01-01T00:00:00.683" clientapp=".Net SqlClient Data Provider" hostname="COLFOQA2" hostpid="1436" loginname="dev" isolationlevel="read committed (2)" xactid="8333217704" currentdb="21" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
          <executionStack>
            <frame procname="MYDB.dbo.usp_SendMessage" line="116" stmtstart="7540" stmtend="7696" sqlhandle="0x03001500aada77428391a0005da4000001000000000000000000000000000000000000000000000000000000">
              SEND ON CONVERSATION @handle MESSAGE TYPE @messageType (@xmlPayload);
            </frame>
          </executionStack>
          <inputbuf>
            Proc [Database Id = 21 Object Id = 1115151018]
          </inputbuf>
        </process>
        <process id="process869a5e558" taskpriority="0" logused="588" waitresource="KEY: 21:72057594039959552 (1f1ae6770d1b)" waittime="2600" ownerId="8333217730" transactionname="user_transaction" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x3e28456a8" lockMode="U" schedulerid="4" kpid="6720" status="background" spid="22" sbid="0" ecid="0" priority="0" trancount="2">
          <executionStack>
            <frame procname="MYDB.dbo.usp_BrokerHandleInitiator" line="28" stmtstart="1996" stmtend="2144" sqlhandle="0x03001500f704cd06e691a0005da4000001000000000000000000000000000000000000000000000000000000">
              DELETE FROM [ServiceBrokerConversations] WHERE [Handle] = @handle;
            </frame>
            <frame procname="MYDB.dbo.usp_InitiatorQueueHandler" line="29" stmtstart="1014" stmtend="1172" sqlhandle="0x03001500316f56101694a0005da4000001000000000000000000000000000000000000000000000000000000">
              EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody
            </frame>
          </executionStack>
          <inputbuf></inputbuf>
        </process>
      </process-list>
      <resource-list>
        <metadatalock subresource="CONVERSATION_GROUP" classid="$hash = 0xff26c7e1:0x478840de:0xd403bb" dbid="21" id="lock54fdb1800" mode="X">
          <owner-list>
            <owner id="process869a5e558" mode="X" />
          </owner-list>
          <waiter-list>
            <waiter id="process807dbd0c8" mode="X" requestType="wait" />
          </waiter-list>
        </metadatalock>
        <keylock hobtid="72057594039959552" dbid="21" objectname="MYDB.dbo.ServiceBrokerConversations" indexname="PK__ServiceB__877FDFD18DF079BD" id="lock6c65b1a00" mode="U" associatedObjectId="72057594039959552">
          <owner-list>
            <owner id="process807dbd0c8" mode="U" />
          </owner-list>
          <waiter-list>
            <waiter id="process869a5e558" mode="U" requestType="wait" />
          </waiter-list>
        </keylock>
      </resource-list>
    </deadlock>
I have a clustered index on the fields I am SELECTing by and a UNIQUE index on the Handle (for the DELETE). When running the SELECT/DELETE statements against the table the query plan reports index seeks are being used:
    CREATE TABLE [dbo].[ServiceBrokerConversations]
    (
        [Identifier]     VARCHAR (50)     NOT NULL,
        [IdentifierType] VARCHAR (50)     NOT NULL,
        [FromService]    [sysname]        NOT NULL,
        [ToService]      [sysname]        NOT NULL,
        [OnContract]     [sysname]        NOT NULL,
        [Handle]         UNIQUEIDENTIFIER NOT NULL,
        [CreateDate]     DATETIME2 (7)    NULL,
        PRIMARY KEY CLUSTERED ([Identifier] ASC, [IdentifierType] ASC, [FromService] ASC, [ToService] ASC, [OnContract] ASC) ON [PRIMARY],
        UNIQUE NONCLUSTERED ([Handle] ASC) ON [PRIMARY]
    ) ON [PRIMARY];
What appears to be happening is that the DELETE is somehow deadlocking with the SEND, but I am not sure how, since I am using them in the same order in both the send procedure and the activated procedure. Also, RCSI is enabled on the database where the deadlocks occur.
EDIT:
I think I have found the culprit with lock acquisition order:
- In the usp_SendMessage proc:
    - The SELECT locks the conversation record
    - The SEND locks the conversation group
- In the timer activated proc on the initiator queue:
    - The RECEIVE locks the conversation group
    - The DELETE locks the conversation record
Given that, I think there may be a few solutions:
- There is some subtle difference between my code and the code from the article that I am not noticing and that, when fixed, will resolve the deadlocking. I am hoping this is the case, since as far as I know others have used this pattern without issues.
- Or the deadlocking is inherent to the pattern the code is using, and I can either:
    - Deal with the deadlocking by adjusting the deadlock priority on the activated stored procedure so that it becomes the victim, and implement retry logic.
    - Remove conversation timers and activation altogether and resort to some sort of job that expires conversations by polling, where I can control the lock ordering.
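To make the deadlock-priority option concrete, here is a rough sketch of what I have in mind. It has not been tested against this system. It reuses the object names from above and assumes that `SET DEADLOCK_PRIORITY LOW` makes the activated session the preferred victim, and that the victim sees error 1205 in the CATCH block. The `@errorNumber` and `@errorMessage` variables are introduced only for this illustration.

```sql
-- Sketch only: activated procedure that volunteers as the deadlock victim
-- and retries. Assumes [InitiatorQueue] and dbo.usp_BrokerHandleInitiator
-- exist as defined above.
ALTER PROCEDURE [dbo].[usp_InitiatorQueueHandler]
AS
BEGIN
    SET NOCOUNT ON;

    -- Prefer this session over usp_SendMessage callers when a deadlock is resolved.
    SET DEADLOCK_PRIORITY LOW;

    DECLARE @handle UNIQUEIDENTIFIER;
    DECLARE @messageTypeName SYSNAME;
    DECLARE @messageBody VARBINARY(MAX);
    DECLARE @errorNumber INT;              -- hypothetical helper variable
    DECLARE @errorMessage NVARCHAR(2048);  -- hypothetical helper variable

    WHILE (1=1)
    BEGIN
        BEGIN TRY
            BEGIN TRAN;

            WAITFOR (RECEIVE TOP(1)
                    @handle = conversation_handle,
                    @messageTypeName = message_type_name,
                    @messageBody = message_body
                FROM [InitiatorQueue]), TIMEOUT 5000;

            IF (@@ROWCOUNT = 0)
            BEGIN
                COMMIT TRAN;
                BREAK;
            END

            EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody;

            COMMIT TRAN;
        END TRY
        BEGIN CATCH
            SELECT @errorNumber = ERROR_NUMBER(), @errorMessage = ERROR_MESSAGE();

            -- Rolling back returns the received message to the queue.
            IF XACT_STATE() <> 0
                ROLLBACK TRAN;

            IF @errorNumber <> 1205
            BEGIN
                -- Not a deadlock: report it and stop instead of looping.
                RAISERROR(N'usp_InitiatorQueueHandler failed. Error %i: %s',
                          16, 1, @errorNumber, @errorMessage) WITH LOG;
                RETURN;
            END
            -- Error 1205 (deadlock victim): fall through and retry the RECEIVE.
        END CATCH
    END
END
```

Because a rolled-back RECEIVE puts the message back on the queue, the next loop iteration should pick it up again. One caveat with this approach: Service Broker's poison message handling disables a queue after five consecutive transaction rollbacks, so repeated deadlocks on the same message could still take [InitiatorQueue] offline.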
My ultimate goal is to eliminate any deadlocking on usp_SendMessage so that it "never" fails.
I appreciate any feedback!
Thanks