I recently wrote a high-volume, fast paced application which supports many threads reaching into a work queue (SQL table) and retrieving a row representing a piece of work.  The table uses an identity column as its primary key.

In other work experiences, I have seen SQL locking hints used to prevent two threads from grabbing the same piece of work.  Mostly, the locking hints are READPAST, and some form of ROWLOCK, PAGELOCK or TABLELOCK.  In high volume environments, the locking hints have periodically failed, and occasionally the same piece of work goes to different threads.

I decided to try a different approach which isolates the allocation away from SQL internal locking  mechanisms, although the locking mechanisms can be used.  In my table, the work is marked as assigned, and later is marked as either failed/reassign or successful, or can timeout and become available for reassignment.  In other words, the row survives the work completion: it is not deleted upon retrieval.

The technique uses a column in the work queue table called Process_Marker, which is a nullable uniqueidentifier.  When a piece of work is injected, this column is left null.  When the worker calls the stored procedure to retrieve a piece of available work, the stored procedure creates a GUID using NEWID() in a static variable, and attempts to update the Process_Marker column in an available row (i.e. Process Marker is null) with that GUID.  As its final action, the stored procedure selects all the rows with that particular GUID, and updates any necessary admin columns in those rows.  In the UPDATE statement, the READPAST locking hint is used.

The identity column is used by the workers to report/update results, in case the workers are capable of requesting multiple work pieces in a single call to the stored procedure. The worker just needs to set the Last_Attempt_End_Date to the current system time, to prevent the work from being reissued/retried.

I haven’t done full load testing on it yet, but the results look promising.

Here is the code to create the work table (Job_Queue):

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
 
CREATE TABLE [dbo].[Job_Queue] (
[ID] [BIGINT] IDENTITY(1,1) NOT NULL,
[Priority] [tinyint] NOT NULL CONSTRAINT [DF_Job_Queue_Priority]  DEFAULT ((255)),
[Available_Date] [datetime] NOT NULL CONSTRAINT [DF_Job_Queue_Available_Date]  DEFAULT (getdate()),
[Timeout_Date] [datetime] NULL,
[Attempts] [INT] NOT NULL CONSTRAINT [DF_Job_Queue_Attempts]  DEFAULT ((0)),
[Processing_Mark] [uniqueidentifier] NULL,
[Processing_Server] [VARCHAR](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Last_Attempt_Start_Date] [datetime] NULL,
[Last_Attempt_End_Date] [datetime] NULL,
[Payload] [VARCHAR](100) NOT NULL CONSTRAINT [DF_Job_Queue_Payload] DEFAULT ('a piece of work in the queue')
CONSTRAINT [PK_Job_Queue] PRIMARY KEY CLUSTERED
(
[ID] ASC
) ON [PRIMARY]
) ON [PRIMARY]
 
GO
 
SET ANSI_PADDING OFF
GO

And here is the stored procedure for retrieving one or more jobs, with anti-collision logic (usp_API_Get_Job):

SET ANSI_NULLS ON
GO
 
SET QUOTED_IDENTIFIER ON
GO
 
ALTER PROCEDURE [dbo].[usp_API_Get_Job]
 
	@sProcessingServer VARCHAR(50),
	@nWorkCount INT = 1
 
AS
 
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;
 
	DECLARE @xGUID UNIQUEIDENTIFIER
 
	SET @xGUID = NEWID()
 
	UPDATE jq SET
		Processing_Mark = @xGUID,
		Processing_Server = @sProcessingServer,
		Attempts = Attempts + 1,
		Last_Attempt_Start_Date = GETDATE(),
		Timeout_Date = DATEADD (mi, 15, GETDATE())
	FROM Job_Queue AS jq WITH (UPDLOCK)
	INNER JOIN (
		SELECT TOP (@nWorkCount) ID
		FROM Job_Queue WITH (READPAST)
		WHERE
		(Processing_Mark IS NULL AND Attempts = 0 AND Available_Date <= GETDATE())
		OR (
			ISNULL(Last_Attempt_End_Date, '1/1/2000') < Last_Attempt_Start_Date
			AND ISNULL (Timeout_Date, Available_Date) <= GETDATE()
			AND Attempts < 3
		)
		ORDER BY Priority ASC, Available_Date ASC
	) AS jqs
	ON jq.ID = jqs.ID
	WHERE
	(jq.Processing_Mark IS NULL AND jq.Attempts = 0 AND jq.Available_Date <= GETDATE())
	OR (
		ISNULL(jq.Last_Attempt_End_Date, '1/1/2000') < jq.Last_Attempt_Start_Date
		AND ISNULL (jq.Timeout_Date, jq.Available_Date) <= GETDATE()
		AND jq.Attempts < 3
	)
 
	SELECT ID, Payload
	FROM Job_Queue
	WHERE Processing_Mark = @xGUID
END

So I’ll see if this behaves well under some heavy loads..