Reduce SQL Server Deadlocks with a Clustered Index

Problem

A DBA flagged a T-SQL script that runs thousands of times daily, causing hundreds of deadlocks. This DBA is busy addressing major issues and testing backups, and it’s not their responsibility to fix the application’s code. Developers must resolve the SQL Server deadlock issue.

Solution

Let’s start by outlining the significance of SQL Server deadlocks. I will share some code that led to excessive deadlocks and common coding strategies to resolve this issue, including enabling Read Committed Snapshot Isolation (RCSI) and using table hints. Spoiler alert: We added a clustered index to resolve the deadlocks. This tip may assist you with a similar challenge.

What is a SQL Server Deadlock?

A deadlock typically occurs when two processes hold resources that the other needs, and neither is willing to surrender them. SQL Server intervenes by declaring one of them the deadlock victim and rolling back that transaction.

Let’s discuss a real-world scenario outside of SQL Server. I have two sons (Henry and Liam) who both want to play video games. Henry grabs the only working controller and searches for the Mario Kart cartridge. Liam has the cartridge and wants to play first. So, he waits for the controller. Now they’re stuck, each holding what the other needs and not willing to give up. That’s a deadlock – when a circular wait persists without resolution.

Microsoft provides a list of potential deadlocks in the “Deadlocks guide.” Not only can tables be deadlocked, but memory and various other resources can also experience deadlocks.

Why Deadlocks Matter

Most active databases occasionally experience deadlocks. However, if deadlocks are excessive, your application may lose data since one of the transactions must roll back. You can mitigate this data loss by implementing retry logic in your T-SQL or application code. Furthermore, deadlocks create additional work on SQL Server to roll back one of the transactions.

SQL Server has a built-in mechanism for detecting and resolving deadlocks. When the lock monitor identifies a deadlock, it designates the transaction with the lowest rollback cost as the victim, while allowing the other to commit.

Demonstration

Create an Environment

Let’s create our demo with a single database and table MagicNumbers. What will my demo environment look like?

  • VM with Windows 2019 & 4 Cores
  • 16GB of memory
  • MAX DOP = 4
  • MAX Server Memory (MB) = 14000
  • Cost threshold = 20
  • SQL Server 2022 Developer Edition

For the MagicNumbers table, take note of Column2, as it plays a crucial role later. Run the code below to follow along.

/*
* MSSQLTips.com
* Create one table with 75,000 rows
*/
USE [master];
GO
IF DB_ID('SolvingDeadlocksDemo') IS NOT NULL
BEGIN
    ALTER DATABASE SolvingDeadlocksDemo
    SET SINGLE_USER
    WITH ROLLBACK IMMEDIATE;
    DROP DATABASE SolvingDeadlocksDemo;
END;
GO
CREATE DATABASE SolvingDeadlocksDemo;
GO
ALTER DATABASE SolvingDeadlocksDemo SET RECOVERY SIMPLE;
GO
USE SolvingDeadlocksDemo;
GO
CREATE TABLE dbo.MagicNumbers
(
    Id INT IDENTITY(1, 1) NOT NULL,
    Number1 INT NOT NULL,
    Number2 INT NOT NULL,
    Number3 INT NOT NULL,
    SomeText NVARCHAR(500) NULL,
    CreateDate DATETIME2 NOT NULL
);
GO
INSERT INTO dbo.MagicNumbers
(
    Number1,
    Number2,
    Number3,
    SomeText,
    CreateDate
)
SELECT TOP (75000)
    ABS(CHECKSUM(NEWID()) % 50) + 1 AS Number1,
    ABS(CHECKSUM(NEWID()) % 25) + 1 AS Number2,
    ABS(CHECKSUM(NEWID()) % 75) + 1 AS Number3,
    REPLICATE('Test Text to have more pages', 10) SomeText,
    SYSUTCDATETIME() AS CreateDate
FROM sys.all_columns AS n1
    CROSS JOIN sys.all_columns AS n2;
GO
CREATE NONCLUSTERED INDEX IX_MagicNumbers_Number2
ON dbo.MagicNumbers (Number2)
WITH (FILLFACTOR = 80);
GO
CREATE OR ALTER PROCEDURE dbo.GenerateMagicNumbers (@deleteNumber2 INT)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    CREATE TABLE #MagicNumberTemp
    (
        Number1 INT,
        Number3 INT,
        SomeText NVARCHAR(500)
    );
    DECLARE @currentDate AS DATETIME2 = SYSUTCDATETIME();
    INSERT INTO #MagicNumberTemp
    (
        Number1,
        Number3,
        SomeText
    )
    SELECT TOP 3000
        ABS(CHECKSUM(NEWID()) % 50) + 1 AS Number1,
        ABS(CHECKSUM(NEWID()) % 75) + 1 AS Number3,
        REPLICATE('Test Text to have more pages', 10) SomeText
    FROM sys.all_columns AS n1;
    BEGIN TRANSACTION
    DELETE FROM dbo.MagicNumbers
    WHERE Number2 = @deleteNumber2;
    INSERT INTO dbo.MagicNumbers
    (
        Number1,
        Number2,
        Number3,
        SomeText,
        CreateDate
    )
    SELECT Number1,
           @deleteNumber2,
           Number3,
           SomeText,
           @currentDate
    FROM #MagicNumberTemp;
    COMMIT TRANSACTION;
END
GO

Insert-Delete Pattern

The stored procedure at the end resembles the one that caused the deadlocks. From my research, the problem lies in the DELETE statement below.

DELETE FROM dbo.MagicNumbers
WHERE Number2 = @deleteNumber2;   

It follows an insert-then-delete pattern on the same table. Could we use an UPDATE statement instead? Perhaps, but we must delete some of the data. Unless we add an IsDeleted column and do this in batches or as part of a cleanup script afterward, that would require changing the application’s logic.

Something else to note is that at one point, this table did not suffer from deadlocks, at least to any noticeable degree, early on in its life. After a couple of years, and as databases tend to do, it grew over time to tens of thousands of rows.

Simulating the Deadlocks

There are several ways to reproduce deadlocks. One of my favorite tools for this type of exercise is Adam Machanic’s SqlQueryStress.

I’m using parameter substitution along with executing the stored procedure repeatedly with multiple sessions. Below is the code and settings to follow along, assuming you have downloaded SqlQueryStress.

SELECT TOP (10000)
       ABS(CHECKSUM(NEWID()) % 25) + 1 AS DeleteNumber2
FROM sys.all_columns AS n1;
SqlQueryStress

After running SqlQueryStress for a minute, you’ll encounter several deadlocks. Let’s look at why the deadlock occurs.

Looking at the Deadlock

There are likely several types of deadlocks occurring here; however, I will present the typical one we encountered below. I’ve included a simple deadlock diagram from the Sentry Plan Explorer, which I prefer over the deadlock graph in SQL Server Management Studio (SSMS) for several reasons, including its playback feature.

Deadlock Diagram
  • Step 1: SPID 74 takes an Update (U) lock on Key B, likely because it scans and believes a DELETE will occur.
  • Step 2: SPID 78 takes an Exclusive (X) lock on Key A; likely, a (U) lock occurred before this.
  • Step 3: SPID 78 requests a (U) lock on Key B and 75 blocks it.
  • Step 4: SPID 74 requests a (U) lock on Key A, and the deadlock circular reference begins.

In the example above, the lock monitor selects SPID 74 as the victim and rolls back the transaction.

Theory on Why This Deadlock Occurs

Non-clustered indexes on heaps utilize Row Identifiers (RIDs) to locate rows. Heaps lack a physical row order, so different processes may access the same resource in varying sequences. In this scenario, the first delete process acquires a (U) lock and then attempts to upgrade it to an exclusive (X) lock, but another process already holds the (X) lock. If multiple processes access the rows in opposite order, they can each hold one lock while waiting for the other, resulting in a deadlock. I’m open to revising my thoughts here if someone can identify a more specific cause.

Resolving the SQL Server Deadlock

I recall reading something Paul White wrote, along the lines of, “You shouldn’t really care much about why the deadlock occurs; the important question is, how do you avoid it?” I like Paul’s advice. Let’s review a few suggestions for eliminating the deadlock before getting to the ultimate fix that worked for us.

Read Committed Snapshot Isolation

After enabling Read Committed Snapshot isolation for a database, writers will no longer block readers, as a reader does not need to acquire a shared (S) lock. In this scenario, reading data with a SELECT statement isn’t what causes deadlocks. I’ve included the code below for you to try.

ALTER DATABASE SolvingDeadlocksDemo SET READ_COMMITTED_SNAPSHOT ON; 

The code above didn’t help resolve the issue for me.

Exclusive Table Lock During the Delete

I mention this idea because it’s one you see, but I’m not a huge fan of it. For one, it locks the table, which is counter to the whole concept of concurrency. Does it stop the deadlocks in this situation? Yes, but the cost is too high.

DELETE FROM dbo.MagicNumbers WITH (TABLOCKX)
WHERE Number2 = @deleteNumber2;

But as Joe Celko said, “It is impossible to simultaneously achieve high availability, consistency, and serializability.” With that in mind, let’s shift to something less dramatic.

Delete Rows in Batches

One common piece of advice is to delete rows in smaller batches, such as 100 rows at a time, instead of 3,000. This method should reduce the time and the number of locks held. However, it relies on the session committing each smaller batch after the deletion occurs. Unfortunately, we cannot do that here. The commit cannot happen until after the insert. In this case, we would likely experience more deadlocks than we currently do.

Add a READPAST Hint

Using a READPAST hint is an intriguing idea suggested to me by Adam Machanic, and I would like to explore it further in a future post.

DELETE FROM dbo.MagicNumbers WITH (READPAST)
WHERE Number2 = @deleteNumber2;

If you use a READPAST hint on a table, other sessions skip currently locked rows, which helps reduce blocking and deadlocks. However, there seems to be a significant downside: it skips the rows that are intended for deletion. Depending on your situation, this behavior might not pose an issue; alternatively, you could implement additional logic to ensure SQL eventually deletes those rows. As I mentioned, I plan to explore this hint further in a future article.

Add a Clustered Index

The original stored procedure contained a significant amount of logic preceding the INSERT and DELETE statements. However, through testing, it became clear that the DELETE statement was causing the deadlocks. After reviewing the original stored procedure, I realized that I had overlooked one key detail: this is a heap because it lacks a clustered index. You likely spotted this detail in the original database creation script and have been saying all along, “add a clustered index, dude!”

Let’s add a clustered index and run the test again.

CREATE CLUSTERED INDEX PK_MagicNumbers_Id ON dbo.MagicNumbers(Id)
WITH (FILLFACTOR = 80);
GO

I’ve rerun this test at least a dozen times and have yet to encounter another deadlock. Another important piece of information to add is that when I asked the team if there was a reason they didn’t include a clustered index, they replied, “No, I think we just missed it.”

Clean Up

When you finish with the demo database, execute the statement below to drop the database.

/*
 * MSSQLTips.com
 * Clean up when you are finished
*/
USE [master];
GO
IF DB_ID('SolvingDeadlocksDemo') IS NOT NULL
BEGIN
    ALTER DATABASE SolvingDeadlocksDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE SolvingDeadlocksDemo;
END;
GO

Summary

To sum up the moral of the story, heaps aren’t your friend when it comes to updating and deleting data, even Microsoft agrees with this statement. Erik Darling mentioned in one of his training videos that “heaps are great for staging tables, but they are terrible for everything else.” I agree with this perspective, and if you’re performing updates and deletes, consider adding a clustered index if possible. Finally, don’t assume someone omitted a clustered index on purpose.

Next Steps

2 Comments

  1. Q: How do you ensure the transaction is applied on all the rows including the rows skipped by the READPAST hint?

    @Jared, would love to see this article –
    “Alternatively, you could implement additional logic to ensure SQL eventually deletes those rows. As I mentioned, I plan to explore this hint further in a future article.”

  2. Table Partitioning must help since order of Lock Escalation without partitioning is Row Lock > Page Lock > Table Lock.
    But with Partitioning , Row Lock > Page Lock > Partition Lock> Table Lock.
    This extra division of “Partition Lock” might subvert few deadlock instances.

    Thoughts or feedback?

    — In ‘thoughts’…
    Lonely Rogue

Leave a Reply

Your email address will not be published. Required fields are marked *