SQL Server 2022 TempDB Contention Enhancements with Page Latch Concurrency

By:   |   Updated: 2024-06-19   |   Comments (2)   |   Related: > Performance Tuning


Problem

Tempdb often acts as a high-traffic repository in SQL Server, experiencing significant contention. This includes not only temp table usage but also processes such as triggers, worktables for storing intermediate results for spools, cursors, sorts, work files for hash join, and temporary large object (LOB) storage, just to name a few.

A prominent issue is Global Allocation Map (GAM) and Shared Global Allocation Map (SGAM) system page latch contention, which can be particularly problematic under specific high concurrency workloads.

Solution

SQL Server 2022 offers many impressive new features, some more widely recognized than others. One lesser-known announcement is the enhancement of the system page latch concurrency.

The introduction of latch-free GAM pages in SQL Server 2022 is a game-changer for certain workload patterns that are not easily tunable. However, this feature is not available in earlier versions of SQL Server, making an upgrade necessary to benefit from this improvement.

Demonstration

In this tip, we will run a workload stress test using Ostress.exe on SQL Server 2019 and repeat the same test on SQL Server 2022 to collect latch-related metrics.

SQL Server TempDB File Configuration

The screenshot below is the tempdb data and transaction log file configuration on both SQL Server 2019 and SQL Server 2022.

There is only a single data file, which is far from Microsoft's best practices recommendations.

TempDB File Configuration A single data file and transaction log file

Workload Stress Test Setup

The script below will create a new database and a stored procedure, generating tempdb contention on the GAM page when executed in high concurrency. We won't worry about the SGAM system page because SQL Server 2016 and higher has made GAM page allocation the default behavior.

CREATE DATABASE TestGamContention
GO
USE TestGamContention
GO
CREATE OR ALTER PROCEDURE dbo.GenerateGamContention
AS
BEGIN
    SET NOCOUNT ON;

    exec sp_executesql N'
    CREATE TABLE #TestGamContention
    (TransactionTimestamp DATETIME2, Quantity tinyint);

    INSERT INTO #TestGamContention (TransactionTimestamp, Quantity)
    SELECT GETDATE() AS TransactionTimestamp, 1 AS Quantity'
END
GO

Stress Test Command

The command below is run from the command prompt, creating 100 threads and repeat the workload 300 times.

Ostress.exe -Slocalhost -E -Q"exec dbo.GenerateGamContention" -n100 -r300 -dTestGamContention

Monitoring Script

The script below would show waiting tasks that have latch-related wait types. The ResourceType column would show page id type - PFS, GAM or SGAM.

; WITH waits AS (
        SELECT  
            os.session_id,
            os.wait_type,
            os.wait_duration_ms,
            os.blocking_session_id,
            os.resource_description,
            CONVERT(INT, REPLACE(PARSENAME(REPLACE(os.resource_description, ':', '.'), 1), '.', ':')) PageId,
            ExecutingSQL = 
            SUBSTRING(t.text,(r.statement_start_offset/2)+1,
                CASE 
                    WHEN r.statement_end_offset=-1 OR r.statement_end_offset=0 THEN (DATALENGTH(t.Text)-r.statement_start_offset/2)+1
                    ELSE (r.statement_end_offset-r.statement_start_offset)/2+1
                    END)
        FROM sys.dm_os_waiting_tasks os
        JOIN sys.dm_exec_requests r ON os.session_id = r.session_id
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
        WHERE os.wait_type LIKE 'PAGE%LATCH_%'
        AND os.resource_description LIKE '2:%'
            )
SELECT 
   w.session_id,
   w.wait_type,
   w.wait_duration_ms,
   w.blocking_session_id,
   w.resource_description,
   w.PageId,
   ResourceType = 
   CASE
       WHEN w.PageID = 1 OR w.PageID % 8088 = 0 THEN 'PFS Page'
       WHEN w.PageID = 2 OR w.PageID % 511232 = 0 THEN 'GAM Page'
       WHEN w.PageID = 3 OR (w.PageID - 1) % 511232 = 0 THEN 'SGAM Page'
       ELSE 'Not PFS, GAM, SGAM page'
   END,
   w.PageId,
   ResourceObject = object_name(pg.object_id),
   w.ExecutingSQL
FROM waits w
OUTER APPLY sys.dm_db_page_info (2, 1, w.PageId, DEFAULT) pg;

Workload Stress Test on SQL Server 2019

When the workload stress test is run without the in-memory tempdb optimization feature enabled, you encounter contention on the tempdb metadata as the first bottleneck. On my laptop with an i9-10885H CPU @ 2.40GHz, 8 cores, 16 logical processors, NVMe drives, and 64 GB memory, this workload took ~13 seconds to complete the iteration.

First Workload Stress Test on SQL Server 2019 Workload took ~13 seconds without memory-optimized tempdb metadata feature enabled

The resource object contention is on sysallocunits.

Metadata Contention on TempDB Due to memory-optimized tempdb metadata feature not enabled

We will not dwell on metadata contention since the focus of this tip is on GAM contention. We will memory-optimize the tempdb metadata feature by executing the command below and then restart the SQL Server service in order to take effect.

USE master
GO
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;
GO

We will clear all the wait stats on the SQL Server before re-running the workload.

DBCC SQLPERF('sys.dm_os_wait_stats',CLEAR);

TempDb Latch Contention on GAM Page

Re-running the workload now will show contention on the GAM page. A key point to note is that in-memory tempdb optimization does not resolve the GAM contention issue in SQL Server 2019.

GAM page latch contention on TempDB Page id 2 is GAM page

Following Microsoft best practices, you might be tempted to keep adding more same-sized tempdb data files, but this only helps to a point when a permanent solution is devised and implemented to fix the root cause.

The Ostress.exe command is executed three times.

Workload duration using Ostress.exe Each iteration took ~10 seconds to complete even with memory-optimized tempdb metadata feature enabled

The perfmon screenshot below captures the relevant metrics and the workload is executed three times. Each workload iteration finishes consistently in ~10 seconds. The highest throughput Batch Requests/sec observed is ~3,100 requests/sec.

Perfmon monitor The highest Batch Requests/sec is ~3100 requests/sec

Monitoring the GAM contention in real-time, the wait type shows PAGELATCH_UP. But from sys.dm_os_wait_stats, the prevalent wait type is PAGELATCH_EX. The two wait types are related as:

  • PAGELATCH_UP: Multiple threads trying to change the same bitmap.
  • PAGELATCH_EX: Threads trying to insert onto the same page, and at the same time, PAGELATCH_SH for threads trying to read from the page.

So, while the updated GAM is having contention to record extents to be used for a temp table, the waits are very high on even more threads trying to just get into the GAM page.

SELECT *, (wait_time_ms + signal_wait_time_ms) / 1000 total_wait_time_sec
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE '%LATCH%'
ORDER BY waiting_tasks_count DESC
GO
Latch wait types from sys.dm_os_wait_stats Prevalent latch waits on PAGELATCH. Faster storage subsystem would not help since this occurs in the memory region.

Workload Stress Test on SQL Server 2022

On the same hardware and configuration on SQL Server 2022, we first enable memory-optimized tempdb metadata prior to starting the workload; otherwise, metadata contention would be the first bottleneck we will encounter.

The same workload is executed three times, and each iteration finishes in ~7 seconds.

Workload duration using Ostress.exe Each iteration took ~7 seconds to complete with memory-optimized tempdb metadata feature enabled, and using the exact same code.

You might not think this is an impressive performance improvement in terms of run duration, but the performance metrics disagree. The Batch Requests/sec can reach a high of 4800 requests/sec, which is a ~55% improvement over SQL Server 2019, which achieved a high of 3100 requests/sec.

Perfmon monitor The highest Batch Requests/sec is ~4800 requests/sec

If you are interested in the latch wait types in the perfmon, they are due to some wait on memory structure and updating pages on the storage disk.

Latch wait types from sys.dm_os_wait_stats There is impressively no PAGELATCH wait types at all

Alleviate TempDb Latch Contention

If you pay close attention to the stored procedure that generates the tempdb GAM contention, there is one distinct coding style to note: the usage of sp_executesql.

The fine print on using sp_executesql is that while it allows query plan caching, it does not allow temp table caching. This means each execution would force SQL Server to explicitly create the temp table, and this is equivalent to running the stored procedure with recompile. Executing such a pattern in high concurrency is bound to overwhelm the tempdb.

If you have a chance to rewrite the stored procedure on SQL Server 2019, remove the dynamic query from the stored procedure, allowing the workload to execute in ~3 seconds.

USE TestGamContention
GO
CREATE OR ALTER PROCEDURE dbo.AlleviateGamContention
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE #AlleviateGamContention
    (TransactionTimestamp DATETIME2, Quantity tinyint);

    INSERT INTO #AlleviateGamContention (TransactionTimestamp, Quantity)
    SELECT GETDATE() AS TransactionTimestamp, 1 AS Quantity
END
GO

There will still be GAM contention, but the contention is so much better compared to the original code.

Latch wait types from sys.dm_os_wait_stats There are still PAGELATCH latch wait with the updated code, but the tempdb contention is a lot lesser

If you are wondering about this updated workload duration in SQL Server 2022, each iteration finishes in ~1 second.

Conclusion

Memory-optimized tempdb metadata only addresses metadata contention. System pages, such as GAM and SGAM page contention, are only fixed in SQL Server 2022.

However, there are ways developers can alleviate tempdb contention on SQL Server 2019 and earlier versions with coding techniques that minimize tempdb use.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Simon Liew Simon Liew is an independent SQL Server Consultant in Sydney, Australia. He is a Microsoft Certified Master for SQL Server 2008 and holds a Masterís Degree in Distributed Computing.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2024-06-19

Comments For This Article




Thursday, June 27, 2024 - 8:44:32 PM - Simon Liew Back To Top (92350)
Hi Jan,
Memory optimized tempdb metadata memory feature at times is not appropriate for certain workload type. Assume you don't have control over long running explicit open transaction involving DDL on temp tables, or the instance require to cater for many long running queries involving temp table, spills, sorts and so on just a name a few. This would then cause the memory footprint to keep growing and not back down for the instance, and this are the example scenarios where Memory optimized tempdb metadata memory feature is not suitable to be enabled.

Wednesday, June 19, 2024 - 4:28:16 PM - Jan Back To Top (92326)
I had a situation where when Memory Optimized Tempdb Metadata was enabled, on a production SQL 2019 instance, stolen memory would gradually grow from 50gb to 600 or 700+ GB and eventually crash SQL server.

The CRM application makes heavy use of tempdb, stored proc ls and dynamic SQL.

I wonder if you noticed any significant increase in stolen memory footprint in either 2019 or 2022 instance.

Thanks.














get free sql tips
agree to terms