Compare SQL Server Page Splits for INT vs BIGINT vs GUID Data Types

By: Ben Snaidero | Comments (2) | Related: > Database Design

Problem

A good point was brought up with a tip I wrote last month on using a GUID column for a primary key and how it can cause really poor performance over time due to the fact that using this column type will result in a lot of page splits. In this tip we will run through basically the same steps as the original tip, but this time we will measure the page splits encountered when inserting data for each of the following datatypes: integer, bigint, GUID and sequential GUID. We will also measure the impact that these page splits have on the SQL Server transaction log.

Solution

Before we dive into our page split demonstration I just want to give a quick overview on the two different types of page splits we can encounter in a SQL Server database as it pertains to clustered indexes and inserting data. As we all know data in a SQL Server database is stored in pages. Since each page is 8KB in size we are inevitably going to have to add and/or split a database page in order to add more data. This leads us to the two types of page splits that can occur, one where we are adding a page to the end of the structure and one where the data needs to go in the middle of the structure, but the page is full so we actually have to split this full page into two pages. In the case of the former since integer, bigint and the sequential GUID datatypes are ever increasing columns (assuming the integer types are identity values of some sort) we expect all or most of the page splits with these indexes to happen when adding a page to the end of the index structure which requires very little overhead and does not increase the level of fragmentation in the object. The latter is the case which can lead to poor performance. When a page split occurs in the middle of an objects structure it impacts:

INSERTs and UPDATEs while the page is actually being split during the DML operation
SELECT queries due to excessive fragmentation caused by the page splits

Since the random GUID column could have data added anywhere in the structure due to its random nature, it could be become very susceptible to some poor performance because of these bad page splits (assuming no other maintenance is done).

Now that we understand a little more about page splits let’s run through a demonstration using a large number of inserts against tables with the different column types for the primary key. Below is the T-SQL to setup the 4 tables. Each table has a primary key column that is either an integer, bigint, GUID or sequential GUID.

CREATE TABLE testtable_int 
   ([pk_col] [int] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_bigint 
   ([pk_col] [bigint] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_guid 
   ([pk_col] [uniqueidentifier] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_seqguid 
   ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20));

We can use the following T-SQL to load each of these tables with 1,000,000 records which should be enough to see a pretty significant number of page splits.

-- data load 
DECLARE @val int 
DECLARE @pkguid uniqueidentifier 
SELECT @val=0 
WHILE @val < 1000000
BEGIN   
   INSERT INTO testtable_int (pk_col, col1, col2,  col3)  
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   INSERT INTO testtable_bigint (pk_col, col1, col2,  col3)  
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   SELECT @pkguid = newid(); 
   
   INSERT INTO testtable_guid (pk_col, col1, col2,  col3)  
      VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   INSERT INTO testtable_seqguid ( col1, col2,  col3) 
      VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   SELECT @val=@val+1 
END 
GO

With the data loaded, we now need to find these bad page splits and luckily for us this information can be pulled from the SQL Server transaction log. Assuming you are in full recovery mode and no log backup has been taken or you are in simple recovery mode and the vlf hasn't been overwritten we can use the undocumented function fn_dblog to query the SQL Server transaction log. More details on this function can be found here. Using the query from that link we can see in the chart below that as we suspected we have quite a few more bad page splits on the table with the random GUID primary key. This also explains the large amount of fragmentation we were seeing in my original tip. Below is the query we can use to inspect the SQL Server transaction log and the output of this query for our test tables.

SELECT
    COUNT(1) AS NumberOfSplits
    ,AllocUnitName
    ,Context
FROM
    fn_dblog(NULL,NULL)
WHERE
    Operation = 'LOP_DELETE_SPLIT'
GROUP BY
    AllocUnitName, Context
ORDER BY
    NumberOfSplits DESC

NumberOfSplits	AllocUnitName	Context
9044	dbo.testtable_guid.PK__testtabl__CF31D692BC64152C	LCX_CLUSTERED
41	dbo.testtable_seqguid.PK__testtabl__CF31D6922AC4B1C1	LCX_INDEX_INTERIOR
38	dbo.testtable_guid.PK__testtabl__CF31D692BC64152C	LCX_INDEX_INTERIOR
23	dbo.testtable_bigint.PK__testtabl__CF31D6927EBF9CD7	LCX_INDEX_INTERIOR
16	dbo.testtable_int.PK__testtabl__CF31D692E6421A60	LCX_INDEX_INTERIOR

Now that we know how many page splits occurred for each datatype let take a look at the effect that these page splits have on the amount of space required in the SQL Server transaction log. I ran through the data load script a second time (after truncating the tables), but this I time I ran each table load separately in its own loop so I could measure the transaction log usage using DBCC SQLPERF (LOGSPACE). You can see from the results below that the page splits caused by the random GUID column required a lot more transaction log space compared to the amount generated by the other 3 column types. It's especially interesting to see how much more space it used compared with the table that used the sequential GUID since these columns require the exact same amount of space (16 bytes), but the table that was loaded with a random GUID used much more transaction log space.

Column Type	Transaction Log Usage
GUID	237 MB
Integer	138 MB
BigInt	146 MB
Sequential GUID	155 MB

Next Steps

Read other tips comparing INT and GUID data types:
- SQL Server Performance Comparison – INT vs GUID
- Compare SQL Server Table Join Performance for INT vs BIGINT vs GUID Data Types
Read tips on fill factors to reduce page splits:
- Understanding SQL Server Fill Factor Settings
- How to avoid SQL Server fragmentation
Read more on tracking page splits:

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Sunday, October 31, 2021 - 9:40:59 PM - Jeff Moden	Back To Top (89389)
Ben, Nice article but watch the following 'tube. It'll seriously change your mind about what has been said in this article. To summarize, GUID Fragmentation is actually a myth. GUIDs aren't the problem... WE ARE! https://www.youtube.com/watch?v=qfQtY17bPQ4

Friday, January 25, 2019 - 4:56:44 PM - Matthew Peterson	Back To Top (78887)
Thanks for writing this follow up post. I was researching this issue and I'm glad you took the time to answer this question. Any chance you could post a link to this on your previous post?

Compare SQL Server Page Splits for INT vs BIGINT vs GUID Data Types

Problem

Solution

Next Steps

About the author

Comments For This Article