Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Compare SQL Server Page Splits for INT vs BIGINT vs GUID Data Types


By:   |   Updated: 2017-11-16   |   Comments (1)   |   Related: More > Database Design

Problem

A good point was brought up with a tip I wrote last month on using a GUID column for a primary key and how it can cause really poor performance over time due to the fact that using this column type will result in a lot of page splits.  In this tip we will run through basically the same steps as the original tip, but this time we will measure the page splits encountered when inserting data for each of the following datatypes: integer, bigint, GUID and sequential GUID.  We will also measure the impact that these page splits have on the SQL Server transaction log.

Solution

Before we dive into our page split demonstration I just want to give a quick overview on the two different types of page splits we can encounter in a SQL Server database as it pertains to clustered indexes and inserting data.  As we all know data in a SQL Server database is stored in pages.  Since each page is 8KB in size we are inevitably going to have to add and/or split a database page in order to add more data.  This leads us to the two types of page splits that can occur, one where we are adding a page to the end of the structure and one where the data needs to go in the middle of the structure, but the page is full so we actually have to split this full page into two pages.  In the case of the former since integer, bigint and the sequential GUID datatypes are ever increasing columns (assuming the integer types are identity values of some sort) we expect all or most of the page splits with these indexes to happen when adding a page to the end of the index structure which requires very little overhead and does not increase the level of fragmentation in the object.  The latter is the case which can lead to poor performance.  When a page split occurs in the middle of an objects structure it impacts:

  • INSERTs and UPDATEs while the page is actually being split during the DML operation
  • SELECT queries due to excessive fragmentation caused by the page splits

Since the random GUID column could have data added anywhere in the structure due to its random nature, it could be become very susceptible to some poor performance because of these bad page splits (assuming no other maintenance is done).

Now that we understand a little more about page splits letís run through a demonstration using a large number of inserts against tables with the different column types for the primary key.  Below is the T-SQL to setup the 4 tables.  Each table has a primary key column that is either an integer, bigint, GUID or sequential GUID.

CREATE TABLE testtable_int 
   ([pk_col] [int] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_bigint 
   ([pk_col] [bigint] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_guid 
   ([pk_col] [uniqueidentifier] not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 
CREATE TABLE testtable_seqguid 
   ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered, 
    [col1] [int], 
    [col2] [datetime], 
    [col3] [varchar](20)); 

We can use the following T-SQL to load each of these tables with 1,000,000 records which should be enough to see a pretty significant number of page splits.

-- data load 
DECLARE @val int 
DECLARE @pkguid uniqueidentifier 
SELECT @val=0 
WHILE @val < 1000000
BEGIN   
   INSERT INTO testtable_int (pk_col, col1, col2,  col3)  
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   INSERT INTO testtable_bigint (pk_col, col1, col2,  col3)  
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   SELECT @pkguid = newid(); 
   
   INSERT INTO testtable_guid (pk_col, col1, col2,  col3)  
      VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   INSERT INTO testtable_seqguid ( col1, col2,  col3) 
      VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR)); 
   
   SELECT @[email protected]+1 
END 
GO   

With the data loaded, we now need to find these bad page splits and luckily for us this information can be pulled from the SQL Server transaction log.  Assuming you are in full recovery mode and no log backup has been taken or you are in simple recovery mode and the vlf hasn't been overwritten we can use the undocumented function fn_dblog to query the SQL Server transaction log.  More details on this function can be found here.  Using the query from that link we can see in the chart below that as we suspected we have quite a few more bad page splits on the table with the random GUID primary key.  This also explains the large amount of fragmentation we were seeing in my original tip.  Below is the query we can use to inspect the SQL Server transaction log and the output of this query for our test tables.

SELECT
    COUNT(1) AS NumberOfSplits
    ,AllocUnitName
    ,Context
FROM
    fn_dblog(NULL,NULL)
WHERE
    Operation = 'LOP_DELETE_SPLIT'
GROUP BY
    AllocUnitName, Context
ORDER BY
    NumberOfSplits DESC
   
NumberOfSplits AllocUnitName Context
9044 dbo.testtable_guid.PK__testtabl__CF31D692BC64152C LCX_CLUSTERED
41 dbo.testtable_seqguid.PK__testtabl__CF31D6922AC4B1C1 LCX_INDEX_INTERIOR
38 dbo.testtable_guid.PK__testtabl__CF31D692BC64152C LCX_INDEX_INTERIOR
23 dbo.testtable_bigint.PK__testtabl__CF31D6927EBF9CD7 LCX_INDEX_INTERIOR
16 dbo.testtable_int.PK__testtabl__CF31D692E6421A60 LCX_INDEX_INTERIOR

Now that we know how many page splits occurred for each datatype let take a look at the effect that these page splits have on the amount of space required in the SQL Server transaction log.  I ran through the data load script a second time (after truncating the tables), but this I time I ran each table load separately in its own loop so I could measure the transaction log usage using DBCC SQLPERF (LOGSPACE).  You can see from the results below that the page splits caused by the random GUID column required a lot more transaction log space compared to the amount generated by the other 3 column types.  It's especially interesting to see how much more space it used compared with the table that used the sequential GUID since these columns require the exact same amount of space (16 bytes), but the table that was loaded with a random GUID used much more transaction log space.

Column Type Transaction Log Usage
GUID 237 MB
Integer 138 MB
BigInt 146 MB
Sequential GUID 155 MB
Next Steps


Last Updated: 2017-11-16


get scripts

next tip button



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Friday, January 25, 2019 - 4:56:44 PM - Matthew Peterson Back To Top

Thanks for writing this follow up post.† I was researching this issue and I'm glad you took the time to answer this question.† Any chance you could post a link to this on your previous post?


Learn more about SQL Server tools