SQL Server Performance Comparison INT versus GUID

By: Ben Snaidero | Comments (11) | Related: > Database Design

Problem

Over the years I have read quite a few articles about which datatypes should be used for primary keys. At some point the conversation ends up with one side thinking an integer is a better option and the other side leaning more towards using a GUID.

In this tip we will aim to identify any performance differences between these two datatypes. We will also look at a couple other index characteristics associated with each type to see if there are any other benefits that might outweigh any performance differences.

Solution

As many of my tips deal with SQL Server performance we will test the performance of these two datatypes by running through a simple test case to see if we can measure any significant performance differences. We will setup two almost identical tables with the only variance being the datatype of the primary key column.

Below is the T-SQL to setup and load these tables.

CREATE TABLE testtable_int ([col1] [int] NOT NULL primary key clustered,
                            [col2] [int] NULL,
                            [col3] [int] NULL,
                            [col4] [varchar](50) NULL); 

CREATE TABLE testtable_guid ([col1] [uniqueidentifier] NOT NULL primary key clustered,
                             [col2] [int] NULL,
                             [col3] [int] NULL,
                             [col4] [varchar](50) NULL); 
DECLARE @val INT
SELECT @val=1
WHILE @val < 5000000
BEGIN  
   INSERT INTO testtable_int (col1, col2,  col3, col4) 
     VALUES (@val,round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(@val AS VARCHAR))

   INSERT INTO testtable_guid (col1, col2,  col3, col4) 
     VALUES (newid(),round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(@val AS VARCHAR))

   SELECT @val=@val+1
END
GO

Before we take a look at the performance let's first take a look at some of the other index characteristics, disk space usage and fragmentation. Given that an integer requires 4 bytes and a GUID uses 16 bytes I would expect that the index on the GUID column would be much larger. Using the query below let's take a look at just how much more disk space it uses.

SELECT OBJECT_NAME(i.[object_id]) AS TableName,
       i.[name] AS IndexName,
       SUM(s.[used_page_count]) * 8 AS IndexSizeKB
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] AND s.[index_id] = i.[index_id]
WHERE OBJECT_NAME(i.[object_id]) like '%testtable%'
GROUP BY i.[name],i.[object_id];

TableName	IndexName	IndexSizeKB
testtable_int	PK__testtabl__357D0D3E194A02AE	178056
testtable_guid	PK__testtabl__357D0D3E65B8D323	342456

As would be expected the index on the GUID column is using quite a bit more space. But one thing we should consider when looking at this is whether all this extra space used is due to the fact that it's a 4 byte datatype vs 16 byte datatype or if there is there some fragmentation that is contributing to the extra disk space being used. I would suspect that since the GUID values are random there would be quite a bit more fragmentation with this datatype versus the integer which is ever increasing. Let's take a look at the fragmentation of each index using the T-SQL below.

SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, 
       ind.name AS IndexName, 
       indexstats.index_type_desc AS IndexType, 
       indexstats.avg_fragmentation_in_percent 
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats 
INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id 
WHERE OBJECT_NAME(ind.OBJECT_ID) like '%testtable%'
ORDER BY indexstats.avg_fragmentation_in_percent DESC

TableName	IndexName	IndexType	Avg_fragmentation_in_percent
testtable_int	PK__testtabl__357D0D3E194A02AE	CLUSTERED INDEX	0.37432913904298
testtable_guid	PK__testtabl__357D0D3E65B8D323	CLUSTERED INDEX	99.1926777751701

The above output confirms our assumption, the index on the GUID column does have quite a bit more fragmentation. Before we fix this let's first run through some simple DML statements to see if there are any performance differences with the indexes in this state. As always we can use SQL Profiler to capture the performance metrics for each statement. For my test I ran these queries 10 times restoring a baseline backup of the original tables each time in order to rule out any anomalies. I also ran a DBCC DROPCLEANBUFFERS before each execution to simulate the worst case scenario of the data having to be read from disk. Note that if you use the T-SQL below you'll need to update the GUID in the SELECT, UPDATE and DELETE statements with values that were generated when you ran the insert script above.

-- test select
SELECT * FROM testtable_int WHERE col1=4789341
GO
SELECT * FROM testtable_guid WHERE col1='CDD450DD-0048-4E3A-A074-7BF57D4C5C91'
GO

-- test insert
INSERT INTO testtable_int (col1, col2,  col3, col4) 
   VALUES (5000001,round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))
GO
INSERT INTO testtable_guid (col1, col2,  col3, col4) 
   VALUES (newid(),round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))
GO

-- test update
UPDATE testtable_int SET col2=232342,col3=232340,col4='TESTUPDATE'
WHERE col1=3023481
GO
UPDATE testtable_guid SET col2=232342,col3=232340,col4='TESTUPDATE'
WHERE col1='FA7B4737-70ED-49EA-BD26-19B737294C9D'
GO

-- test delete
DELETE FROM testtable_int WHERE col1=2789341
GO
DELETE FROM testtable_guid WHERE col1='EE053ACD-99C3-49C3-B11A-2D18F8248D29'
GO

TableName	DML Operation	CPU (ms)	Reads	Writes	Duration (ms)
testtable_int	SELECT	171	1447	6	257
testtable_guid	SELECT	187	2397	5	341
testtable_int	INSERT	0	8	1	3
testtable_guid	INSERT	0	8	1	2
testtable_int	UPDATE	0	12	1	2
testtable_guid	UPDATE	0	12	1	3
testtable_int	DELETE	0	14	2	2
testtable_guid	DELETE	0	14	2	3

Well this is a bit of a surprise. Both indexes have very similar performance with only the SELECT performance of the integer datatype showing any noticeable edge over the GUID datatype. Let's rebuild each index using T-SQL below and see if anything changes from these initial results.

alter index PK__testtabl__357D0D3E194A02AE on testtable_int rebuild;

alter index PK__testtabl__357D0D3E65B8D323 on testtable_guid rebuild;

With these indexes rebuilt let's again take a look at the storage and fragmentation of each index using the queries we used above.

TableName	IndexName	IndexType	Avg_fragmentation_in_percent
testtable_int	PK__testtabl__357D0D3E194A02AE	CLUSTERED INDEX	0. 0315685036529268
testtable_guid	PK__testtabl__357D0D3E65B8D323	CLUSTERED INDEX	0.01

TableName	IndexName	IndexSizeKB
testtable_int	PK__testtabl__357D0D3E194A02AE	177744
testtable_guid	PK__testtabl__357D0D3E65B8D323	237440

We can see from these results there is no longer any index fragmentation with the index on the GUID column. Also notice that even though we recovered quite a bit of space from index on the guid column it is still using a fair bit more than the index on the integer column. Let's run our DML statements again and see if the performance has changed at all from our initial test.

TableName	DML Operation	Reads	Writes	Duration (ms)
testtable_int	SELECT	83	1	12
testtable_guid	SELECT	83	1	15
testtable_int	INSERT	8	1	2
testtable_guid	INSERT	13	2	2
testtable_int	UPDATE	12	1	2
testtable_guid	UPDATE	12	1	3
testtable_int	DELETE	14	2	2
testtable_guid	DELETE	14	4	4

With the indexes rebuilt we now see almost no performance difference between the two datatypes. Assuming you are running regular maintenance on your indexes this simple test shows that you achieve pretty much identical performance between the two datatypes. One thing to also note is that the GUID column also gives you the added benefit of uniqueness across systems. If any of your applications merge data generated from different systems then using a GUID column as your key allows you to not have to worry about resolving any duplicates when you merge the data.

Finally, as with any performance testing, you should always test with your own data to ensure you are seeing the expected results before making any changes in your production environment.

Next Steps

Extend testing further to also compare using sequential GUIDs and the bigint datatype
Read other tips on indexing
Read other tips on performance tuning
Here is a link to a related post.

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Saturday, July 30, 2022 - 7:19:21 AM - Robson	Back To Top (90325)
If privacy is a concern, do not use the NEWSEQUENTIALID function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.

Friday, September 29, 2017 - 12:24:44 PM - Ben Snaidero	Back To Top (66730)
Dataset had 5,000,000 records. The script is provided at the start of the tip so you can change and rerun if you like. I am not a fan of GUIDs either and originally was thinking the performance would be much worse but was surprised when I saw the results.

SQL Server Performance Comparison INT versus GUID

Problem

Solution

Next Steps

About the author

Comments For This Article