SQL Server Performance Comparison INT versus GUID

By:   |   Comments (11)   |   Related: > Database Design


Problem

Over the years I have read quite a few articles about which datatypes should be used for primary keys.  At some point the conversation ends up with one side thinking an integer is a better option and the other side leaning more towards using a GUID. 

In this tip we will aim to identify any performance differences between these two datatypes.  We will also look at a couple other index characteristics associated with each type to see if there are any other benefits that might outweigh any performance differences. 

Solution

As many of my tips deal with SQL Server performance we will test the performance of these two datatypes by running through a simple test case to see if we can measure any significant performance differences.  We will setup two almost identical tables with the only variance being the datatype of the primary key column. 

Below is the T-SQL to setup and load these tables.

CREATE TABLE testtable_int ([col1] [int] NOT NULL primary key clustered,
                            [col2] [int] NULL,
                            [col3] [int] NULL,
                            [col4] [varchar](50) NULL); 

CREATE TABLE testtable_guid ([col1] [uniqueidentifier] NOT NULL primary key clustered,
                             [col2] [int] NULL,
                             [col3] [int] NULL,
                             [col4] [varchar](50) NULL); 
DECLARE @val INT
SELECT @val=1
WHILE @val < 5000000
BEGIN  
   INSERT INTO testtable_int (col1, col2,  col3, col4) 
     VALUES (@val,round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(@val AS VARCHAR))

   INSERT INTO testtable_guid (col1, col2,  col3, col4) 
     VALUES (newid(),round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(@val AS VARCHAR))

   SELECT @val=@val+1
END
GO 

Before we take a look at the performance let's first take a look at some of the other index characteristics, disk space usage and fragmentation.  Given that an integer requires 4 bytes and a GUID uses 16 bytes I would expect that the index on the GUID column would be much larger.  Using the query below let's take a look at just how much more disk space it uses.

SELECT OBJECT_NAME(i.[object_id]) AS TableName,
       i.[name] AS IndexName,
       SUM(s.[used_page_count]) * 8 AS IndexSizeKB
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] AND s.[index_id] = i.[index_id]
WHERE OBJECT_NAME(i.[object_id]) like '%testtable%'
GROUP BY i.[name],i.[object_id];
TableName IndexName IndexSizeKB
testtable_int PK__testtabl__357D0D3E194A02AE 178056
testtable_guid PK__testtabl__357D0D3E65B8D323 342456

As would be expected the index on the GUID column is using quite a bit more space.  But one thing we should consider when looking at this is whether all this extra space used is due to the fact that it's a 4 byte datatype vs 16 byte datatype or if there is there some fragmentation that is contributing to the extra disk space being used.  I would suspect that since the GUID values are random there would be quite a bit more fragmentation with this datatype versus the integer which is ever increasing.  Let's take a look at the fragmentation of each index using the T-SQL below.

SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, 
       ind.name AS IndexName, 
       indexstats.index_type_desc AS IndexType, 
       indexstats.avg_fragmentation_in_percent 
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats 
INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id 
WHERE OBJECT_NAME(ind.OBJECT_ID) like '%testtable%'
ORDER BY indexstats.avg_fragmentation_in_percent DESC 
TableName IndexName IndexType Avg_fragmentation_in_percent
testtable_int PK__testtabl__357D0D3E194A02AE CLUSTERED INDEX 0.37432913904298  
testtable_guid PK__testtabl__357D0D3E65B8D323 CLUSTERED INDEX 99.1926777751701

The above output confirms our assumption, the index on the GUID column does have quite a bit more fragmentation.  Before we fix this let's first run through some simple DML statements to see if there are any performance differences with the indexes in this state.  As always we can use SQL  Profiler to capture the performance metrics for each statement.  For my test I ran these queries 10 times restoring a baseline backup of the original tables each time in order to rule out any anomalies.  I also ran a DBCC DROPCLEANBUFFERS before each execution to simulate the worst case scenario of the data having to be read from disk.  Note that if you use the T-SQL below you'll need to update the GUID in the SELECT, UPDATE and DELETE statements with values that were generated when you ran the insert script above.

-- test select
SELECT * FROM testtable_int WHERE col1=4789341
GO
SELECT * FROM testtable_guid WHERE col1='CDD450DD-0048-4E3A-A074-7BF57D4C5C91'
GO

-- test insert
INSERT INTO testtable_int (col1, col2,  col3, col4) 
   VALUES (5000001,round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))
GO
INSERT INTO testtable_guid (col1, col2,  col3, col4) 
   VALUES (newid(),round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))
GO

-- test update
UPDATE testtable_int SET col2=232342,col3=232340,col4='TESTUPDATE'
WHERE col1=3023481
GO
UPDATE testtable_guid SET col2=232342,col3=232340,col4='TESTUPDATE'
WHERE col1='FA7B4737-70ED-49EA-BD26-19B737294C9D'
GO

-- test delete
DELETE FROM testtable_int WHERE col1=2789341
GO
DELETE FROM testtable_guid WHERE col1='EE053ACD-99C3-49C3-B11A-2D18F8248D29'
GO 
TableName DML Operation CPU (ms) Reads Writes Duration (ms)
testtable_int SELECT 171 1447 6 257
testtable_guid SELECT 187 2397 5 341
testtable_int INSERT 0 8 1 3
testtable_guid INSERT 0 8 1 2
testtable_int UPDATE 0 12 1 2
testtable_guid UPDATE 0 12 1 3
testtable_int DELETE 0 14 2 2
testtable_guid DELETE 0 14 2 3

Well this is a bit of a surprise.  Both indexes have very similar performance with only the SELECT performance of the integer datatype showing any noticeable edge over the GUID datatype.  Let's rebuild each index using T-SQL below and see if anything changes from these initial results.

alter index PK__testtabl__357D0D3E194A02AE on testtable_int rebuild;

alter index PK__testtabl__357D0D3E65B8D323 on testtable_guid rebuild;   

With these indexes rebuilt let's again take a look at the storage and fragmentation of each index using the queries we used above.

TableName IndexName IndexType Avg_fragmentation_in_percent
testtable_int PK__testtabl__357D0D3E194A02AE CLUSTERED INDEX 0. 0315685036529268
testtable_guid PK__testtabl__357D0D3E65B8D323 CLUSTERED INDEX 0.01


TableName IndexName IndexSizeKB
testtable_int PK__testtabl__357D0D3E194A02AE 177744
testtable_guid PK__testtabl__357D0D3E65B8D323 237440

We can see from these results there is no longer any index fragmentation with the index on the GUID column.  Also notice that even though we recovered quite a bit of space from index on the guid column it is still using a fair bit more than the index on the integer column.  Let's run our DML statements again and see if the performance has changed at all from our initial test.

TableName DML Operation CPU (ms) Reads Writes Duration (ms)
testtable_int SELECT 0 83 1 12
testtable_guid SELECT 0 83 1 15
testtable_int INSERT 0 8 1 2
testtable_guid INSERT 0 13 2 2
testtable_int UPDATE 0 12 1 2
testtable_guid UPDATE 0 12 1 3
testtable_int DELETE 0 14 2 2
testtable_guid DELETE 0 14 4 4

With the indexes rebuilt we now see almost no performance difference between the two datatypes.  Assuming you are running regular maintenance on your indexes this simple test shows that you achieve pretty much identical performance between the two datatypes.  One thing to also note is that the GUID column also gives you the added benefit of uniqueness across systems.  If any of your applications merge data generated from different systems then using a GUID column as your key allows you to not have to worry about resolving any duplicates when you merge the data.

Finally, as with any performance testing, you should always test with your own data to ensure you are seeing the expected results before making any changes in your production environment.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Saturday, May 6, 2023 - 5:12:02 PM - Robert Back To Top (91161)
Thank you Ben for your interesting quantitative analysis. My view of this may differ somewhat/ The issue is never with the insert statements those will always be similar due to the very slight difference in the amount of data being written. In your analysis, the 25% degradation of the single select, however, does seem to be significant. While 25% on a single table, I have not encountered many situations where we are doing selects from just one table for most enterprise situations. In these situations, the majority of reads are from rather complicated joins that are needed to aggregate the data for the business purpose at hand. When the impact of the multiple joins (typically 3 to 5 needed and in many cases, stored procedures have many of those combinations, you end up with an unacceptable penalty that is difficult to justify. There are 3 paradigms that should be observed in order. Performance 1st, Maintainability 2nd, Cost 3rd (because #1 and #2 directly impact #3 over time). This analysis has confirmed for me at least, that Guids should not be used as PK unless there is some other overriding purpose to them beyond just allowing to implement the Domain Driven Paradigm without critical thinking of the practical impact of that design.

Saturday, July 30, 2022 - 7:19:21 AM - Robson Back To Top (90325)
If privacy is a concern, do not use the NEWSEQUENTIALID function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.

Thursday, August 2, 2018 - 10:30:48 AM - Michael Kröschel Back To Top (76945)

HI Ben

nice articel! But you should test this with two tables where have to you across the string columns the you will see a big difference.

Regards

Michael


Saturday, September 30, 2017 - 7:00:32 PM - JRStern Back To Top (66756)

Both ints and GUIDS are only surrogate keys, doesn't anyone even try to do natural keys anymore? 

As far as the datatypes themselves, Microsoft has long claimed that the performance is close, plus or minus the space difference, and whenever I've measuredthat's what I've found.

Ints are far more common, identity ints, in my experience.  Easier to type as you do stuff, too.  Probably sequences would be better than pure ints, or any trick that helps associate a naked key value with the table it comes from.


Friday, September 29, 2017 - 7:58:07 PM - Trip Back To Top (66740)

 

I tested using NEWSEQUENTIALID() and IDENTITY in SQL 2017RC2 on Windows 10 

 

When comparing BIGINT to UNIQUEIDENTIFIER, and both types of compression, there is no difference in performance:

 

https://imgur.com/a/AMITw

 

This is after DBCC DROPCLEANBUFFERS. Perhaps under more load a difference in performance would emerge?

 

Here you can see the saving in storage:

 

TableName IndexName IndexType avg_fragmentation_in_percent IndexSizeKB CompType

testtable_int_pagecomp PK__testtabl__357D0D3E2275BC2B CLUSTERED INDEX 0.154939846882975 88,056.00 PAGE

testtable_int_rowcomp PK__testtabl__357D0D3ECB985B95 CLUSTERED INDEX 0.0805239424522225 149,408.00 ROW

testtable_guid PK__testtabl__357D0D3EBA44B5DE CLUSTERED INDEX 0.0776685914969777 237,760.00 NONE

testtable_guid_rowcomp PK__testtabl__357D0D3EE6DDE57B CLUSTERED INDEX 0.0751261045326083 224,632.00 ROW

testtable_guid_pagecomp PK__testtabl__357D0D3EC9900121 CLUSTERED INDEX 0.0659630606860158 170,568.00 PAGE

testtable_int PK__testtabl__357D0D3EE482C5B7 CLUSTERED INDEX 0.0609137055837563 197,528.00 NONE

 

 

 

Here are the scripts I used:

--GENERATE DATA

SET NOCOUNT ON

 

CREATE TABLE testtable_int ([col1] [bigint] identity NOT NULL PRIMARY KEY CLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL);

 

CREATE TABLE testtable_guid ([col1] [uniqueidentifier] NOT NULL PRIMARY KEY CLUSTERED DEFAULT newsequentialid(), [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL);

 

DECLARE @val INT

 

SELECT @val = 1

 

WHILE @val < 5000000

BEGIN

INSERT INTO testtable_int (col2, col3, col4)

VALUES (round(rand() * 100000, 0), round(rand() * 100000, 0), 'TEST' + CAST(@val AS VARCHAR))

 

INSERT INTO testtable_guid (col2, col3, col4)

VALUES (round(rand() * 100000, 0), round(rand() * 100000, 0), 'TEST' + CAST(@val AS VARCHAR))

 

SELECT @val = @val + 1

END

GO

 

CREATE TABLE testtable_int_rowcomp ([col1] [bigint] identity NOT NULL PRIMARY KEY CLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL)

WITH (DATA_COMPRESSION = ROW);

 

CREATE TABLE testtable_guid_rowcomp ([col1] [uniqueidentifier] NOT NULL PRIMARY KEY CLUSTERED DEFAULT newsequentialid(), [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL)

WITH (DATA_COMPRESSION = ROW);

 

CREATE TABLE testtable_int_pagecomp ([col1] [bigint] identity NOT NULL PRIMARY KEY CLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL)

WITH (DATA_COMPRESSION = PAGE);

 

CREATE TABLE testtable_guid_pagecomp ([col1] [uniqueidentifier] NOT NULL PRIMARY KEY CLUSTERED DEFAULT newsequentialid(), [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL)

WITH (DATA_COMPRESSION = PAGE);

GO

 

SET IDENTITY_INSERT testtable_int_rowcomp ON

 

INSERT testtable_int_rowcomp ([col1], [col2], [col3], [col4])

SELECT *

FROM testtable_int

 

SET IDENTITY_INSERT testtable_int_rowcomp OFF

 

INSERT testtable_guid_rowcomp

SELECT *

FROM testtable_guid

 

SET IDENTITY_INSERT testtable_int_pagecomp ON

 

INSERT testtable_int_pagecomp ([col1], [col2], [col3], [col4])

SELECT *

FROM testtable_int

 

SET IDENTITY_INSERT testtable_int_pagecomp OFF

 

INSERT testtable_guid_pagecomp

SELECT *

FROM testtable_guid

 

ALTER INDEX ALL ON testtable_int_rowcomp REBUILD

WITH (FILLFACTOR = 100)

 

ALTER INDEX ALL ON testtable_guid_rowcomp REBUILD

WITH (FILLFACTOR = 100)

 

ALTER INDEX ALL ON testtable_int_pagecomp REBUILD

WITH (FILLFACTOR = 100)

 

ALTER INDEX ALL ON testtable_guid_pagecomp REBUILD

WITH (FILLFACTOR = 100)

 

ALTER INDEX ALL ON testtable_int REBUILD

WITH (FILLFACTOR = 100)

 

ALTER INDEX ALL ON testtable_guid REBUILD

WITH (FILLFACTOR = 100)

GO

 

--FRAGMENTATION AND SIZE SCRIPT

 

SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, 

       ind.name AS IndexName, 

       indexstats.index_type_desc AS IndexType, 

       indexstats.avg_fragmentation_in_percent,

  CONVERT(varchar, CAST( SUM(s.[used_page_count]) * 8 AS money), 1) AS IndexSizeKB, 

   MAX(pa.data_compression_desc) AS CompType

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') indexstats 

INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id

LEFT JOIN  sys.dm_db_partition_stats AS s ON s.[object_id] = ind.[object_id] AND s.[index_id] = ind.[index_id]

LEFT JOIN sys.partitions AS pa ON pa.[index_id] = ind.[index_id] and pa.[object_id] = ind.[object_id]

WHERE OBJECT_NAME(ind.OBJECT_ID) like '%testtable%' AND indexstats.index_level = 0

GROUP BY ind.OBJECT_ID, ind.name, indexstats.index_type_desc, indexstats.avg_fragmentation_in_percent

ORDER BY indexstats.avg_fragmentation_in_percent DESC 

 

--TESTING SCRIPT

dbcc dropcleanbuffers

go

declare @testid bigint = (select top 1 col1 from testtable_int order by NEWID())

, @testguid uniqueidentifier  = (select top 1 col1 from testtable_guid order by NEWID())

 

 

SELECT * FROM testtable_int WHERE col1=@testid

 

SELECT * FROM testtable_guid WHERE col1=@testguid

 

SELECT * FROM testtable_int_rowcomp WHERE col1=@testid

 

SELECT * FROM testtable_guid_rowcomp WHERE col1=@testguid

 

SELECT * FROM testtable_int_pagecomp WHERE col1=@testid

 

SELECT * FROM testtable_guid_pagecomp WHERE col1=@testguid

 

 

-- test insert

INSERT INTO testtable_int ( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

INSERT INTO testtable_guid ( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

INSERT INTO testtable_int_rowcomp( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

INSERT INTO testtable_guid_rowcomp( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

INSERT INTO testtable_int_pagecomp( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

INSERT INTO testtable_guid_pagecomp( col2,  col3, col4) 

   VALUES (round(rand()*100000,0),round(rand()*100000,0),'TEST' + CAST(5000001 AS VARCHAR))

 

 

-- test update

UPDATE testtable_int SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testid

 

UPDATE testtable_guid SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testguid

 

UPDATE testtable_int_rowcomp SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testid

 

UPDATE testtable_guid_rowcomp SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testguid

 

UPDATE testtable_int_pagecomp SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testid

 

UPDATE testtable_guid_pagecomp SET col2=232342,col3=232340,col4='TESTUPDATE'

WHERE col1=@testguid

 

 

-- test delete

DELETE FROM testtable_int WHERE col1=@testid

 

DELETE FROM testtable_guid WHERE col1=@testguid

 

DELETE FROM testtable_int_rowcomp WHERE col1=@testid

 

DELETE FROM testtable_guid_rowcomp WHERE col1=@testguid

 

DELETE FROM testtable_int_pagecomp WHERE col1=@testid

 

DELETE FROM testtable_guid_pagecomp WHERE col1=@testguid

 

 

 


Friday, September 29, 2017 - 12:41:38 PM - Ann Back To Top (66731)

 And this is why I say to my students that you have to be careful of what you read on the web.  Although there is no CPU difference, I/O is typically much more "costly" these days than CPU.  Even with your indexes, the INSERT and DELETE have twice as many writes and you are only working with 1 row.  The UPDATE of a single row adds 50% of overhead on reads.  That is not no difference.

Try doing a join of two tables on on GUIDs and watch what that does to perfomrance.

I fully agree with others that if you need the GUID datatype for some other reason, use NewSequentialID() unless you are in one of the rare occasions where having a sequential clustered index is bad for your performance.  Then, simply make your PK a non-clustered index instead of the clustered index.

 


Friday, September 29, 2017 - 12:24:44 PM - Ben Snaidero Back To Top (66730)

Dataset had 5,000,000 records.  The script is provided at the start of the tip so you can change and rerun if you like.   I am not a fan of GUIDs either and originally was thinking the performance would be much worse but was surprised when I saw the results.


Friday, September 29, 2017 - 11:25:56 AM - Brian Guarnieri Back To Top (66727)

 Sample size for data? What was the sample size for the data inserts/select? IE how big were the tables and how much data are we talking about. It would be interesting to apply this across data sets at 1,000,000 rows, 5,000,000 and say 10,000,000 rows and beyond so we can see how it scales. I AM NOT saying anything is wrong just asking the question on vertical scalability for integer primary keys verses GUID. There actually may be little difference in a lab like environment. Great thought provoking write up! I hate GUID's but your write up has me asking "Is my dislike of GUID's biased"?


Friday, September 29, 2017 - 10:53:10 AM - Ben Snaidero Back To Top (66724)

 I agree that newsequentialid would definitely help with the fragmentation.  I mention this as a next step for testing.

Thanks for reading.

 

 


Friday, September 29, 2017 - 9:28:00 AM - Mark Tassin Back To Top (66720)

 Wouldn't the next insert almost assuredly cause a page split and over the course of a day, do a fairly good job of trashing the Clustered index (CI) all over again?
A very busy system that records several thousand or more inserts would likely start to see performance degrade across the business day, 24/7 operations may have trouble finding a windows to defrag the index (even using ONLINE=ON)

Wouldn't the better fix be to use NEWSEQUENTIALID() as a default constraint for the column the CI is set on, so that future inserts are sequentially higher in value than prior ones (at least until the instance restarts, at which point you may wind up splitting pages again at first, but will setlle out of that as the new seed into NEWSEQUENTIALID() makes space?

 

 


Friday, September 29, 2017 - 3:19:03 AM - Rajko Back To Top (66708)

 Why you don't use newsequentialid()

[code]

alter table add guid uniqueidentifier default newsequentialid() primary key clustered;

[/code]

 















get free sql tips
agree to terms