Auto generated SQL Server keys uniqueidentifier or IDENTITY

Problem

I’m designing a table and I’ve decided to create an auto-generated primary key value as opposed to creating my own scheme or using natural keys. I see that SQL Server offers globally unique identifiers (GUIDs) as well as identities to create these values. What are the pros and cons of using a SQL uniqueidentifier versus an identity value?

Solution

Yes, there are a number of ways you can auto-generate key values for your tables. The most common ways are via the use of the IDENTITY column property or by specifying a uniqueidentifier (GUID) data type along with defaulting with either the NEWID() or NEWSEQUENTIALID() function. Furthermore, GUIDs are heavily used in SQL Server Replication to uniquely identify rows in Merge Replication or Transactional Replication with updating subscriptions.

The most common, well known way to auto-generate a key value is via the use of the IDENTITY column property on a column that’s typically declared as an integer. Once defined, the engine will automatically generate a sequential number based on the way the property has been declared on the column. The IDENTITY property takes an initial seed value as its first parameter and an increment value as its second parameter.

Example Using Identity Value

Consider the following example which creates and inserts into identity based tables that define the primary key as a clustered index:

SET NOCOUNT ON
GO
USE MASTER
GO
CREATE DATABASE MSSQLTIPS
GO
 
USE MSSQLTIPS
GO
 
-- Start at 1 and increment by 1
CREATE TABLE IDENTITY_TEST1 
(
ID INT IDENTITY(1,1) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
 
-- Start at 10 and increment by 10
CREATE TABLE IDENTITY_TEST2
(
ID INT IDENTITY(10,10) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
 
-- Start at 1000 and increment by 5
CREATE TABLE IDENTITY_TEST3
(
ID INT IDENTITY(1000,5) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
 
-- INSERT 1000 ROWS INTO EACH TEST TABLE 
DECLARE @COUNTER INT
SET @COUNTER = 1
 
WHILE (@COUNTER <= 1000)
BEGIN
   INSERT INTO IDENTITY_TEST1 DEFAULT VALUES 
   INSERT INTO IDENTITY_TEST2 DEFAULT VALUES 
   INSERT INTO IDENTITY_TEST3 DEFAULT VALUES 
   SET @COUNTER = @COUNTER + 1
END
GO
 
SELECT TOP 3 ID FROM IDENTITY_TEST1
SELECT TOP 3 ID FROM IDENTITY_TEST2
SELECT TOP 3 ID FROM IDENTITY_TEST3
GO

Example Using an uniqueidentifier

Another way to auto-generate key values is to specify your column as a type of uniqueidentifier and DEFAULT using NEWID() or NEWSEQUENTIALID(). Unlike IDENTITY, a DEFAULT constraint must be used to assign a GUID value to the column.

How do NEWID() and NEWSEQUENTIALID() differ? NEWID() randomly generates a guaranteed unique value based on the identification number of the server’s network card plus a unique number from the CPU clock. In contrast, NEWSEQUENTIALID() generates these values in sequential order as opposed to randomly.

Let’s create new tables that use a uniqueidentifier along with both NEWID() and NEWSEQUENTIALID().

USE MSSQLTIPS
GO
 
CREATE TABLE NEWID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
CREATE TABLE NEWSEQUENTIALID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
 
-- INSERT 1000 ROWS INTO EACH TEST TABLE 
DECLARE @COUNTER INT
SET @COUNTER = 1
 
WHILE (@COUNTER <= 1000)
BEGIN
   INSERT INTO NEWID_TEST DEFAULT VALUES 
   INSERT INTO NEWSEQUENTIALID_TEST DEFAULT VALUES 
   SET @COUNTER = @COUNTER + 1
END
GO
 
SELECT TOP 3 ID FROM NEWID_TEST
SELECT TOP 3 ID FROM NEWSEQUENTIALID_TEST
GO

As you can see, the first table which uses NEWID() generates random values while the second table that uses NEWSEQUENTIALID() generates sequential values. As opposed to the integers generated by the IDENTITY approach, the GUID values generated are not as friendly to look at or work with. There is one other item to note. SQL Server keeps the last generated identity value in memory which can be retrieved right after an INSERT using SCOPE_IDENTITY(), @@IDENTITY, or CHECK_IDENT (depending on the scope you require). There is nothing similar to capture the last generated GUID value. If you use a GUID, you’ll have to create your own mechanism to capture the last inserted value (i.e. retrieve the GUID prior to insertion or use the SQL Server OUTPUT clause).

Storage Comparison Identity versus uniqueidentifier

Now that we understand how to auto generate key values and what they look like, let’s examine the storage impacts of each approach. As part of the previously created table definitions, I added a column of CHAR(2000) to mimic the storage of additional column data. Let’s examine the physical storage of the data:

USE MSSQLTIPS
GO
SELECT OBJECT_NAME([OBJECT_ID]) as tablename, avg_fragmentation_in_percent, fragment_count, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), null, null, null, null)
ORDER BY tablename
GO

Looking at this output, you can see that the NEWID() test table is very fragmented as evidenced by its fragmentation percentage of 98%. Furthermore, you can see that the rows were dispersed among 490 pages. This is due to the page splitting that occurred due to the random nature of the key generation. In contrast, the IDENTITY and NEWSEQUENTIALID() test tables show minimal fragmentation since their auto generated keys occur in sequential order. As a result, they don’t suffer from the page splitting condition that plagues the NEWID() approach. Though you can defragment the NEWID() table, the random nature of the key generation will still cause page splitting and fragmentation with all future table INSERTs. However, page splitting can be minimized by specifying an appropriate FILL FACTOR.

Looking at the NEWSEQUENTIALID() test table, we see it generated fewer pages than the NEWID() approach but it still generated more pages than the IDENTITY approach. Why is this? It’s because the uniqueidentifier data type consumes 16 bytes of disk space as opposed to the 4 bytes used by the integer data type that was used for the IDENTITY. Considering that SQL Server pages are generally capped at 8K or roughly 8060 bytes this leads to more pages generated for the NEWSEQUENTIALID() approach as opposed to the IDENTITY approach.

Using sp_spaceused to see storage use

Examining the database table space used, we see that the tables using the IDENTITY approach used the least amount disk space.

exec sp_spaceused IDENTITY_TEST1
GO
exec sp_spaceused IDENTITY_TEST2
GO
exec sp_spaceused IDENTITY_TEST3
GO
exec sp_spaceused NEWID_TEST
GO
exec sp_spaceused NEWSEQUENTIALID_TEST
GO

Now also consider this, since a uniqueidentifier data type consumes 16 bytes of data, the size of any defined non-clustered indexes on a table using a GUID as a clustered index are also affected because the leaf level of these non-clustered indexes contains the clustered index key as a pointer. As a result, the size of any non-clustered indexes would end up being larger than if an IDENTITY were defined as integer or bigint.

Summary

It’s evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches:

IDENTITY generated values are configurable, easy to read, and easier to work with
Fewer database pages are required to satisfy query requests
In comparison to NEWID(), page splitting (and its associated overhead) is eliminated
Database size is minimized
System functions exist to capture the last generated IDENTITY value (i.e. SCOPE_IDENTITY(), etc)
Some system functions – such as MIN() and MAX(), for instance – cannot be used on uniqueidentifier columns

Next Steps

Read more about NEWSEQUENTIALID()
Read Using uniqueidentifier Data
If you’re not in a situation where you require a globally unique value, consider if an IDENTITY makes sense for auto-generating your key values.
Regardless if you decide on a GUID or IDENTITY, consider adding a meaningful UNIQUE key based on the real data in your table.

MSSQL Tips

For 20+ years, MSSQLTips.com has been delivering comprehensive technical solutions on a daily basis for the Microsoft SQL Server platform. In that time, an innovative team of over 300 authors from around the globe have delivered over 7000 articles and 1000 webcasts ranging from beginner topics to advanced solutions. Our community prides itself on delivering human only generated content.

Example Using Identity Value

Example Using an uniqueidentifier

Storage Comparison Identity versus uniqueidentifier

Using sp_spaceused to see storage use

Summary

Leave a ReplyCancel Reply