Learn more about SQL Server tools



solving sql server problems for millions of dbas and developers since 2006 attend our next webcast













































   Got a SQL tip?
            We want to know!

Auto generated SQL Server keys with the uniqueidentifier or IDENTITY

MSSQLTips author Armando Prato By:   |   Read Comments (36)   |   Related Tips: 1 | 2 | 3 | 4 | More > Identities

Problem
I'm designing a table and I've decided to create an auto-generated primary key value as opposed to creating my own scheme or using natural keys. I see that SQL Server offers globally unique identifiers (GUIDs) as well as identities to create these values. What are the pros and cons of these approaches?

Solution
Yes, there are a number of ways you can auto-generate key values for your tables. The most common ways are via the use of the IDENTITY column property or by specifying a uniqueidentifier (GUID) data type along with defaulting with either the NEWID() or NEWSEQUENTIALID() function. Futhermore, GUIDs are heavily used in SQL Server Replication to uniquely identify rows in Merge Replication or Transactional Replication with updating subscriptions.

The most common, well known way to auto-generate a key value is via the use of the IDENTITY column property on a column that's typically declared as an integer. Once defined, the engine will automatically generate a sequential number based on the way the property has been declared on the column. The IDENTITY property takes an initial seed value as its first parameter and an increment value as its second parameter.

Consider the following example which creates and inserts into identity based tables that define the primary key as a clustered index:

SET NOCOUNT ON
GO
USE MASTER
GO
CREATE DATABASE MSSQLTIPS
GO

USE MSSQLTIPS
GO
-- Start at 1 and increment by 1
CREATE TABLE IDENTITY_TEST1
(
ID INT IDENTITY(1,1) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO

-- Start at 10 and increment by 10
CREATE TABLE IDENTITY_TEST2
(
ID INT IDENTITY(10,10) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO

-- Start at 1000 and increment by 5
CREATE TABLE IDENTITY_TEST3
(
ID INT IDENTITY(1000,5) PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO

-- INSERT 1000 ROWS INTO EACH TEST TABLE
DECLARE @COUNTER INT
SET @COUNTER = 1

WHILE (@COUNTER <= 1000)
BEGIN
   INSERT INTO IDENTITY_TEST1 DEFAULT VALUES
   INSERT INTO IDENTITY_TEST2 DEFAULT VALUES
   INSERT INTO IDENTITY_TEST3 DEFAULT VALUES
   SET @COUNTER = @COUNTER + 1
END
GO

SELECT TOP 3 ID FROM IDENTITY_TEST1
SELECT TOP 3 ID FROM IDENTITY_TEST2
SELECT TOP 3 ID FROM IDENTITY_TEST3
GO

Another way to auto-generate key values is to specify your column as a type of uniqueidentifier and DEFAULT using NEWID() or NEWSEQUENTIALID(). Unlike IDENTITY, a DEFAULT constraint must be used to assign a GUID value to the column.

How do NEWID() and NEWSEQUENTIALID() differ? NEWID() randomly generates a guaranteed unique value based on the identification number of the server's network card plus a unique number from the CPU clock. In contrast, NEWSEQUENTIALID() generates these values in sequential order as opposed to randomly.

Let's create new tables that use a uniqueidentifier along with both NEWID() and NEWSEQUENTIALID()

 
USE MSSQLTIPS
GO

CREATE TABLE NEWID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO
CREATE TABLE NEWSEQUENTIALID_TEST
(
ID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
TESTCOLUMN CHAR(2000) DEFAULT REPLICATE('X',2000)
)
GO

-- INSERT 1000 ROWS INTO EACH TEST TABLE
DECLARE @COUNTER INT
SET @COUNTER = 1

WHILE (@COUNTER <= 1000)
BEGIN
   INSERT INTO NEWID_TEST DEFAULT VALUES
   INSERT INTO NEWSEQUENTIALID_TEST DEFAULT VALUES
   SET @COUNTER = @COUNTER + 1
END
GO

SELECT TOP 3 ID FROM NEWID_TEST
SELECT TOP 3 ID FROM NEWSEQUENTIALID_TEST
GO

As you can see, the first table which uses NEWID() generates random values while the second table that uses NEWSEQUENTIALID() generates sequential values. As opposed to the integers generated by the IDENTITY approach, the GUID values generated are not as friendly to look at or work with. There is one other item to note. SQL Server keeps the last generated identity value in memory which can be retrieved right after an INSERT using SCOPE_IDENTITY(), @@IDENTITY, or CHECK_IDENT (depending on the scope you require). There is nothing similar to capture the last generated GUID value. If you use a GUID, you'll have to create your own mechanism to capture the last inserted value (i.e. retrieve the GUID prior to insertion or use the SQL Server 2005 OUTPUT clause).

Now that we understand how to auto generate key values and what they look like, let's examine the storage impacts of each approach. As part of the previously created table definitions, I added a column of CHAR(2000) to mimic the storage of additional column data. Let's examine the physical storage of the data:

USE MSSQLTIPS
GO
SELECT OBJECT_NAME([OBJECT_ID]) as tablename, avg_fragmentation_in_percent, fragment_count, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), null, null, null, null)
ORDER BY tablename
GO

Looking at this output, you can see that the NEWID() test table is very fragmented as evidenced by its fragmentation percentage of 98%. Furthermore, you can see that the rows were dispersed among 490 pages. This is due to the page splitting that occurred due to the random nature of the key generation. In contrast, the IDENTITY and NEWSEQUENTIALID() test tables show minimal fragmentation since their auto generated keys occur in sequential order. As a result, they don't suffer from the page splitting condition that plagues the NEWID() approach. Though you can defragment the NEWID() table, the random nature of the key generation will still cause page splitting and fragmentation with all future table INSERTs. However, page splitting can be minimized by specifying an appropriate FILL FACTOR.

Looking at the NEWSEQUENTIALID() test table, we see it generated fewer pages than the NEWID() approach but it still generated more pages than the IDENTITY approach. Why is this? It's because the uniqueidentifier data type consumes 16 bytes of disk space as opposed to the 4 bytes used by the integer data type that was used for the IDENTITY. Considering that SQL Server pages are generally capped at 8K or roughly 8060 bytes (as of SQL Server 2005, there is a row-overflow mechanism that can kick in but that's for another discussion), this leads to more pages generated for the NEWSEQUENTIALID() approach as opposed to the IDENTITY approach.

Examining the database table space used, we see that the tables using the IDENTITY approach used the least amount disk space.

 
exec sp_spaceused IDENTITY_TEST1
GO
exec sp_spaceused IDENTITY_TEST2
GO
exec sp_spaceused IDENTITY_TEST3
GO
exec sp_spaceused NEWID_TEST
GO
exec sp_spaceused NEWSEQUENTIALID_TEST
GO

Now also consider this, since a uniqueidentifier data type consumes 16 bytes of data, the size of any defined non-clustered indexes on a table using a GUID as a clustered index are also affected because the leaf level of these non-clustered indexes contains the clustered index key as a pointer. As a result, the size of any non-clustered indexes would end up being larger than if an IDENTITY were defined as integer or bigint.

It's evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches:

  • IDENTITY generated values are configurable, easy to read, and easier to work with
  • Fewer database pages are required to satisfy query requests
  • In comparison to NEWID(), page splitting (and its associated overhead) is eliminated
  • Database size is minimized
  • System functions exist to capture the last generated IDENTITY value (i.e. SCOPE_IDENTITY(), etc)
  • Some system functions - such as MIN() and MAX(), for instance - cannot be used on uniqueidentifier columns

Next Steps

  • Read more about NEWSEQUENTIALID() in the SQL Server 2005 Books Online
  • Read Using uniqueidentifier Data in the SQL Server 2005 Books Online
  • If you're not in a situation where you require a globally unique value, consider if an IDENTITY makes sense for auto-generating your key values.
  • Regardless if you decide on a GUID or IDENTITY, consider adding a meaningful UNIQUE key based on the real data in your table.


Last Update: 10/8/2008


About the author
MSSQLTips author Armando Prato
Armando Prato has over 24 years of industry experience and has been working with SQL Server since version 6.5.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools
Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Thursday, August 21, 2014 - 9:38:44 AM - Mike Read The Tip

I find this a very simplified look at generating keys.

 

You don't know about replication?

 

I do like SQL Server 2012's ability to create your own unique pattern also.

 


Friday, April 12, 2013 - 3:12:32 PM - george hardy Read The Tip

so it looks like i can still maintain my readable index numbers that are auto-incrementing integers, just remove it as the primary key, and begin to use these guids as the real row key.  this way, my code doing something like "DELETE FROM EQUIPMENT WHERE EQUIPMENT_ID = x" will still work.

i also want a much cleaner way to utilize the sync framework, and had i known this before, that task would have been MUCH easier!

 

thanks for the write-up.


Monday, January 14, 2013 - 2:26:26 AM - URVISH SUTHAR Read The Tip

Great job, many thanks for sharing :)

 


Wednesday, May 23, 2012 - 11:26:06 PM - Ravi Chauhan Read The Tip

how to delete record from table where primary key(ID) is auto generated, and we need to delete that record according to primary example

DELETE TOP(1) FROM dbo.Test WHERE ID = 1

but we dont know the ID no in above scenario....


Wednesday, May 23, 2012 - 2:26:09 AM - Ravi Chauhan Read The Tip

We create example of two tables,

but suppose we have three table IDENTITY_TEST1, IDENTITY_TEST2 and IDENTITY_TEST3

Means A >> B >> C (link sign ">>")

so how to create the relationship in these three tables?

 


Wednesday, May 23, 2012 - 2:17:38 AM - Ravi Chauhan Read The Tip

create database db

use db

CREATE TABLE IDENTITY_TEST1
(
ID INT IDENTITY(1,1) PRIMARY KEY,
TESTCOLUMN varchar(20)
)

CREATE TABLE IDENTITY_TEST2
(
EMP_ID INT IDENTITY(1,1) REFERENCES IDENTITY_TEST1(ID),
ADDR VARCHAR(20)
)

insert into IDENTITY_TEST1 values('cde')

select * from IDENTITY_TEST1

insert into IDENTITY_TEST2 values('acd')

select * from IDENTITY_TEST2


Thursday, April 19, 2012 - 6:00:38 PM - Enigmatic Read The Tip

@Javed,

Yes that is correct. IDENTITY is a counter and every time you insert a record the next number in that counter is used. This is done to ensure you ALWAYS get a unique number that could not possibly be in the table. If you want to go back and "fill in" missing numbers then you are going to have to implement your own locking mechanism to first lock the table, then select the first missing number and then unlock it once you have found it. If you do not do this, there is a chance multiple calls made at the same time could potentially return the same number and attempt to both insert the same key which would give you a primary key violation.

If you want to delete all records and restart your numbering at 1 you need to use:

DBCC CHECKIDENT ("<Schema>.<TableName>")

This will reset the IDENTITY back to its original value


Thursday, April 19, 2012 - 9:11:55 AM - javed Read The Tip

Hi. sir,

My questions is I have a column id and set as primary key identity in a table.whenever I delete any record from table and insert new record then column id start count as where last id number was exists.And  whenvere delete all record from table. then insrt new record then does not start with 1.whereas start with last id was there.

plz help me..do reply me.. 


Monday, April 02, 2012 - 1:37:32 AM - Enigmatic Read The Tip

One thing that wasn't listed were the disadvantages of using IDENTITY columns.

The biggest issue I have is the use of index values in code. Where a function accepts an integer as the index, a person can place any number they like in there, regardless of what that number respresents. It opens you up to the possibility of more errors as every table with an identity will have a row with ID = 1, 2, 3, etc.

So for instance, there would be nothing stopping me from accidentally transposing an ID from one table into an ID of another table, and in fact I could create foreign key indexes between unrelated data simply because the ID values of one table exist in the other.

To me this is a loss of referencial integrity, and the signature of C# methods that require IDs all end up something like (int, int, int). To me this isn't IDentifying records at all, its imply numbers.

For this reason I prefer to use GUIDs as you are guaranteed that the GUID value in one table will not appear in any other table other than the one it belongs in, thus ensuring no mistakes in foreign keys, and no accidental transposing when filling out method signatures. When you consider how many functions, stored procedures, methods, and queries either pass identifiers or compare identifiers, the odds of transposing increase with every single new addition.

This level of security and reliability easily "trumps" a bit of disk space in a world that is now being measured in Terabytes, or the use of MIN/MAX functions on an IDENTITY field (that makes no sense!), or a few less pages being created.

Anyone who has spent days trying to find out why something isn't working correctly only to realise that they "accidentally" inserted an IndustryTypeID into an IndustryID, but it never complained because there just happens to be an IndustryID = 3 the same as there is an IndustryTypeID = 3, or why records occasionally said it violated a primary key but only at what appears to be random intervals, etc.


Thursday, March 22, 2012 - 12:51:58 AM - sandeep kumar Read The Tip

How to create cust_id autogenrated conditions are-

1- not match to previous id

2- look like cust_id AB001

Please reply me.

                                 Thank You.

 


Tuesday, March 20, 2012 - 9:53:03 AM - Zeni Read The Tip

 

Thanks. This article was helpful for me in deciding primery key datatype for my database.

Thanks


Tuesday, February 22, 2011 - 5:41:51 AM - Fayssal El Moufatich Read The Tip

Thanks for the nicely written article! So, unless one is in a situation where he needs a glabally unique identifier, one should avoid using GUIDs. And even, in this situation, it is recommended to generate the keys using NEWSEQUENTIALID(), as long as it is possible. I agree! :)


Monday, January 19, 2009 - 6:38:33 PM - aprato Read The Tip

You answered your own question.  You work in a distributed environment and they make sense in this scenario.  Replication makes use of GUIDs for this reason.   I personally wouldn't use a GUID for an application that didn't have a valid business reason for it.  They have their place but they do have some downside.   


Monday, January 19, 2009 - 5:50:12 PM - laughingskeptic Read The Tip

But itsn't this a bit of a red herring?  The natural fill factor of an index on random data (such as a GUID) that experiences primarily inserts is 75%.  If you compact it, then you just cause a bunch of page splits as soon as the inserts begin again, potentially swinging quickly from a fill of 100% to 50% before settling back in to 75%.  A SQL Server page can hold around 254 GUIDs, so if your table has significantly more rows than this you should expect to see close to 100% fragmentation.  For random indexes, average fill factor is a better metric than percent fragmentation.  If the number is not about 75% then an investigation is warranted.

GUIDs are 4 times bigger than an integer, but if your server has reached memory constraints (most real ones do) a GUID index will be 5 times slower than an integer at lookups.  This is a design consideration.

I work with some highly distributed databases, involving multitiered replication and thousands of clients (some being third parties).  We would never get this system to work without using a lot of GUIDs.  GUIDs are especially handy when multiple clients can work standalone then expect to re-sync when they connect.


Monday, October 27, 2008 - 7:40:26 AM - aprato Read The Tip

Hi

I was just stating that you'll still get rapid index fragmentation (heavy, actually) if you decide to create a non-clustered index on a uniqueidentifier that uses NEWID().  It doesn't take many rows.  I had done a test with just 10,000 insertions and I still had an index with 95% fragmentation.  Due to the random generation, you'll get rapid fragmentation; it won't take many rows.   It's just something to be aware of.


Monday, October 27, 2008 - 6:29:39 AM - mharr Read The Tip

[quote user="aprato"]

You should note that even using NEWID() for a non-clustered index will lead to fragmentation for that index. Recall that index structures are sorted logically.

[/quote]

Yes, but:

  • a fragmented index is less likely (since an index "row" will be smaller than a data "row", and will have more entries and more room for new entries in an index), and will be less
  • rebuilding an index to remove fragmentation is easier and less disruptive than reordering a table.
  • while doing "one of " lookups of a primary key where the primary key is a guid (most used query using a primary key), using a fragmented index and unfragmented data pages will likely still be more efficient than using a fragmented data pages where the guid is clustered primary key.

Thursday, October 23, 2008 - 12:38:00 PM - aprato Read The Tip

You should note that even using NEWID() for a non-clustered index will lead to fragmentation for that index. Recall that index structures are sorted logically.


Thursday, October 23, 2008 - 12:13:46 PM - Wiseman82 Read The Tip

[quote user="DiverKas"]

[quote user="mharr"]

Nice article on the advantages/disadvantages of identity and uniqueidentifier keys.  However, there is a relatively simple solution to the problem of fragmentation when using uniqueidentifier keys: don't use the primary key as a clustered index.

By default, when you create a primary key, it is created as a clustered index.  But it does not have to be so.  You can choose any other column as the clustered index, then the data rows will be physically ordered by that column.

How are the queries on the table going to be used?  If it is by and large going to be random access, then perhaps a DateInserted date column would be appropriate.  If there will frequent access of a customer table by territory number, it may be best to set up a clustered index by that column.  Or, choose not to have any clustered index, and rows will always be added to the end of the last data page, and time-consuming page splits will never occur.

There is no rule, in practice or theory, that the primary key must be a clustered index.  A little more thought on how the table will be used and accessed can resolve the fragmentation issue, and provide better performance.

Hope this helps

[/quote]

That is a great point.  Its silly for Microsoft to have left the clustered index on an identity or guid column for that matter.  It makes no sense and certainly lends nothing to the performance.  It is usually the first thing I have to change on a newly created table.

 

[/quote]

Choosing another column for the clustered index might get around the fragmentation issue, but the primary key is often the best candidate for the clustered index.  Sequential GUIDs might be a better option to get around the fragmentation issue.  Also remember that the fragmentation issue would apply (to a lesser extent) to a non-clustered index created on the GUID column.

It's normally a good idea for the clustered index to be unique - if it's not SQL Server will add a uniqueifier to make the clustered index unique (additional overhead).  Due to the fragmentation issues already mentioned, it's also a good idea to to have an auto-incermenting clustered index - Identities and Sequential GUIDs both fit the bill here.  Another thing you need to consider is if the chosen column(s) for the clustered index are likely to be updated - Ideally you want to pick a column that is never/rarely updated - the primary key also fits the bill here.

I'd advise people to keep the clustered index as the primary key, unless they have a good reason for choosing a different column (MS made this a default for good reason).  I'm not saying that you always should use the primary key as a clustered index, but it's not a bad choice in the absence of another candidate for the clustered index.

In terms of query performance (rather than write performance), you might want to use a column that is frequently used in ORDER BY, GROUP BY, JOINS (foreign keys) or where the column is often used to select a range of values etc. 

As mentioned, a date column might be a good candidate if it's populated with a GetDate()/GetUTCDate() - you would expect a very low number of duplicate values and the column will be auto-incrementing.  It might also unlikely that the column will ever be updated.  If queries frequently sort on this column and filter it by a range of values then it would be a good candidate for the clustered index.

Also note that best practice will vary depending on the type of database.  If you are creating a data warehouse/reporting database rather than an OLTP database, you will have different priorities. 

 

 


Thursday, October 23, 2008 - 9:31:52 AM - Perre Read The Tip

Nice article.  Also good comments telling there are also advantages with guids ... not that you said there aren't.

I personally prefer identity columns ... indead for passing back the key from Stored procedures to the client software, for easiness in SQL statements, joins, etc ... and as you prove nicely to limit page splits, index size, ...

Although I once had a problem when a branch office with it's own SQL server, with the same application and database started using VPN.  Business managers decided the databases should be merged together ... so I hit the problem we had the same clients id's in both databases but regarding different clients ... and of course those id's were spread accross a lot of tables as foreign keys.  By consequence I had a serious job merging the data.  Finally I decided to add an extra field to the autoincremental primary key, telling me it's a client from the branch office or one from the main office.  So nowadays when I have to use a key like that I consider adding an extra column.  Alternativly I could have added 10.000.000 to the id's from the branch office before importing the data ... but I didn't thought about that at that moment and it wouldn't have clearified the difference between clients from both offices inserted after the merge.

An old collegue of me always used GUIDs for his websitedatabases since they seem more complex in the URL ... avoiding the fact if you hit .com?id=127 people would try to enter .com?id=128 which should not be visible to them ... but afterwards I think this isn't a strong argument.  If you should not be allowed to see the page with .com?id=128 ... the website should block it :-)

 Cheers

 

 

 

 

 

 

 


Thursday, October 23, 2008 - 9:22:09 AM - DiverKas Read The Tip

[quote user="mharr"]

Nice article on the advantages/disadvantages of identity and uniqueidentifier keys.  However, there is a relatively simple solution to the problem of fragmentation when using uniqueidentifier keys: don't use the primary key as a clustered index.

By default, when you create a primary key, it is created as a clustered index.  But it does not have to be so.  You can choose any other column as the clustered index, then the data rows will be physically ordered by that column.

How are the queries on the table going to be used?  If it is by and large going to be random access, then perhaps a DateInserted date column would be appropriate.  If there will frequent access of a customer table by territory number, it may be best to set up a clustered index by that column.  Or, choose not to have any clustered index, and rows will always be added to the end of the last data page, and time-consuming page splits will never occur.

There is no rule, in practice or theory, that the primary key must be a clustered index.  A little more thought on how the table will be used and accessed can resolve the fragmentation issue, and provide better performance.

Hope this helps

[/quote]

That is a great point.  Its silly for Microsoft to have left the clustered index on an identity or guid column for that matter.  It makes no sense and certainly lends nothing to the performance.  It is usually the first thing I have to change on a newly created table.

 


Thursday, October 23, 2008 - 9:19:29 AM - mharr Read The Tip

Nice article on the advantages/disadvantages of identity and uniqueidentifier keys.  However, there is a relatively simple solution to the problem of fragmentation when using uniqueidentifier keys: don't use the primary key as a clustered index.

By default, when you create a primary key, it is created as a clustered index.  But it does not have to be so.  You can choose any other column as the clustered index, then the data rows will be physically ordered by that column.

How are the queries on the table going to be used?  If it is by and large going to be random access, then perhaps a DateInserted date column would be appropriate.  If there will frequent access of a customer table by territory number, it may be best to set up a clustered index by that column.  Or, choose not to have any clustered index, and rows will always be added to the end of the last data page, and time-consuming page splits will never occur.

There is no rule, in practice or theory, that the primary key must be a clustered index.  A little more thought on how the table will be used and accessed can resolve the fragmentation issue, and provide better performance.

Hope this helps


Thursday, October 23, 2008 - 9:15:52 AM - DiverKas Read The Tip

[quote user="cbasoz"]

I found the conclusions at the end of the article a little bit biased to the identity usage. OK so far so good, IDENTITY has advantages. Doesn't it have any clear disadvantages? Does it really outperform GUIDs in performance?

I am not an SQL DBA but more a developer and I think GUIDs have undeniable advantages over IDENTITY. ie:

-I can create the key independantly from the data store. I know it is unique so I wouldn't bother to create data on my laptop totally disconnected and later insert to main db without any change in keys.

-Not needing db store to learn the key also means I wouldn't need roundtrips to server. Those roundtrips might sum to a big amount. Performance wise maybe it is better to use GUIDs then?

I wonder, if GUIDs are disadvantegous, why would ever SQL server itself use GUIDs in replication? SQL server team developers might do that with integer key fix ups, but they chose GUID. Probably they thought using GUIDs is cheaper than doing key fix ups?

(just thinking loud)

[/quote]

 There is a rather large performance difference between comparing 4bytes and 16 bytes of data, especially in join conditions.  This isnt a trival difference.

Also, your assumption that GUIDs are unique, by creating them on the client is false.  GUIDs have a reasonable chance of being unique, it is not guaranteed, and I see cases monthly where it is not true.  Since the GUID on a client is generated based on several pieces of "hardware" and time, it is quite possible through oem vendor mistakes and luck of the draw to have the same GUID assigned on different hardware.

GUIDs as a general rule should only be used for replication and distributed data.  In general it is also best to generate the key from a single source, thereby reducing the chance of a collision to about nil.

For performance sake, and if replication and distributed data is not a concern, Identity columns are a HUGE winner.

 


Thursday, October 23, 2008 - 8:59:47 AM - aprato Read The Tip

I didn't take it negatively at all.  I understand what you're driving at.  


Thursday, October 23, 2008 - 8:36:40 AM - cbasoz Read The Tip

I was referring to section starting with:

"It's evident that using IDENTITY to auto-generate key values offers a few advantages over the GUID approaches"

and you say:

"The reason they probably chose GUIDs is due to their uniqueness across space and time."

It sounds like a very good reason to choose a surrogate primary key to me so in the same manner I could say:

It's evident that using GUID to auto-generate key values offers a few advantages over the IDENTITY approaches

this statement is a fact too but I didn't see any mention and naturally thought it as biased. I didn't have a negative intention just meant I felt the article was not completed.


Thursday, October 23, 2008 - 8:21:42 AM - aprato Read The Tip

 <<Yes - more or less.  You might also consider composite keys - using identity in conjunction with another column. You could also generate your own "unique" identifier based on some custom logic. You might even consider splitting identity ranges between tables. In this situation GUIDs would probably be the most attractive option IMO.>>

I would concur with Wiseman.  GUIDs may be the better option if you're working in a distributed environment. 


Thursday, October 23, 2008 - 7:59:01 AM - Wiseman82 Read The Tip

M[quote user="pmh4514"]

Thanks for the followups to my questions!

 [quote user="aprato"]The reason they probably chose GUIDs is due to their uniqueness across space and time.[/quote]

If uniqueness across space in time is a fundamental requirement of a given distributed database archicture, am I correct in saying that this requirement alone makes the decision for us, and requires the use of GUID over IDENTITY, despite any inherent performance issues that come with that decision?

[/quote]

Yes - more or less.  You might also consider composite keys - using identity in conjunction with another column. You could also generate your own "unique" identifier based on some custom logic. You might even consider splitting identity ranges between tables. In this situation GUIDs would probably be the most attractive option IMO.


Thursday, October 23, 2008 - 7:49:52 AM - pmh4514 Read The Tip

Thanks for the followups to my questions!

 [quote user="aprato"]The reason they probably chose GUIDs is due to their uniqueness across space and time.[/quote]

If uniqueness across space in time is a fundamental requirement of a given distributed database archicture, am I correct in saying that this requirement alone makes the decision for us, and requires the use of GUID over IDENTITY, despite any inherent performance issues that come with that decision?

 

 

 


Thursday, October 23, 2008 - 7:31:55 AM - aprato Read The Tip

 <<I found the conclusions at the end of the article a little bit biased to the identity usage>>

In reality, it wasn't bias - they were statements in fact.  Using IDENTITY results in less database bloat.  Fewer data and index pages are generated (less churn for DML statements) as well as less bloat of non-clustered indexes if the GUID is the clustered index.    

The reason they probably chose GUIDs is due to their uniqueness across space and time.


Thursday, October 23, 2008 - 7:24:53 AM - Wiseman82 Read The Tip

GUIDs do have benefits over identity columns but they come with quite a large overhead - 16 bytes compared to 4bytes for an integer.  Some of the problems with uniqueidentifiers can be overcome by using sequential guids, but identity is far better from a performance point of view.

Bottom line: Identities should be your first choice. If an identity column doesnít fit the bill, consider a GUID. 

Also, donít take peoples word for it.  Construct your own performance test to see how they perform in comparison to each other.  Iíve done so myself and found the difference to be significant.  The difference between sequential guids and non-sequential guids was also VERY significant.

 


Thursday, October 23, 2008 - 7:13:44 AM - aprato Read The Tip

 

<<Question 1 would be "how are tables defraged?">>

 Depends on your version of SQL Server

2000: DBCC DBREINDEX or DBCC INDEXDEFRAG
2005: ALTER INDEX specifying either REBUILD or REORGANIZE

You could also drop and re-create the clustered index

<<2. Describe "fill factor" - how would I implement that? What is it specifying?>>

Creating indexes with a FILL FACTOR % tells the engine how full you want the index leaf pages (i.e. a FILL FACTOR of 80 means keep 20% of the leaf free for future inserts - or 80% full).  It minimizes splits (an intensive operation where 1/2 of the data on a page is moved to a new page) because the free space can accommodate any random row that can fit on the page.  It's used in high volume enviorments where page splitting becomes a problem.

 

 


Thursday, October 23, 2008 - 6:47:48 AM - cbasoz Read The Tip

I found the conclusions at the end of the article a little bit biased to the identity usage. OK so far so good, IDENTITY has advantages. Doesn't it have any clear disadvantages? Does it really outperform GUIDs in performance?

I am not an SQL DBA but more a developer and I think GUIDs have undeniable advantages over IDENTITY. ie:

-I can create the key independantly from the data store. I know it is unique so I wouldn't bother to create data on my laptop totally disconnected and later insert to main db without any change in keys.

-Not needing db store to learn the key also means I wouldn't need roundtrips to server. Those roundtrips might sum to a big amount. Performance wise maybe it is better to use GUIDs then?

I wonder, if GUIDs are disadvantegous, why would ever SQL server itself use GUIDs in replication? SQL server team developers might do that with integer key fix ups, but they chose GUID. Probably they thought using GUIDs is cheaper than doing key fix ups?

(just thinking loud)


Thursday, October 23, 2008 - 6:45:33 AM - Wiseman82 Read The Tip

You defrag tables by doing an clustered index rebuild/reorganize.  BOL will give you more info on this.

A lower fill factor will leave room for more items to be inserted, reducing page splits and improving performance of inserts.  Lower fill factors will have a negative impact on select performance though.  With an auto-increasing key (identity/seq guid) you can use a high fill factor - new items will be inserted at the end of the index. Fill factors are specified when the index is created - BOL will provide you with more info.


Thursday, October 23, 2008 - 6:45:00 AM - aprato Read The Tip

<< There is a problem with this image - SQL Server actually sorts GUIDs by byte group from right to left >>

Yes, you're correct.  The image is meant to shows the differences in how the engine generates the value (i.e. the randomness of NEWID() vs NEWSEQUENTIALID())


Thursday, October 23, 2008 - 6:24:53 AM - pmh4514 Read The Tip

Very interesting..I'm putting together a new distributed DB design using GUIDs as unique identifiers to ensure uniquness across many computers. We are still in early development, so design and implementation changes happen daily as I work through everything. I found this article interesting because it may represent some things I can do early to prevent some of the inherent problems with using GUIDs as identities. But there were a few points I was unsure about.

The author described page splitting that occurs, when discussing the defragmentation differences between using an incremental integer for an identity vs. a GUID.  He wrote:

Though you can defragment the NEWID() table, the random nature of the key generation will still cause page splitting and fragmentation with all future table INSERTs. However, page splitting can be minimized by specifying an appropriate FILL FACTOR.

Question 1 would be "how are tables defraged?" (I am more a programmer than a DBA for what it's worth) and 2. Describe "fill factor" - how would I implement that? What is it specifying?

 

 

 


Thursday, October 23, 2008 - 4:06:44 AM - Wiseman82 Read The Tip

 There is a problem with this image - SQL Server actually sorts GUIDs by byte group from right to left.


Thursday, October 23, 2008 - 2:59:58 AM - jnollet Read The Tip

Great article ... I've found this out too and its important to think about up front in the development process.  Trying to change later can be difficult.




 
Sponsor Information







Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | free t-shirt | user groups | community | events | first timer?
Some names and products listed are the registered trademarks of their respective owners.