Surrogate Key vs Natural Key Differences and When to Use in SQL Server


By: Ben Snaidero

Problem

If you polled any number of SQL Server database professionals and asked, "Which is better when defining a primary key: surrogate or natural key column(s)?", I'd bet the answers would be very close to a 50/50 split.  About the only point most people agree on is that when implementing a data warehouse, you should use surrogate keys for your dimension and fact tables.  This is because a source system can change at any time due to business requirements, and your data warehouse should be able to handle these changes without needing any updates.  This tip goes through some of the pros and cons of each type of primary key so that you can make a better-informed decision about which one to implement in your own environment.

Solution

Before we get into the pros and cons, let's first make sure we understand the difference between a surrogate key and a natural key.

Surrogate Key Overview

A surrogate key is a system-generated value (a GUID, a sequence number, etc.) with no business meaning that is used to uniquely identify a record in a table.  The key itself could be made up of one or multiple columns.  The following diagram shows an example of a table with a surrogate key (AddressID column) along with some sample data.  Notice that the key itself has no business meaning; it's just a sequential integer.

[Figure: table design and sample data for a table with a surrogate key (AddressID column)]
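As a rough illustration of a surrogate key, here is a minimal sketch.  It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, so the DDL is not T-SQL (in SQL Server you would more likely use an IDENTITY column or a sequence).  Only the AddressID column name comes from the example above; the other columns, the sample rows and the function name are hypothetical.

```python
import sqlite3


def build_address_table() -> list:
    """Create a table keyed by a system-generated integer and return the keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Address (
            AddressID  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            Street     TEXT NOT NULL,
            City       TEXT NOT NULL,
            PostalCode TEXT NOT NULL
        )
        """
    )
    # Hypothetical sample rows; note that no key value is supplied.
    rows = [
        ("1 Main St", "Springfield", "11111"),
        ("2 Oak Ave", "Shelbyville", "22222"),
        ("3 Elm Rd", "Springfield", "11111"),
    ]
    conn.executemany(
        "INSERT INTO Address (Street, City, PostalCode) VALUES (?, ?, ?)", rows
    )
    # The engine assigned the keys; they say nothing about the addresses.
    ids = [r[0] for r in conn.execute(
        "SELECT AddressID FROM Address ORDER BY AddressID")]
    conn.close()
    return ids


print(build_address_table())
```

The application never chooses the key value; the database engine assigns it as each row is inserted.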

Natural Key Overview

A natural key is a column or set of columns that already exist in the table (i.e., they are attributes of the entity within the data model) and uniquely identify a record in the table.  Since these columns are attributes of the entity, they have business meaning.  The following is an example of a table with a natural key (SSN column) along with some sample data.  Notice that the key for the data in this table has business meaning.

[Figure: table design and sample data for a table with a natural key (SSN column)]
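A natural key can be sketched the same way.  This is again a minimal illustration using Python's sqlite3 module as a stand-in for SQL Server.  Only the SSN column comes from the example above; the table name, the name columns, the made-up SSN values and the function name are assumptions.

```python
import sqlite3


def demo_natural_key() -> tuple:
    """Use SSN, an attribute of the employee, as the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Employee (
            SSN       TEXT NOT NULL PRIMARY KEY,  -- natural key
            FirstName TEXT NOT NULL,
            LastName  TEXT NOT NULL
        )
        """
    )
    conn.execute(
        "INSERT INTO Employee VALUES ('000-00-0001', 'Ann', 'Example')")

    # A second row with the same SSN violates the primary key.
    try:
        conn.execute(
            "INSERT INTO Employee VALUES ('000-00-0001', 'Bob', 'Sample')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    # The key value itself is meaningful, so it doubles as a search key.
    first_name = conn.execute(
        "SELECT FirstName FROM Employee WHERE SSN = ?", ("000-00-0001",)
    ).fetchone()[0]
    conn.close()
    return duplicate_rejected, first_name


print(demo_natural_key())
```

Here the primary key enforces the business rule (one row per SSN) and also serves as the lookup value.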

Since this topic has been debated for years with no definitive answer as to which is better, I thought I would use this tip to put together a list of the pros and cons of each type of key.  This list can then be used as a reference when deciding what type of key would be best suited for your own environment/application.  After all, everyone's requirements are different.  What works or performs well in one application might not work so well in another.

Natural Key Pros

  • Key values have business meaning and can be used as a search key when querying the table
  • Column(s) and primary key index already exist, so no extra disk space is required for the extra column/index that would be used by a surrogate key column
  • Fewer table joins since join columns have meaning.  For example, this can reduce disk IO by not having to perform extra reads on a lookup table
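The "fewer joins" point can be sketched with two hypothetical designs for the same data.  The table names, column names and rows below are all assumptions, and SQLite (via Python's sqlite3 module) stands in for SQL Server.  In the natural key design the child table carries the country code itself, so filtering by country needs no lookup; in the surrogate key design the same question requires a join.

```python
import sqlite3


def demo_join_difference() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Natural key design: the child row stores the meaningful code.
        CREATE TABLE CountryNK (
            CountryCode TEXT NOT NULL PRIMARY KEY,
            CountryName TEXT NOT NULL
        );
        CREATE TABLE CustomerNK (
            CustomerName TEXT NOT NULL,
            CountryCode  TEXT NOT NULL REFERENCES CountryNK (CountryCode)
        );

        -- Surrogate key design: the child row stores a meaningless integer.
        CREATE TABLE CountrySK (
            CountryID   INTEGER PRIMARY KEY,
            CountryCode TEXT NOT NULL UNIQUE,
            CountryName TEXT NOT NULL
        );
        CREATE TABLE CustomerSK (
            CustomerName TEXT NOT NULL,
            CountryID    INTEGER NOT NULL REFERENCES CountrySK (CountryID)
        );

        INSERT INTO CountryNK VALUES ('US', 'United States'), ('CA', 'Canada');
        INSERT INTO CustomerNK VALUES ('Ann', 'US'), ('Bob', 'CA'), ('Cho', 'US');
        INSERT INTO CountrySK VALUES (1, 'US', 'United States'), (2, 'CA', 'Canada');
        INSERT INTO CustomerSK VALUES ('Ann', 1), ('Bob', 2), ('Cho', 1);
        """
    )
    # Natural key: a single-table query answers the question.
    natural = [r[0] for r in conn.execute(
        "SELECT CustomerName FROM CustomerNK "
        "WHERE CountryCode = 'US' ORDER BY CustomerName")]
    # Surrogate key: the lookup table must be joined to translate the code.
    surrogate = [r[0] for r in conn.execute(
        "SELECT c.CustomerName FROM CustomerSK AS c "
        "JOIN CountrySK AS k ON k.CountryID = c.CountryID "
        "WHERE k.CountryCode = 'US' ORDER BY c.CustomerName")]
    conn.close()
    return natural, surrogate


print(demo_join_difference())
```

Both queries return the same customers; the difference is only in how many tables have to be read to get there.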

Natural Key Cons

  • May need to change/rework key if business requirements change.  For example, if you used SSN for your employee key as in the example above and your company expands outside of the United States, not all employees would have an SSN, so you would have to come up with a new key.
  • More difficult to maintain if key requires multiple columns.  From the application side, it's much easier to deal with a key that consists of just a single column.
  • Poorer performance since key value is usually larger and/or is made up of multiple columns.  Larger keys will require more IO both when inserting/updating data as well as when you query.
  • Can't enter record until key value is known.  It's sometimes beneficial for an application to load a placeholder record in one table then load other tables and then come back and update the main table.
  • Can sometimes be difficult to pick a good key.  There might be multiple candidate keys each with their own trade-offs when it comes to design and/or performance.
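One way to see the cost of reworking a natural key is to change a key value that child rows already reference.  The sketch below is an illustration only: the Timesheet table, the replacement key format 'EMP-0001' and all sample values are hypothetical, and SQLite (via Python's sqlite3 module, with foreign key enforcement switched on) stands in for SQL Server.

```python
import sqlite3


def demo_natural_key_change() -> tuple:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Employee (
            SSN      TEXT NOT NULL PRIMARY KEY,
            FullName TEXT NOT NULL
        );
        CREATE TABLE Timesheet (
            TimesheetID INTEGER PRIMARY KEY,
            SSN         TEXT NOT NULL
                        REFERENCES Employee (SSN) ON UPDATE CASCADE,
            HoursWorked REAL NOT NULL
        );
        INSERT INTO Employee VALUES ('000-00-0001', 'Ann Example');
        INSERT INTO Timesheet (SSN, HoursWorked)
        VALUES ('000-00-0001', 8.0), ('000-00-0001', 7.5);
        """
    )
    # Reworking the key on the parent row...
    conn.execute(
        "UPDATE Employee SET SSN = 'EMP-0001' WHERE SSN = '000-00-0001'")
    # ...forces every referencing child row to be rewritten as well.
    rewritten = conn.execute(
        "SELECT COUNT(*) FROM Timesheet WHERE SSN = 'EMP-0001'").fetchone()[0]
    stale = conn.execute(
        "SELECT COUNT(*) FROM Timesheet WHERE SSN = '000-00-0001'").fetchone()[0]
    conn.close()
    return rewritten, stale


print(demo_natural_key_change())
```

With only two child rows the cascade is trivial, but the same update against a large child table touches every referencing row and its indexes.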

Surrogate Key Pros

  • No business logic in key so no changes based on business requirements.  For example, if the Employee table above used an integer surrogate key, you could simply add a separate column for SIN if you added an office in Canada (to be used in place of SSN).
  • Less code if maintaining same key strategy across all entities.  For example, application code can be reused when referencing primary keys if they are all implemented as a sequential integer.
  • Better performance since key value is smaller.  Less disk IO is required when accessing single column indexes.
  • Surrogate key is guaranteed to be unique.  For example, when moving data between test systems you don't have to worry about duplicate keys since a new key will be generated as data is inserted.
  • If a sequence is used then there is little index maintenance required since the value is ever increasing, which leads to less index fragmentation.
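The SIN example from the first point above might look like the following sketch.  It assumes a hypothetical Employee table keyed by an integer EmployeeID, uses made-up SSN and SIN values, and relies on SQLite (via Python's sqlite3 module) rather than SQL Server, so the exact DDL would differ in T-SQL.

```python
import sqlite3


def demo_add_sin_column() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Employee (
            EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            FullName   TEXT NOT NULL,
            SSN        TEXT UNIQUE      -- optional business identifier
        )
        """
    )
    conn.execute(
        "INSERT INTO Employee (FullName, SSN) "
        "VALUES ('Ann Example', '000-00-0001')")

    # New requirement: Canadian employees have a SIN, not an SSN.
    # The primary key, and anything referencing it, is left untouched.
    conn.execute("ALTER TABLE Employee ADD COLUMN SIN TEXT")
    conn.execute(
        "INSERT INTO Employee (FullName, SIN) "
        "VALUES ('Luc Sample', '000-000-002')")

    rows = conn.execute(
        "SELECT EmployeeID, SSN, SIN FROM Employee ORDER BY EmployeeID"
    ).fetchall()
    conn.close()
    return rows


print(demo_add_sin_column())
```

The existing row keeps its EmployeeID and the new row simply receives the next one; only a new attribute column was added.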

Surrogate Key Cons

  • Extra column(s)/index for surrogate key will require extra disk space
  • Extra column(s)/index for surrogate key will require extra IO when inserting/updating data
  • Requires more table joins to child tables since data has no meaning on its own.
  • Can have duplicate values of natural key in table if there is no other unique constraint defined on the natural key
  • Difficult to differentiate between test and production data.  For example, since surrogate key values are just auto-generated values with no business meaning it's hard to tell if someone took production data and loaded it into a test environment.
  • Key value has no relation to the data, so some argue that the design technically breaks 3NF
  • The surrogate key value can't be used as a search key
  • Different implementations are required based on database platform.  For example, SQL Server identity columns are implemented somewhat differently than their equivalents in Postgres or DB2.
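The duplicate-value risk listed above is easy to reproduce.  In this sketch (hypothetical table names and sample values, with SQLite via Python's sqlite3 module standing in for SQL Server) two tables share the same surrogate key, but only one of them also declares a unique constraint on the natural key column.

```python
import sqlite3


def demo_duplicate_natural_values() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Surrogate key only: nothing protects the SSN column.
    conn.execute(
        "CREATE TABLE EmployeeLoose ("
        " EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " SSN TEXT NOT NULL)")
    # Surrogate key plus a unique constraint on the natural key.
    conn.execute(
        "CREATE TABLE EmployeeStrict ("
        " EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " SSN TEXT NOT NULL UNIQUE)")

    # The same SSN is accepted twice; each row just gets a new EmployeeID.
    for _ in range(2):
        conn.execute(
            "INSERT INTO EmployeeLoose (SSN) VALUES ('000-00-0001')")
    loose_count = conn.execute(
        "SELECT COUNT(*) FROM EmployeeLoose WHERE SSN = '000-00-0001'"
    ).fetchone()[0]

    # With the unique constraint the second insert is rejected.
    conn.execute("INSERT INTO EmployeeStrict (SSN) VALUES ('000-00-0001')")
    try:
        conn.execute(
            "INSERT INTO EmployeeStrict (SSN) VALUES ('000-00-0001')")
        strict_rejected = False
    except sqlite3.IntegrityError:
        strict_rejected = True

    conn.close()
    return loose_count, strict_rejected


print(demo_duplicate_natural_values())
```

Keeping a unique constraint on the natural key avoids the duplicates, at the price of the extra index already listed among the cons.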

Summary

As mentioned above, it's easy to see why this continues to be debated.  Each type of key has a similar number of pros and cons.  If you read through them, though, you can see how, based on your requirements, some of the cons might not even apply in your environment.  If that's the case, it becomes much easier to decide which type of key is the best fit for your application.

About the author
Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.


Wednesday, April 18, 2018 - 8:21:47 PM - Joe Celko

I wish more people would read Codd's original work. His definition of a surrogate key is that it is hidden from the view of the user, and the engine uses it to build the joins or other constructs. Think of a hash code or something similar; it's only used by the engine and never exposed. Unfortunately, the SQL Server community wants to define it as something they actually build themselves and expose. Obviously, you have to keep the "natural" keys for data integrity, and then carry the extra burden of the exposed surrogates. Given modern hardware and software, it's not that much trouble to use insanely long natural keys for joins.

 


Monday, April 16, 2018 - 12:19:57 PM - JRStern

Well, here are a couple more very big factors.  First, that most SQL Server pros, most of the time, do use surrogate keys, most frequently an identity int or bigint, sometimes a GUID.  And that they even use this as the clustered PK more often than not.  And second, that they do this for a good reason, and that's because the CK and PK have special uses in SQL Server, the nonclustered keys go through them, they are used to validate FKs, and more.  SQL Server does not really separate the logical and physical implementations that well.  This causes surrogates to be much more highly used in SQL Server than might otherwise be true. I'd say also that the optimizer often has trouble with multi-field indexes, but that's a whole separate discussion.

 


Monday, April 16, 2018 - 9:28:36 AM - Adel Yousuf

Good Topic


Monday, April 16, 2018 - 3:52:24 AM - Arno Tolmeijer

Hi Ben,

Great article, but I miss one point: due to security regulations such as GDPR, encryption and data masking may influence the usability of a natural key.

Greetings, Arno Tolmeijer


Monday, April 16, 2018 - 2:48:13 AM - Vinod Arvind Bhilare

 

Hi,

This helps me a lot in improving my SQL knowledge.

