SQL Server Primary Key
By: Armando Prato
Every table should have a primary key. A primary key is a unique, non-null value that identifies a row in the database. One of the most highly debated topics in the SQL Server world is deciding what type of data best makes a primary key.
You should always have a way to uniquely identify and access a row. This is one of the tenets behind First Normal Form. What if column data needs to be updated or a specific row deleted? How would you be able to ascertain what row needs action? Beyond pure design, primary keys are required for some SQL Server features (such as Transactional Replication).
There are several common options for defining a primary key including:
- integer columns
- uniqueidentifier columns
- unique natural keys made up of one or more columns in the table
The first common option is to define an integer column as a primary key. You can have your application assign a value or you can assign an identity property to the integer column (i.e. int identity(1,1)) and let SQL Server generate the value for you (which is my preference).
If you're working with distributed applications or the application could become distributed in the future, an integer column is probably not your best bet. There will likely be primary key collosions when collecting data from various distributed sites since the same primary key value may exist in different databases. In this case, making a uniqueidentifier (GUID) column your primary key is a better bet. Like with the identity property for integers, you can choose to have SQL Server generate the value for you by specifying either NEWID() or NEWSEQUENTIALID(). What is the difference? NEWID() will randomly assign a value whereas NEWSEQUENTIALID() will assign a sequential value. This is an important consideration. My preference is to use NEWSEQUENTIALID() in this case especially if my primary key is the clustered index in my table since it mitigates logical fragmentation of the table.
Another common option is to use a natural key as your primary key. For instance, a natural primary key could be a combination of LastName, FirstName, and MiddleName. I'll admit I'm biased against this approach for a couple of reasons. First, all the columns should be NOT NULL (known values) for this to be effective. Second, any child tables would need to have this same group of columns defined to establish a foreign key relationship. Moreover, what if any of the columns in the natural primary key require a change? This would necessitate cascading through and updating all child entries.
There is no right answer here though you will read passionate debate on all sides. Evaluate the options available against your needs and choose what you're comfortable with.