Nice article on the advantages/disadvantages of identity and uniqueidentifier keys. However, there is a relatively simple solution to the problem of fragmentation when using uniqueidentifier keys: don't use the primary key as a clustered index.
By default, when you create a primary key, it is created as a clustered index. But it does not have to be so. You can choose any other column as the clustered index, then the data rows will be physically ordered by that column.
How are the queries on the table going to be used? If it is by and large going to be random access, then perhaps a DateInserted date column would be appropriate. If there will frequent access of a customer table by territory number, it may be best to set up a clustered index by that column. Or, choose not to have any clustered index, and rows will always be added to the end of the last data page, and time-consuming page splits will never occur.
There is no rule, in practice or theory, that the primary key must be a clustered index. A little more thought on how the table will be used and accessed can resolve the fragmentation issue, and provide better performance.
Hope this helps
That is a great point. Its silly for Microsoft to have left the clustered index on an identity or guid column for that matter. It makes no sense and certainly lends nothing to the performance. It is usually the first thing I have to change on a newly created table.
Choosing another column for the clustered index might get around the fragmentation issue, but the primary key is often the best candidate for the clustered index. Sequential GUIDs might be a better option to get around the fragmentation issue. Also remember that the fragmentation issue would apply (to a lesser extent) to a non-clustered index created on the GUID column.
It's normally a good idea for the clustered index to be unique - if it's not SQL Server will add a uniqueifier to make the clustered index unique (additional overhead). Due to the fragmentation issues already mentioned, it's also a good idea to to have an auto-incermenting clustered index - Identities and Sequential GUIDs both fit the bill here. Another thing you need to consider is if the chosen column(s) for the clustered index are likely to be updated - Ideally you want to pick a column that is never/rarely updated - the primary key also fits the bill here.
I'd advise people to keep the clustered index as the primary key, unless they have a good reason for choosing a different column (MS made this a default for good reason). I'm not saying that you always should use the primary key as a clustered index, but it's not a bad choice in the absence of another candidate for the clustered index.
In terms of query performance (rather than write performance), you might want to use a column that is frequently used in ORDER BY, GROUP BY, JOINS (foreign keys) or where the column is often used to select a range of values etc.
As mentioned, a date column might be a good candidate if it's populated with a GetDate()/GetUTCDate() - you would expect a very low number of duplicate values and the column will be auto-incrementing. It might also unlikely that the column will ever be updated. If queries frequently sort on this column and filter it by a range of values then it would be a good candidate for the clustered index.
Also note that best practice will vary depending on the type of database. If you are creating a data warehouse/reporting database rather than an OLTP database, you will have different priorities.