By: Armando Prato
Overview
We will likely want indexes to help queries. One of the most important indexing decisions is to determine which column(s) will make up the clustered index.
Explanation
You may or may not have read it before but I'll re-hash it here. You get only 1 clustered index per table since the clustered index determines the logical ordering of the data in the table and it makes up the actual table itself. Not having a clustered index should be conscious decision that should be considered carefully.
How can a clustered index help and why do you want one? If you're grouping, sorting, or range searching large sets of data, clustering on the column(s) involved in your query can considerably speed up the query and will consume fewer resources (i.e. CPU, memory). The data that makes up a clustered index is stored as a B tree structure with the data itself linked in doubly linked list fashion in clustered key order making it ideal for sorting, grouping and range queries.
Consider this example where we SELECT a range of rows from AdventureWorks table Production.TransactionHistory using the clustered, unique primary key on TransactionID:
set statistics io on go select * from Production.TransactionHistory where TransactionID between 211000 and 211200 go
Examining the I/O counts we see that 6 pages of data needed to be read to satisfy the request
Now let's drop and re-create the primary key as a non-clustered one.
alter table Production.TransactionHistory drop constraint PK_TransactionHistory_TransactionID go alter table Production.TransactionHistory add constraint PK_TransactionHistory_TransactionID primary key nonclustered (TransactionID) go
Re-running the same query and re-examining the I/O counts we see that 205 pages of data are now needed to satisfy the same request
SQL Server automatically makes the primary key a clustered index (if not defined otherwise). The primary key does not have to be the clustered index of the table if other columns make more sense to support critical queries.
The choice of the clustered index is a balancing act between query performance vs. table health. Highly volatile tables (i.e. tables that experience a lot of inserts), can experience severe logical fragmentation. If your clustered index is on not on monotonically increasing data (i.e. an identity, sequentially increasing datetime, sequential uniqueidentifier) and the table is highly volatile, you will experience rapid fragmentation of the table which can result in extra I/O to retrieve the data to satisfy a query request.
If no columns make good candidates for a clustered index, I usually make an available monotonically increasing value the clustered index.
Additional Information
- Clustered Tables vs. Heap Tables
- SQL Server Tables Without A Clustered Index
- SQL Server Fragmentation Storage basics and Access Methods Part 1 of 9