By: Ben Snaidero
Overview
In this section we will cover what a SQL Server clustered index is and things you should know about clustered indexes.
What is a clustered index
A clustered index is a special index which physically orders the data according to the indexed columns. The leaf nodes of the index store the data for the rest of the columns in the table so when a lookup is performed on this type of index there are no other structures that need to be referenced.
Why create clustered indexes
There are a couple benefits to having a clustered index on a table. First when querying the table, it requires fewer IO operations since an extra lookup is not needed to get any/all of the non-key column data. This data is stored in the leaf node of the index. Second it gives us a way to reorganize the table data. If the table has no clustered index it is then stored in a heap structure. When our data becomes fragmented over time due to DML operations the only way to fix this fragmentation is to reload all the data into a new table. With a clustered index in place we can run and index reorganize or rebuild to address the fragmentation which in some cases can be done online while the table is still accessible to other processes.
When should clustered indexes be created
As a general rule of thumb is it's best to create a clustered index on a table when the same columns are heavily used in the WHERE clause portion of a query. Queries that perform a lot of range scans with the indexed columns in the WHERE clause can also benefit greatly from having a clustered index on these columns. In both cases since the data is output in basically the same order that it is queried we end up using less resources to execute the query. In all cases we will use less disk IO and in some cases (depending on the ORDER BY clause) we can save ourselves memory and CPU by not having to perform a sort since data is already ordered. The following example shows how no extra lookups are required to fetch the actual data and also that no sort is required as the data is already in the correct order.
SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderID BETWEEN 43755 AND 43759 ORDER BY SalesOrderID, SalesOrderDetailID;
How to create a clustered index
There are two ways that a clustered index can be created on a table, either through a primary key constraint or simply using the create index statement. In both cases we simply have to add the CLUSTERED keyword to either the ALTER TABLE or CREATE INDEX statement as shown in the below TSQL examples.
-- Adding clustered index through constraint ALTER TABLE Sales.Currency ADD CONSTRAINT PK_Currency_CurrencyCode PRIMARY KEY CLUSTERED (CurrencyCode ASC); -- Adding clustered index without constraint CREATE CLUSTERED INDEX IX_Person_LastNameFirstName ON Person.Person (LastName ASC,FirstName ASC);
How to change clustered indexes
It's important to note that since the data is stored as part of the index you can only have one clustered index defined on each table in your database.
If you already have a clustered index on a table but think that other columns in the table would be better suited as the columns for the clustered index then switching is a bit more complicated and does require some downtime. Since all non-clustered indexes on the table reference the row data in the clustered index if we simply drop the current clustered index and recreate a new one then all of the non-clustered indexes will end up being updated twice. It's updated once when the existing clustered index is dropped and all the data is moved into a heap. It will be updated a second time when the new clustered index is created and all the data is moved into this structure. With all these index updates they will probably end up being be very fragmented and also require a rebuild when all this maintenance is done.
A better solution would be to follow these steps:
- Drop all non-clustered indexes
- Drop clustered index
- Create new clustered index
- Re-create all non-clustered indexes (including dropped clustered index as non-clustered)
The following example goes through this process for the SalesOrderDetail table and changes the clustered index from the SalesOrderID and SalesOrderDetailID columns to the ProductID column.
DROP INDEX AK_SalesOrderDetail_rowguid ON Sales.SalesOrderDetail; DROP INDEX IX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail; ALTER TABLE Sales.SalesOrderDetail DROP CONSTRAINT PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID; CREATE CLUSTERED INDEX IX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail (ProductID ASC); ALTER TABLE Sales.SalesOrderDetail ADD CONSTRAINT PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID PRIMARY KEY NONCLUSTERED (SalesOrderID ASC, SalesOrderDetailID ASC); CREATE UNIQUE NONCLUSTERED INDEX AK_SalesOrderDetail_rowguid ON Sales.SalesOrderDetail (rowguid ASC);
Should the Primary key be the clustered index
As was noted earlier the main reason for creating a clustered index is to reduce the disk IO for queries where the same columns are heavily used in the WHERE clause as we will have to perform fewer lookups to get the actual column data. These columns that are in the WHERE clause most often may or may not be the primary key column(s). A simple example of this would be the case where the primary key column of a table is an identity column but most queries that hit the table using some other column(s) in the WHERE clause. If we look at the HumanResources.Employee table we can see that there are many other alternate key columns for this table and it could be the case that one of these is used more often in the WHERE clause. If that was the case then it would make sense to change one of these indexes to be clustered index and make the primary key non-clustered.