By: Ben Snaidero | Comments (7) | Related: > Indexing
Problem
I've read many SQL Server best practice documents that state every table should have a clustered index as opposed to leaving it as a heap structure with nonclustered indexes. Most of the documentation notes the administrative benefits of using clustered indexes. Are there any performance implications, either positive or negative, from implementing a clustered index on a table?
Solution
In order to test the performance of each of these types of table/index structures we will create two identical tables with one difference. One will have a primary key created with a clustered index and the other will have the primary key created using a nonclustered index so the table data will stay in the heap structure. Below is the T-SQL to create and load sample data into these tables.
-- Create table and indexes CREATE TABLE testtable ([col1] [int] NOT NULL PRIMARY KEY CLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL, [col5] uniqueidentifier); -- Load sample data into table DECLARE @val INT SELECT @val=1 WHILE @val < 5000000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @val=@val+1 END GO -- Create sample table and indexes CREATE TABLE testtable2 ([col1] [int] NOT NULL PRIMARY KEY NONCLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL, [col5] uniqueidentifier); INSERT INTO testtable2 SELECT * FROM testtable;
One thing I noticed right away was that the heap table was using more space since the table and index are separate structures. With the clustered index the data is stored in the leaf pages of the index so less space is used. Below is a chart that shows how much space is used by each object after the initial load.
TableName | IndexName | Used (KB) | Reserved (KB) | Rows |
---|---|---|---|---|
testtable | PK__testtabl__357D0D3E3D086A66 | 257952 | 257992 | 4999999 |
testtable2 | HEAP | 256992 | 257032 | 4999999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 89432 | 89608 | 4999999 |
Now that we have some data loaded let's perform another data load and monitor the performance using SQL Profiler and check the space usage after the inserts complete. Below is the T-SQL to load another 100,000 records each table.
-- insert when all pages are full DECLARE @val INT SELECT @val=5000000 WHILE @val < 5100000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @val=@val+1 END GO DECLARE @val INT SELECT @val=5000000 WHILE @val < 5100000 BEGIN INSERT INTO testtable2 (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @val=@val+1 END GO
Looking first at the space usage as one would expect (since the pages were all full ie. nothing has been deleted from the table) each object is using proportionately more space. Both the heap and clustered index increased in size by almost the same amount.
TableName | IndexName | Used (KB) | Reserved (KB) | Rows |
---|---|---|---|---|
testtable | PK__testtabl__357D0D3E3D086A66 | 263128 | 263176 | 5099999 |
testtable2 | HEAP | 262392 | 262472 | 5099999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 91216 | 91272 | 5099999 |
A little more interesting is the SQL Profiler output. Due to the fact that there are two objects to be updated the nonclustered index and heap table insert requires slightly more CPU and performs more reads and writes than the clustered index insert. It also takes slightly longer to complete.
Index Type | CPU (ms) | Reads | Writes | Duration (ms) |
---|---|---|---|---|
CLUSTERED | 3500 | 304919 | 654 | 11288 |
HEAP | 3890 | 406083 | 904 | 11438 |
Let's now randomly delete some data and then run another 100,000 record insert to see if having free space on some of the pages in the table makes any difference with performance. Below is the T-SQL to perform both the delete and the data load.
-- remove 1000000 random records from each table DELETE FROM testtable WHERE col1 in (SELECT TOP 1000000 col1 FROM testtable ORDER BY newid()); DELETE FROM testtable2 WHERE col1 not in (SELECT col1 FROM testtable); GO -- insert when there is free space in pages DECLARE @val INT SELECT @val=5100000 WHILE @val < 5200000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @val=@val+1 END GO DECLARE @val INT SELECT @val=5100000 WHILE @val < 5200000 BEGIN INSERT INTO testtable2 (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @val=@val+1 END GO
Looking at the SQL Profiler output we see a much larger performance gap between the two types of structures this time with the table with the clustered index having much better performance.
Index Type | CPU (ms) | Reads | Writes | Duration (ms) |
---|---|---|---|---|
CLUSTERED | 3562 | 304859 | 653 | 10334 |
HEAP | 4973 | 422142 | 7053 | 13042 |
This difference is most likely due to the fact that when inserting data into a heap table the database engine will search for empty space within each page to store the new data being inserted. This is done since the data in the heap table is unsorted so it can go anywhere. The table with the clustered index simply has data appended to the end of the clustered index since the primary key is increasing. Let's also take a look at the space usage for the table. If the heap structure is indeed reusing space it should not have grown any more since we deleted 1,000,000 records and only inserted 100,000. Below is the table sizes after this last batch of inserts completed. We can see here that the size of the heap table has not changed confirming our earlier assumption.
TableName | IndexName | Used (KB) | Reserved (KB) | Rows |
---|---|---|---|---|
testtable | PK__testtabl__357D0D3E3D086A66 | 268304 | 268360 | 4199999 |
testtable2 | HEAP | 262392 | 262472 | 4199999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 93008 | 93064 | 4199999 |
Summary
In this test case we showed that using a clustered index gave us better performance in every category that we measured, CPU, IO and total duration. The only side effect of using a clustered index was that it did use a little bit more space. This space could also be reclaimed by rebuilding the index during a maintenance window (or online if you are running Enterprise Edition). In any case, before implementing any index changes in production you should always test in your own environment to ensure you also see a positive improvement in performance.
Next Steps
- More tips on using clustered indexes:
- More tips on indexing:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips