Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Insert Performance for Clustered Indexes vs. Heap Tables


By:   |   Read Comments (4)   |   Related Tips: More > Indexing

Problem

I've read many SQL Server best practice documents that state every table should have a clustered index as opposed to leaving it as a heap structure with nonclustered indexes.  Most of the documentation notes the administrative benefits of using clustered indexes.  Are there any performance implications, either positive or negative, from implementing a clustered index on a table?

Solution

In order to test the performance of each of these types of table/index structures we will create two identical tables with one difference.   One will have a primary key created with a clustered index and the other will have the primary key created using a nonclustered index so the table data will stay in the heap structure.  Below is the T-SQL to create and load sample data into these tables.

-- Create table and indexes
CREATE TABLE testtable ([col1]  [int] NOT NULL PRIMARY KEY CLUSTERED,
                        [col2] [int]  NULL,
                        [col3] [int]  NULL,
                        [col4] [varchar](50) NULL,
                        [col5] uniqueidentifier); 
 
-- Load sample data into table
DECLARE @val INT
SELECT @val=1
WHILE @val < 5000000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
-- Create sample table and indexes
CREATE TABLE testtable2 ([col1]  [int] NOT NULL PRIMARY KEY NONCLUSTERED,
                         [col2] [int]  NULL,
                         [col3] [int]  NULL,
                         [col4] [varchar](50) NULL,
                         [col5] uniqueidentifier); 
 
INSERT INTO testtable2 SELECT * FROM testtable;
   

One thing I noticed right away was that the heap table was using more space since the table and index are separate structures.  With the clustered index the data is stored in the leaf pages of the index so less space is used.  Below is a chart that shows how much space is used by each object after the initial load.

TableName IndexName Used (KB) Reserved (KB) Rows
testtable PK__testtabl__357D0D3E3D086A66 257952 257992 4999999
testtable2 HEAP 256992 257032 4999999
testtable2 PK__testtabl__357D0D3F2CBA35D8 89432 89608 4999999

Now that we have some data loaded let's perform another data load and monitor the performance using SQL Profiler and check the space usage after the inserts complete.  Below is the T-SQL to load another 100,000 records each table.

-- insert when all pages are full
DECLARE @val INT
SELECT @val=5000000
WHILE @val < 5100000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
DECLARE @val INT
SELECT @val=5000000
WHILE @val < 5100000
BEGIN  
   INSERT INTO testtable2 (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
   

Looking first at the space usage as one would expect (since the pages were all full ie. nothing has been deleted from the table) each object is using proportionately more space. Both the heap and clustered index increased in size by almost the same amount.

TableName IndexName Used (KB) Reserved (KB) Rows
testtable PK__testtabl__357D0D3E3D086A66 263128 263176 5099999
testtable2 HEAP 262392 262472 5099999
testtable2 PK__testtabl__357D0D3F2CBA35D8 91216 91272 5099999

A little more interesting is the SQL Profiler output.  Due to the fact that there are two objects to be updated the nonclustered index and heap table insert requires slightly more CPU and performs more reads and writes than the clustered index insert.  It also takes slightly longer to complete.

Index Type CPU (ms) Reads Writes Duration (ms)
CLUSTERED 3500 304919 654 11288
HEAP 3890 406083 904 11438

Let's now randomly delete some data and then run another 100,000 record insert to see if having free space on some of the pages in the table makes any difference with performance.  Below is the T-SQL to perform both the delete and the data load.

-- remove 1000000 random records from each table
DELETE FROM testtable WHERE col1 in (SELECT TOP 1000000 col1 FROM testtable ORDER BY newid());
DELETE FROM testtable2 WHERE col1 not in (SELECT col1 FROM testtable);
GO
 
-- insert when there is free space in pages
DECLARE @val INT
SELECT @val=5100000
WHILE @val < 5200000
BEGIN  
   INSERT INTO testtable (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
 
DECLARE @val INT
SELECT @val=5100000
WHILE @val < 5200000
BEGIN  
   INSERT INTO testtable2 (col1, col2, col3, col4, col5) 
       VALUES (@val,round(rand()*100000,0),
               round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid())
   SELECT @val=@val+1
END
GO
   

Looking at the SQL Profiler output we see a much larger performance gap between the two types of structures this time with the table with the clustered index having much better performance.

Index Type CPU (ms) Reads Writes Duration (ms)
CLUSTERED 3562 304859 653 10334
HEAP 4973 422142 7053 13042

This difference is most likely due to the fact that when inserting data into a heap table the database engine will search for empty space within each page to store the new data being inserted.  This is done since the data in the heap table is unsorted so it can go anywhere.  The table with the clustered index simply has data appended to the end of the clustered index since the primary key is increasing.   Let's also take a look at the space usage for the table.  If the heap structure is indeed reusing space it should not have grown any more since we deleted 1,000,000 records and only inserted 100,000.  Below is the table sizes after this last batch of inserts completed.  We can see here that the size of the heap table has not changed confirming our earlier assumption.

TableName IndexName Used (KB) Reserved (KB) Rows
testtable PK__testtabl__357D0D3E3D086A66 268304 268360 4199999
testtable2 HEAP 262392 262472 4199999
testtable2 PK__testtabl__357D0D3F2CBA35D8 93008 93064 4199999 

Summary

In this test case we showed that using a clustered index gave us better performance in every category that we measured, CPU, IO and total duration. The only side effect of using a clustered index was that it did use a little bit more space.  This space could also be reclaimed by rebuilding the index during a maintenance window (or online if you are running Enterprise Edition).  In any case, before implementing any index changes in production you should always test in your own environment to ensure you also see a positive improvement in performance.

Next Steps


Last Update:






About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Friday, July 07, 2017 - 11:03:34 AM - Chris Back To Top

Hello,

Thank you for this article. It is definitely a good discussion starting point, but I agree more with the other commenters. As Uwe pointed out, you're inserting into two structures for the heap and only one for the clustered index. And, as sheen pointed out, you're not necessarily grabbing and inserting into the middle of those structures when you're performing your inserts and deletes.

The other thing I noticed is that you're also assuming that your primary key is an increasing integer. What if you have a composite primary key based off of birthdate and zip code? I know that a solid primary key will be incremental, but that is not always the case. I have seen plenty of code that used non-sequential GUIDs as the primary key. That would almost certainly slow things down for the clustered index inserts, as it has to make the comparisons and then insert new data pages or move data to other pages as the new data that falls between ranges is added.  There are plenty of online discussions regarding the creation of primary and foreign keys, so I don't want to get into that here, but you must also realize that not every clustered index will be in a nice, tidy, incremental format.

I appreciate the work you've done here for this article. Thank you for sharing.


Friday, July 07, 2017 - 10:07:24 AM - Ben Snaidero Back To Top

 @sheen

I would assume performance would be similar to the delete/insert test case I did above where I randomly deleted data since for the heap it would still search for free space and then update the index accordingly and the clustered index delete/insert would be exactly the same as my case.  You could test this theory yourself using the setup script from the tip to confirm.

Thanks for reading


Friday, July 07, 2017 - 8:58:20 AM - sheen Back To Top

 Hi,

Thanks for doing this comparision.

What is the performance result if you try to delete and insert some data in the middle of the range (like < 5mil) ? Don't delete and insert the same data.

 


Friday, July 07, 2017 - 8:15:28 AM - Uwe Ricken Back To Top

 

 Hi Ben,

interesting article but to be honest - you are dealing with different scenarios. To have a really comparision between CI and HEAP you shouldn't have an additonal index on the heap. :)

The CI will have only ONE insert but the HEAP has to deal with the insert in the table AND the NCI.

From my experience a heap can be so much better than a CI if you distribute the table on different files in a file group. The real hotspot in a HEAP is the PFS.


Learn more about SQL Server tools