SQL Server Index Basics
SQL Server offers two types of traditional indexes, clustered and non-clustered. I need to understand why I need indexes, the difference between the two, and when to choose one over the other. Once I understand that I need to know how to actually create them.
This tip will explain how indexes help SQL Server operate, the differences between the clustered and non-clustered indexes, how to choose which type to use, and give example scripts to make each kind.
Why do you need SQL Server indexes?
Imagine walking into a grocery that does not organize the goods in their aisles. Everything for sale is randomly placed on a shelf in a random aisle. If one wanted to purchase cake flour, they would have to go up and down every single aisle looking for cake flour until they found it. It would take a long time to find that one item. Contrast that with a grocery store that organizes their aisles and has overhead signage that directs patrons to an aisle containing "baking needs". The same customer would now be able to go direct to the baking needs aisle to find their cake flour. It would take considerably less time for a customer to find their item in this store compared to the first.
SQL Server operates in much the same way as a patron in a grocery store. Indexes are the method by which SQL Server sorts and organizes the rows in tables to make it easier to find the desired row(s). In the absence of organization SQL Server will look at every single row – an operation referred to as a scan. With the introduction of indexes SQL Server can more quickly find the row(s) it needs. An operation that uses an index to jump to desired the part of a table is called a seek.
SQL Server Index types
The first type of index offered by SQL Server is a clustered index. The clustered index is the method by which SQL Server sorts the actual data table. Because the entire table is only stored on disk once and that single copy can only be sorted one way, there can be only one clustered index per table. The index is allowed to include more than one column.
What if there is a need to sort the same table more than one way? For instance, a table where each row represents one person may need to be sorted by a government identification number for some queries, by a family name for other queries, and by a login name for yet another group of queries. If the clustered index sorts the table by government identification number and another clustered index cannot be made, how can this be accomplished? The answer is non-clustered indexes. A non-clustered index is a copy of a subset of columns from the table sorted by those copied columns rather than the clustered index columns. There can be many non-clustered indexes on a single table. In this example, two non-clustered indexes could be created sorting the people by family name in one and by login name in another. Now all 3 types of queries will be able to seek rather than scan.
Advantages and Disadvantages of both types
The clustered index essentially is the table. It contains all of the columns of the table rather than just a subset of columns. This means that queries that use the clustered index to perform their seek will be able to return any and every column they need as they are all stored on the index. On the downside, any updates to the column(s) that make up the clustered index may cause the entire row to have to be moved to another place in the clustered index. On a wide table (many columns) that can be a very resource intensive operation.
A non-clustered index contains only a subset of columns so a query that uses it for a seek will quickly find the row(s) desired, but may have to make a second query, called a key lookup, against the clustered index to return any columns requested that aren’t part of the index. The idea of a second query happening sounds scary, but that all happens automatically within the SQL Server engine and requires no effort on the part of the developer.
Since the non-clustered index has fewer columns than the clustered index, updates to it that require the row copy to be moved within the non-clustered index are not as resource intensive.
Use cases for SQL Server clustered and non-clustered indexes
Every table, with almost no meaningful exception, should have a clustered index. Since there can be only one clustered index much care must be made when selecting which column or group of columns will be the clustered index. After making a selection for the clustered index, all other columns that need to be sorted for the purpose of turning query scans into seeks would necessarily need to be non-clustered indexes.
When choosing a column or group of columns to be the clustered index of a table there are several considerations. First, clustered indexes should be unique. This isn’t an absolute requirement enforced by SQL Server, but when designing a table, a clustered index should be a very firm rule enforced by the data architect.
Some other considerations for a good clustered index are to include only columns that never change values once they are created. Such columns are called immutable and some examples could be an identity column, an order number, a government ID number or a GUID. For a table that gets many new rows something that is ever increasing, like an identity column, can be a wise choice. Another good property of a clustered index is for it to be as narrow as possible. If there is debate between 2 or more options that are both unique and immutable (maybe an employee ID number and a government ID number) choose the narrower column.
Now that the clustered index is selected a series of non-clustered indexes can now be created. Try to create non-clustered indexes on columns that are likely to be searched. In the event of a person table where the clustered index is on the EmployeeID this might mean an index on (FamilyName, GivenName), (GovernmentID), and (HomeOffice). Be sure to review the indexes after the system has been under load for a period to see how well the indexes are performing. This tip explains how to use SQL Server DMVs to determine the efficacy of the current index structure and get recommendations on how to fix and issues.
Performance considerations for SQL Server Indexes
The number one rule of indexing is to understand that indexes are not magic methods of increasing performance. Too often a new SQL Server professional realizes how much a well-placed index can improve the speed of a query and gets the idea to index all of the columns. Surely, now all of the queries will be fast, right? Wrong. Very wrong.
Indexes are not free. They must be created, stored, updated, and maintained – a process which will be covered briefly later in this tip. Every time a row is inserted, deleted, or updated in a table; all of the non-clustered indexes that include any effected columns must also be updated. This is known as the write penalty. It’s worth paying that penalty on an index that is used often, but not on indexes that are not often being used to perform seek operations. Keep this write penalty in mind when creating new indexes as having too many can hinder performance more than any performance gains earned by having the indexes. If a table has many indexes or duplicate indexes then removing some of those that are used less often can help performance.
Sample Scripts to create clustered and non-clustered indexes
In this section, 2 tables will be created and some sample indexes will be created against them. Feel free to copy and paste this code into SSMS or ADS to see how it works. Change some of the columns around and experiment in a no-risk environment.
Notice that the clustered index on both tables use the unique constraint and are on single, narrow columns. The indexes on the other tables are allowed, but do not need to consider these options. The non-clustered indexes are placed on columns that are likely to be searched, not on every column on the tables.
USE TempDB; GO CREATE TABLE Customer (CustomerCode CHAR(6) , FamilyName NVARCHAR(50) , GivenName NVARCHAR(50) , AddressLine1 NVARCHAR(100) , AddressLine2 NVARCHAR(100) , City NVARCHAR(64) , StateProvince NVARCHAR(64) , PostalCode CHAR(8) , EMailAddress NVARCHAR(255) , SalesYTD MONEY); CREATE TABLE Orders (OrderNumber INT IDENTITY (1000,1) , OrderDate DATETIME2(0) , CustomerCode CHAR(6) , InvoiceTotal MONEY); CREATE UNIQUE CLUSTERED INDEX CX_CUSTOMER ON Customer (CustomerCode); CREATE NONCLUSTERED INDEX IX_NAME ON Customer (FamilyName, GivenName); CREATE NONCLUSTERED INDEX IX_EMAIL ON Customer (EMailAddress); CREATE UNIQUE CLUSTERED INDEX CX_ORDERS ON Orders (OrderNumber); CREATE NONCLUSTERED INDEX IX_DATE ON Orders (OrderDate); CREATE NONCLUSTERED INDEX IX_CUSTOMER ON Orders (CustomerCode);
SQL Server Clustered Index and Non-Clustered Index Maintenance Best Practices
When indexes are first created, they have all of the rows neatly sorted contiguously on disk. As time goes by, some of the rows will surely change which can necessitate that they move to a new part of the index. This happens, for example, when someone moves to a new city and the index that tracks city needs to move that row to a new place on disk. New rows show up and older rows are deleted. SQL Server does its best to fit these rows into open spaces within the many data pages that make up the index, but that can’t always happen. Over time most indexes will end up with pages being all over the place and not in the best order. This is called fragmentation of the indexes. Index maintenance is the process by which the indexes are defragmented.
This tip will focus on using the maintenance plan wizard to perform 3 tasks of index maintenance. There are a whole host of tips dedicated to covering this process in much more detail which may be useful, especially in larger environments.
Start by opening SSMS and browsing to Maintenance Plans, right click on the object and choose Maintenance Plan Wizard from the context menu.
One of the first steps will ask for a schedule to be created. Some of the operations that will be called are blocking operations. The others do not block, but they are still very busy operations and can hamper performance. For these reasons, the job needs to be scheduled during a period of low usage.
A busier SQL Server may need to see this run every day. Others may be able to get away with running it only weekly. Consider the maintenance windows allowed, how busy the server is, and make a selection. The job schedule will look exactly like a SQL Server Agent Job because it will be run by the SQL Server Agent.
Click "Next" until this "Select Maintenance Tasks" screen comes up. Select these 3 options. The first 2 are defragmentation related and the 3rd item has to do with updating meta data about the indexes that is the final step in index maintenance.
On the next screen, this author prefers this order of operations.
Choose a target database or group of databases on the next screen. This example shows "All databases". This example also shows a 10% sample rate which is enough for most common scenarios, but may be too high or low in others. Start with 10 percent and consider fine tuning the process as you become more familiar with it.
On the next screen, select the same database list as the previous screen. Check all 4 boxes on the screen along with the "Fast" radio button. This author prefers to start with 10, 1000, and 7 and the numeric values, but this is another scenario where fine tuning will be valuable as you learn more about the process.
The next screen has a few extra options than the last, but those can be ignored and left at the default. Make the final 3 options the same as were chosen in the prior step, except the fragmentation percentage which needs to be a number larger that that which was selected in the prior step. 25 can be a good starting point.
Click next or finish until the wizard is closed.
About the author
View all my tips