Identify SQL Server Indexes With Duplicate Columns

Problem

Although not as much of a SQL Server performance issue as missing indexes where they could be used, having too many indexes defined on a table can cause a lot of unnecessary IO on your SQL Server database as well as use up additional disk space. This tip will provide a query that will identify any tables that have indexes that share leading columns so you can decide which ones are not actually needed and remove them.

Solution

Sample SQL Server Table Setup

In order to provide you with some test cases I’ve included some T-SQL to create a couple of sample tables. These two tables provide examples of the following test cases:

  1. Exact duplicate indexes
  2. Indexes that shares leading columns
  3. Indexes that share first column, but subsequent columns are different

Of course you can always skip this step and test the queries below in one of your own development environments. Either way here is the T-SQL to setup these sample tables.

-- Create sample table and indexes
CREATE TABLE testtable1 ([col1] [int] NOT NULL primary key  clustered,
                            [col2] [int]  NULL,
                            [col3] [int]  NULL,
                            [col4] [varchar](50) NULL); 
CREATE INDEX idx_testtable1_col2col3 on testtable1 (col2  asc, col3 asc);
CREATE INDEX idx_testtable1_col2col4 on testtable1 (col2  asc, col4 asc);
CREATE INDEX idx_testtable1_col3 on testtable1 (col3  asc);
CREATE INDEX idx_testtable1_col3col4 on testtable1 (col3  asc, col4 asc);
CREATE TABLE testtable2 ([col1] [int] NOT NULL primary key  clustered,
                        [col2] [int]  NULL,
                        [col3] [int]  NULL,
                        [col4] [varchar](50) NULL); 
 
CREATE INDEX idx_testtable2_col3col4 on testtable2 (col3  asc, col4 asc);
CREATE INDEX idx_testtable2_col3col4_1 on testtable2 (col3  asc, col4 asc);

Build Script

In an ideal world you would have a naming convention for your SQL Server indexes that would identify which columns make up the index. Since these types of standards don’t always exist it often happens that duplicate indexes can get created on tables and in order to identify these indexes we’ll have to use the following catalog views:

To build this query we will break the problem down into two steps. The first thing we will need is a list of all the tables and indexes with their associated columns. For the columns that make up the index we will create a comma separated list of values as is described in this link using the STUFF function and the FOR XML PATH clause. Here is the T-SQL to retrieve this listing. Note the WHERE clause which excludes any tables that are not user created.

select distinct object_name(i.object_id) tablename,i.name indexname,
        (select distinct stuff((select ', ' + c.name
                                  from sys.index_columns ic1 inner join 
                                       sys.columns c on ic1.object_id=c.object_id and 
                                                        ic1.column_id=c.column_id
                                 where ic1.index_id = ic.index_id and 
                                       ic1.object_id=i.object_id and 
                                       ic1.index_id=i.index_id
                                 order by index_column_id FOR XML PATH('')),1,2,'')
           from sys.index_columns ic 
          where object_id=i.object_id and index_id=i.index_id) as columnlist
  from sys.indexes i inner join 
       sys.index_columns ic on i.object_id=ic.object_id and i.index_id=ic.index_id inner join
       sys.objects o on i.object_id=o.object_id 
 where o.is_ms_shipped=0

Now that we have a complete list of all our tables and indexes (including their columns) we can use that output and join it to itself comparing the columnlist output column. All we have to do is join on table name and compare the columnlist column from the first result set with the substring of the columnlist column from the second result set using the length of the first columnlist column. Of course we will omit exact matches except where the index name is also different. Here is the T-SQL for this query.

select t1.tablename,t1.indexname,t1.columnlist,t2.indexname,t2.columnlist from
   (select distinct object_name(i.object_id) tablename,i.name indexname,
             (select distinct stuff((select ', ' + c.name
                                       from sys.index_columns ic1 inner join 
                                            sys.columns c on ic1.object_id=c.object_id and 
                                                             ic1.column_id=c.column_id
                                      where ic1.index_id = ic.index_id and 
                                            ic1.object_id=i.object_id and 
                                            ic1.index_id=i.index_id
                                      order by index_column_id FOR XML PATH('')),1,2,'')
                from sys.index_columns ic 
               where object_id=i.object_id and index_id=i.index_id) as columnlist
       from sys.indexes i inner join 
        sys.index_columns ic on i.object_id=ic.object_id and 
                                    i.index_id=ic.index_id inner join
            sys.objects o on i.object_id=o.object_id 
      where o.is_ms_shipped=0) t1 inner join
   (select distinct object_name(i.object_id) tablename,i.name indexname,
             (select distinct stuff((select ', ' + c.name
                                       from sys.index_columns ic1 inner join 
                                            sys.columns c on ic1.object_id=c.object_id and 
                                                             ic1.column_id=c.column_id
                                      where ic1.index_id = ic.index_id and 
                                            ic1.object_id=i.object_id and 
                                            ic1.index_id=i.index_id
                                      order by index_column_id FOR XML PATH('')),1,2,'')
                from sys.index_columns ic 
               where object_id=i.object_id and index_id=i.index_id) as columnlist
       from sys.indexes i inner join 
        sys.index_columns ic on i.object_id=ic.object_id and 
                                    i.index_id=ic.index_id inner join
            sys.objects o on i.object_id=o.object_id 
 where o.is_ms_shipped=0) t2 on t1.tablename=t2.tablename and 
       substring(t2.columnlist,1,len(t1.columnlist))=t1.columnlist and 
       (t1.columnlist<>t2.columnlist or 
         (t1.columnlist=t2.columnlist and t1.indexname<>t2.indexname))

And there you have it. A query that will provide a complete listing of all indexes that share leading columns. Please note that the way the query is written if you have two indexes that are exact duplicates it will show up twice in the result set. The only thing left to do now is to decide which indexes we should remove so that we can reclaim some space and reduce the IO required for any DML statements executed against these tables. The following tips can be used to help you determine which indexes you should remove.

Next Steps

Leave a Reply

Your email address will not be published. Required fields are marked *