Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 
Best Practices for Migrating SQL Server to Azure - free webinar
 

Be Careful with Key Order in SQL Server Missing Index Recommendations


By:   |   Last Updated: 2018-07-17   |   Comments (1)   |   Related Tips: More > Indexing

Problem

I have been working with SQL Server for a long time, but I’m still constantly learning new things. Generally, its missing index recommendations can be useful at a high level, and some systems rely on them heavily. I haven’t spent a ton of time analyzing them because of the limitations in what they offer, such as often suggesting redundant indexes, never offering filtered indexes, and recommending indexes for queries that have run exactly once. And sometimes, these indexes are not created correctly, specifically in terms of the order of key columns. Not too long ago, I learned why.

Solution

It turns out that SQL Server just orders key columns based on their ordinal position in sys.columns. Let’s look at a simple example; we’ll create a table (with, on my system, about 85,000 rows):

CREATE TABLE dbo.DateFirst
(
  id int IDENTITY(1,1) NOT NULL,
  dt datetime NOT NULL,
  ln nvarchar(200) NOT NULL,
  CONSTRAINT pk_DateFirst PRIMARY KEY (id)
);
;WITH x AS
(  
  SELECT 
    dt = CONVERT(date,DATEADD(DAY,1-(o.[object_id]/10.0 * column_id) % 1000, modify_date)),
    ln = LEFT(o.name + c.name, 200)
  FROM sys.all_objects AS o
  INNER JOIN sys.all_columns AS c
  ON o.[object_id] BETWEEN c.[column_id] - 20 AND c.[column_id] + 20
)
INSERT dbo.DateFirst(dt,ln) 
SELECT dt = MAX(dt), ln FROM x GROUP BY ln;	

If we run a simple query against this table, where we use equality predicates on both the datetime and string columns, we get the following index recommendation:

SELECT id FROM dbo.DateFirst
  WHERE dt = '2018-05-11'
    AND ln = N'sysallocunitsabort_state'
    AND 1 > (SELECT 0); -- to make optimization non-trivial
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateFirst] ([dt],[ln])-- order of table
*/			

You might conclude that it just listed them that way because of the order of the predicates. But if you reverse the order, putting the lncheck before dt, the exact same index is suggested:

SELECT id FROM dbo.DateFirst
  WHERE ln = N'sysallocunitsabort_state'
    AND dt = '2018-05-11'
    AND 1 > (SELECT 0); -- to make optimization non-trivial/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateFirst] ([dt],[ln])-- order of table
*/			

It can actually get a little more complex than that, because the key columns will be divided into two sets: those that satisfy equality predicates, and those that satisfy inequality predicates. If we change one of the predicates to be inequality, we now get the key columns listed in the opposite order (but really, they are two groups of columns listed in order of equality and then inequality):

SELECT id FROM dbo.DateFirst
  WHERE dt <> '2018-05-11'
    AND ln = N'sysallocunitsabort_state'
    AND 1 > (SELECT 0);
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateFirst] ([ln],[dt])-- NOT order of table  
*/			

For fabricated data like this, it may not really matter. The point is that a recommended index in a real world scenario might be much better suited to having a specific leading key column, and it may not reflect the order of the columns in the table.

Let’s create a new version of the table with those columns flipped, and insert the same data:

CREATE TABLE dbo.DateLast
(
  id int IDENTITY(1,1) NOT NULL,
  ln nvarchar(200) NOT NULL,
  dt datetime NOT NULL,
  CONSTRAINT pk_DateLast PRIMARY KEY(id)
);
INSERT dbo.DateLast(ln,dt) SELECT ln,dt FROM dbo.DateFirst;

Now, we’ll run similar queries, and see the recommendations we get:

SELECT id FROM dbo.DateLast
  WHERE dt = '2018-05-11'
    AND ln = N'sysallocunitsabort_state'
    AND 1 > (SELECT 0);
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateLast] ([ln],[dt])-- order of table
*/
SELECT id FROM dbo.DateLast
  WHERE ln = N'sysallocunitsabort_state'
    AND dt = '2018-05-11'
    AND 1 > (SELECT 0);
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateLast] ([ln],[dt])-- order of table
*/
SELECT id FROM dbo.DateLast
  WHERE ln <> N'sysallocunitsabort_state'
    AND dt = '2018-05-11'
    AND 1 > (SELECT 0);
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[DateFirst] ([dt],[ln])-- NOT order of table
*/			

Again, when all predicates use equality against a column, they’re all treated equally, and so SQL Server falls back to sorting by ordinal position in the table itself. Only when we introduce inequality against a column does it treat that column differently. And it is easy to prove that if all columns use inequality, the order would again fall back to ordinal position within the table.

You would think that important factors like density and selectivity would come into play, but they don’t. If you were creating these indexes yourself, and you knew your data, you would look at the predicates and deduce that the leading key column should usually be the one that is most selective, since it eliminates the largest portion of the B-tree the quickest.

Credit goes to Brent Ozar and Bryan Rebok, who recently solved this mystery in this Stack Exchange question a question on Database Administrators Stack Exchange.

Summary

So, what is the fix? I don’t know. Heuristics are tough, and sometimes the correct index structure depends on future information that SQL Server doesn’t even know yet. All I can suggest is that you pay particular attention to the key column order when you are creating indexes that come from individual index recommendations and, more importantly, the Database Engine Tuning Advisor (DTA).

Next Steps

Read on for related tips and other resources:



Last Updated: 2018-07-17


next webcast button


next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a Product Manager at SentryOne, with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and serves as a community moderator for the Database Administrators Stack Exchange.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Tuesday, July 17, 2018 - 4:12:05 PM - Ray Herring Back To Top

 I once had a co-worker who insisted on automated index creation based on the then new missing indexes DMV.  This co-worker created a daily job to run a query against the DMV for all tables and then create every recommended index.

The short term result was pretty overwhelming :)


Learn more about SQL Server tools