Retaining historical index usage statistics for SQL Server Part 1 of 3

By:   |   Comments (8)   |   Related: 1 | 2 | 3 | > Indexing


Problem

Starting with Microsoft SQL Server 2005, DBAs were able to have a greater insight into the inner workings of their supported SQL Server instances through the use of Dynamic Management Views and Functions, collectively known as DMOs (Dynamic Management Objects.)

Some of the most-beneficial information concerns index usage.  This metadata is collected by the Query Optimizer Engine and cached for querying via the sys.dm_db_index_usage_stats DMV as well as the collection of Missing Index DMOs (sys.dm_db_missing_index_groups, sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_columns, and sys.dm_db_missing_index_details.)  There is a very critical shortcoming in the underlying process however; the information, since it is cached and not persisted in a physical table, is permanently lost once a SQL Server service restart occurs.  In this three-part series I'll look at what you can do to persist this data so you don't find yourself waiting for usage stats to compile over a reasonable period of time just so you can harness the power of the indexing DMOs in your performance tuning process. 

When determining how to persist the data I originally thought the process would be extremely straight-forward.  On the surface one would think that you would simply a build code construct that would drop the persisted table if it exists, then do a SELECT...INTO from the DMV into a new physical table.  Well, that would be great if you didn't care about losing all your persisted data after a service restart, but that would be exactly what would happen if this script was run after a service restart.  Therefore the code is slightly more complex than that.  This being Part One, we will focus on setting up the structure of that code, the necessary underlying objects, and the suggested process to make it all work for you in regards to the sys.dm_db_index_usage_stats that record information on which indexes are (and just as importantly are not being utilized.)  Part Two will focus on a similar process for the sys.dm_index_operational_stats DMF.  Part Three will wrap up this series with persisting a collected view from the four Missing Index DMOs.

Solution

As I've stated before in many of the tips I've written here at MSSQLTips.com, I have a dedicated database on each of my SQL Server instances specifically for administrative objects I use in my daily tasks and automated administrative processes.  It is here that I have created the necessary objects for the metadata persisting processes I'm outlining in this series.  I'm providing the code below to create the database and all necessary objects we will be discussing, if you have your own dedicated administrative database please feel free to alter the code accordingly.  As always, any code you see presented here should be thoroughly tested in a non-production environment before applying to a live production instance of SQL Server.

Step One: Create the Administrative Database and All Necessary Objects

I separate my database objects within my administrative database by schema.  You'll see below that after creation of the database, I create a MetaBot schema that will serve as the schema for all objects of this process type.

USE [master]
GO
CREATE DATABASE [iDBA] ON  PRIMARY 
  
(
  
NAME N'iDBA'
  
FILENAME N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\iDBA.mdf' 
  
SIZE 10MB 
  
FILEGROWTH 5MB
  
)
  
LOG ON 
  
(
  
NAME N'iDBA_log'
  
FILENAME N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\iDBA_log.ldf' 
  
SIZE 5120KB 
  
FILEGROWTH 5120KB
  
)
GO

USE [iDBA]
GO
CREATE SCHEMA [MetaBot] AUTHORIZATION [dbo]
GO

From here we only require two user-created database objects:  the repository table that will persist the data (MetaBot.dm_db_index_usage_stats) and the stored procedure used to collect it.  Let's first look at the structure of the table and I'll make my case as to why it's slightly more-complex than you would expect and why it's not normalized.

CREATE TABLE [MetaBOT].[dm_db_index_usage_stats](
  
[database_id] [smallint] NOT NULL,
  
[object_id] [int] NOT NULL,
  
[index_id] [int] NOT NULL,
  
[user_seeks] [bigint] NOT NULL,
  
[user_scans] [bigint] NOT NULL,
  
[user_lookups] [bigint] NOT NULL,
  
[user_updates] [bigint] NOT NULL,
  
[last_user_seek] [datetime] NULL,
  
[last_user_scan] [datetime] NULL,
  
[last_user_lookup] [datetime] NULL,
  
[last_user_update] [datetime] NULL,
  
[system_seeks] [bigint] NOT NULL,
  
[system_scans] [bigint] NOT NULL,
  
[system_lookups] [bigint] NOT NULL,
  
[system_updates] [bigint] NOT NULL,
  
[last_system_seek] [datetime] NULL,
  
[last_system_scan] [datetime] NULL,
  
[last_system_lookup] [datetime] NULL,
  
[last_system_update] [datetime] NULL,
  
[last_poll_user_seeks] [bigint] NOT NULL,
  
[last_poll_user_scans] [bigint] NOT NULL,
  
[last_poll_user_lookups] [bigint] NOT NULL,
  
[last_poll_user_updates] [bigint] NOT NULL,
  
[last_poll_system_seeks] [bigint] NOT NULL,
  
[last_poll_system_scans] [bigint] NOT NULL,
  
[last_poll_system_lookups] [bigint] NOT NULL,
  
[last_poll_system_updates] [bigint] NOT NULL,
  
[date_stamp] [datetime] NOT NULL,
 
CONSTRAINT [PK_dm_db_index_usage_stats] PRIMARY KEY CLUSTERED 
[database_id] ASC,
  
[object_id] ASC,
  
[index_id] ASC
)WITH (FILLFACTOR = 90ON [PRIMARY]
ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_user_reads] ON [MetaBOT].[dm_db_index_usage_stats] 
  
([user_scans][user_seeks][user_lookups])
  
WITH (FILLFACTOR = 80ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_user_writes] ON [MetaBOT].[dm_db_index_usage_stats] 
  
([user_updates])
  
WITH (FILLFACTOR = 80ON [PRIMARY]

The table is (for the most part) a duplicate in structure of the sys.dm_db_index_usage_stats Dynamic Management View with a few notable exceptions:

  • collection of columns prefixed with "last_poll_"
  • date_stamp column

The last_poll_... columns are vital for the calculations that go into the incrementing and updating of usage counts each time the stored procedure will run against sys.dm_db_index_usage_stats.  If your instance never is subject to a service restart these columns would not be necessary, however we all know that is not the case.  Even the best-constructed environments must still have routine maintenance that will require bringing the system down.  The date_stamp column will record when the last refresh of MetaBot.dm_db_index_usage_stats occurred.  It also serves to determine how the usage values are incremented in that it is compared to the last time the services were cycled.  Depending on whether that occurred pre or post the last refresh occurred, a different calculation is used to update the related columns.  We will go over those possible calculations after we look at the stored procedure code.  The table I'll present should make this much clearer to you.

Speaking of the stored procedure code, I'm presenting it below.

USE iDBA
GO

CREATE PROCEDURE MetaBot.usp_persist_dm_db_index_usage_stats AS 
DECLARE 
@last_service_start_date datetime
DECLARE @last_data_persist_date datetime

--Determine last service restart date based upon tempdb creation date
SELECT @last_service_start_date =  
  
SD.[create_date] 
  
FROM sys.databases SD 
  
WHERE SD.[name] 'tempdb'
  
--Return the value for the last refresh date of the persisting table
SELECT @last_data_persist_date 
  MAX
(MDDIUS.[date_stamp]
  
FROM [iDBA].[MetaBOT].[dm_db_index_usage_stats] MDDIUS

--Take care of updated records first
IF @last_service_start_date @last_data_persist_date
  
BEGIN
     
--Service restart date > last poll date
     
PRINT 'The latest persist date was ' 
        
CAST(@last_data_persist_date AS VARCHAR(50)) + 
        
'; no restarts occurred since ' 
        
CAST(@last_service_start_date AS VARCHAR(50)) +
        
'  (' CAST(DATEDIFF(d@last_service_start_date@last_data_persist_dateAS VARCHAR(10)) + 
        
' days ago.)'

     
UPDATE MDDIUS
     
SET 
        
MDDIUS.[user_seeks] MDDIUS.[user_seeks]+(SDDIUS.[user_seeks] MDDIUS.[last_poll_user_seeks]),
        
MDDIUS.[user_scans] MDDIUS.[user_scans]+(SDDIUS.[user_scans] MDDIUS.[last_poll_user_scans]),
        
MDDIUS.[user_lookups] MDDIUS.[user_lookups]+(SDDIUS.[user_lookups] MDDIUS.[last_poll_user_lookups]),
        
MDDIUS.[user_updates] MDDIUS.[user_updates]+(SDDIUS.[user_updates] MDDIUS.[last_poll_user_updates]),
        
MDDIUS.[last_user_seek] SDDIUS.[last_user_seek],
        
MDDIUS.[last_user_scan] SDDIUS.[last_user_scan],
        
MDDIUS.[last_user_lookup] SDDIUS.[last_user_lookup],
        
MDDIUS.[last_user_update] SDDIUS.[last_user_update],
        
MDDIUS.[system_seeks] MDDIUS.[system_seeks]+(SDDIUS.[system_seeks] MDDIUS.[last_poll_system_seeks]),
        
MDDIUS.[system_scans] MDDIUS.[system_scans]+(SDDIUS.[system_scans] MDDIUS.[last_poll_system_scans]),
        
MDDIUS.[system_lookups] MDDIUS.[system_lookups]+(SDDIUS.[system_lookups] MDDIUS.[last_poll_system_lookups]),
        
MDDIUS.[system_updates] MDDIUS.[system_updates]+(SDDIUS.[system_updates] MDDIUS.[last_poll_system_updates]),
        
MDDIUS.[last_system_seek] SDDIUS.[last_system_seek],
        
MDDIUS.[last_system_scan] SDDIUS.[last_system_scan],
        
MDDIUS.[last_system_lookup] SDDIUS.[last_system_lookup],
        
MDDIUS.[last_system_update] SDDIUS.[last_system_update],
        
MDDIUS.[last_poll_user_seeks] SDDIUS.[user_seeks],
        
MDDIUS.[last_poll_user_scans] SDDIUS.[user_scans],
        
MDDIUS.[last_poll_user_lookups] SDDIUS.[user_lookups],
        
MDDIUS.[last_poll_user_updates] SDDIUS.[user_updates],
        
MDDIUS.[last_poll_system_seeks] SDDIUS.[system_seeks],
        
MDDIUS.[last_poll_system_scans] SDDIUS.[system_scans],
        
MDDIUS.[last_poll_system_lookups] SDDIUS.[system_lookups],
        
MDDIUS.[last_poll_system_updates] SDDIUS.[system_updates],
        
MDDIUS.date_stamp GETDATE()
     
FROM [sys].[dm_db_index_usage_stats] SDDIUS INNER JOIN 
        
[iDBA].[MetaBot].[dm_db_index_usage_stats] MDDIUS
           
ON SDDIUS.[database_id] MDDIUS.[database_id]
              
AND SDDIUS.[object_id] MDDIUS.[object_id]
              
AND SDDIUS.[index_id] MDDIUS.[index_id]
  
END
ELSE
  BEGIN
     
--Service restart date < last poll date
     
PRINT 'Lastest service restart occurred on ' 
        
CAST(@last_service_start_date AS VARCHAR(50)) + 
        
' which is after the latest persist date of ' 
        
CAST(@last_data_persist_date AS VARCHAR(50))
     
     
UPDATE MDDIUS
     
SET 
        
MDDIUS.[user_seeks] MDDIUS.[user_seeks]SDDIUS.[user_seeks],
        
MDDIUS.[user_scans] MDDIUS.[user_scans]SDDIUS.[user_scans],
        
MDDIUS.[user_lookups] MDDIUS.[user_lookups]SDDIUS.[user_lookups],
        
MDDIUS.[user_updates] MDDIUS.[user_updates]SDDIUS.[user_updates],
        
MDDIUS.[last_user_seek] SDDIUS.[last_user_seek],
        
MDDIUS.[last_user_scan] SDDIUS.[last_user_scan],
        
MDDIUS.[last_user_lookup] SDDIUS.[last_user_lookup],
        
MDDIUS.[last_user_update] SDDIUS.[last_user_update],
        
MDDIUS.[system_seeks] MDDIUS.[system_seeks]SDDIUS.[system_seeks],
        
MDDIUS.[system_scans] MDDIUS.[system_scans]SDDIUS.[system_scans],
        
MDDIUS.[system_lookups] MDDIUS.[system_lookups]SDDIUS.[system_lookups],
        
MDDIUS.[system_updates] MDDIUS.[system_updates]SDDIUS.[system_updates],
        
MDDIUS.[last_system_seek] SDDIUS.[last_system_seek],
        
MDDIUS.[last_system_scan] SDDIUS.[last_system_scan],
        
MDDIUS.[last_system_lookup] SDDIUS.[last_system_lookup],
        
MDDIUS.[last_system_update] SDDIUS.[last_system_update],
        
MDDIUS.[last_poll_user_seeks] SDDIUS.[user_seeks],
        
MDDIUS.[last_poll_user_scans] SDDIUS.[user_scans],
        
MDDIUS.[last_poll_user_lookups] SDDIUS.[user_lookups],
        
MDDIUS.[last_poll_user_updates] SDDIUS.[user_updates],
        
MDDIUS.[last_poll_system_seeks] SDDIUS.[system_seeks],
        
MDDIUS.[last_poll_system_scans] SDDIUS.[system_scans],
        
MDDIUS.[last_poll_system_lookups] SDDIUS.[system_lookups],
        
MDDIUS.[last_poll_system_updates] SDDIUS.[system_updates],
        
MDDIUS.date_stamp GETDATE()
     
FROM [sys].[dm_db_index_usage_stats] SDDIUS INNER JOIN 
        
[iDBA].[MetaBot].[dm_db_index_usage_stats] MDDIUS
           
ON SDDIUS.[database_id] MDDIUS.[database_id]
              
AND SDDIUS.[object_id] MDDIUS.[object_id]
              
AND SDDIUS.[index_id] MDDIUS.[index_id]
  
END   

--Take care of new records next
     
INSERT INTO [iDBA].[MetaBot].[dm_db_index_usage_stats]
        
(
        
[database_id][object_id][index_id]
        
[user_seeks][user_scans][user_lookups],
        
[user_updates][last_user_seek][last_user_scan],
        
[last_user_lookup][last_user_update][system_seeks],
        
[system_scans][system_lookups][system_updates],
        
[last_system_seek][last_system_scan]
        
[last_system_lookup][last_system_update],
        
[last_poll_user_seeks],    [last_poll_user_scans]
        
[last_poll_user_lookups][last_poll_user_updates],
        
[last_poll_system_seeks][last_poll_system_scans]
        
[last_poll_system_lookups][last_poll_system_updates],
        
[date_stamp]
        
)
     
SELECT SDDIUS.[database_id]SDDIUS.[object_id]SDDIUS.[index_id]
        
SDDIUS.[user_seeks]SDDIUS.[user_scans]SDDIUS.[user_lookups],
        
SDDIUS.[user_updates]SDDIUS.[last_user_seek]SDDIUS.[last_user_scan],
        
SDDIUS.[last_user_lookup]SDDIUS.[last_user_update]SDDIUS.[system_seeks],
        
SDDIUS.[system_scans]SDDIUS.[system_lookups]SDDIUS.[system_updates],
        
SDDIUS.[last_system_seek]SDDIUS.[last_system_scan]
        
SDDIUS.[last_system_lookup]SDDIUS.[last_system_update],
        
SDDIUS.[user_seeks]SDDIUS.[user_scans]SDDIUS.[user_lookups],
        
SDDIUS.[user_updates],SDDIUS.[system_seeks],
        
SDDIUS.[system_scans]SDDIUS.[system_lookups]
        
SDDIUS.[system_updates]GETDATE()  
     
FROM [sys].[dm_db_index_usage_stats] SDDIUS LEFT JOIN 
        
[iDBA].[MetaBot].[dm_db_index_usage_stats] MDDIUS
           
ON SDDIUS.[database_id] MDDIUS.[database_id]
           
AND SDDIUS.[object_id] MDDIUS.[object_id]
           
AND SDDIUS.[index_id] MDDIUS.[index_id]
     
WHERE MDDIUS.[database_id] IS NULL 
        AND 
MDDIUS.[object_id] IS NULL
        AND 
MDDIUS.[index_id] IS NULL

The code is comprised of a few different logical sections:

  • Variable declarations and value assignments
  • Update of existing values with new usage counts
  • Insertion of new rows associated with indexes that have, up to this point, not been used in any read or write processes.

The two variables in the stored procedure determine which UPDATE script runs in the confines of the stored procedure.  This is necessary, because the correct calculation for incrementing the index usage columns depends upon either the last_poll column pertaining to the specific user or system index action (seek, scan, update, or lookup) or the last persisted value for the corresponding usage column.  The following table illustrates this issue much better than I can do in this limited space. M is the alias for MetaBot.dm_db_index_usage_stats, S is the alias for the sys.dm_db_index_usage_stats DMV.

  M.user_scans(existing) M.last_poll (existing) S.user_scans M.user_scans (new) Calculation M.last_poll (new) Calculation
Initial Population (INSERT) 0 0 10 10 = S.col 10 = S.col
restart_date < last date_stamp value 10 10 15 15 = M.col + (S.col - M.last_poll _col) 15 = S.col
SQL Server service restart occurs, all cached values lost, increments reset to 0
restart_date > last_date_stamp value 15 15 5 20 = M.col + S.col 5 = S.col
restart_date < last date_stamp value 15 5 8 18 = M.col + (S.col - M.last_poll _col) 8 = S.col

When a unique record is first encountered it is inserted into the persisted table.  A unique record is based upon the combination key of database_id, object_id, and index_id in the case of the sys.dm_db_index_usage_stats DMV.  If this combination of identifiers already exists in the persisted table, the stored procedure relies upon the underlying determination of whether or not a system restart has been encountered since the last time the metadata was persisted.  If it has not, then the usage columns are incremented by adding the existing value of that field to the difference between the new value from the DMV and the last value that was read from the DMV when the last run of the stored procedure executed.  If a restart did occur, then the counters are all reset to 0.  This means that the formula previously outlined would result in incorrect numbers being persisted.  Instead you must take the sum of the existing column value and the corresponding value from the DMV as the new value to be persisted.  In all cases the calculation for the last_poll column value will remain unchanged as will the code associated with the INSERT of new index usage rows.

You may notice that we do risk the loss of counter data.  The only way to ensure that all usage data is persisted would be if this stored procedure would run prior to any service restart - not a viable option.  However, unlike user data, we are using this metadata for metrics in determining whether an index is of any value - is it used significantly to offset the overhead of writes against it.  So long as we record significant data over the lifetime of the instance we have no need to capture all usage values within reason.

The final step would be to create a SQL Server Agent Job for the execution of this stored procedure.  I'll leave that up to you.  In our environment I run this process six times daily.  It's very lightweight and I've not found it to interfere with normal user operations or backup cycles.  I do not run it during the timeframe of scheduled index maintenance, but that is by choice on my part.  I encourage you to test with schedules that fit your individual needs.

In the next tip of the series we will create a similar process for the sys.dm_db_index_operational_stats DMF, which needs to be handled in a slightly different manner as it is a function with the narrower scope of a database versus the instance scope of sys.dm_db_index_usage_stats.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Tim Ford Tim Ford is a Senior Database Administrator with MindBody.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, May 21, 2019 - 10:07:36 PM - Jet Krack Back To Top (80147)

have been using the script in this port.  But getting different INDEX_ID's for each specific INDEX, over time in the history table.  Anyone experiencing the same issue?

name database_id object_id index_id

IX_POLICY_POLICYSEQ 15 267200052 1

IX_POLICY_POLICYSEQ 15 267200052 150

IX_POLICY_POLICYSEQ 15 267200052 153

IX_POLICY_POLICYSEQ 15 267200052 154

IX_POLICY_POLICYSEQ 15 267200052 155

IX_POLICY_POLICYSEQ 15 267200052 156

IX_POLICY_POLICYSEQ 15 267200052 157

IX_Table_1 15 267200052 1

IX_Table_1 15 267200052 150

IX_Table_1 15 267200052 153

IX_Table_1 15 267200052 154

IX_Table_1 15 267200052 155

IX_Table_1 15 267200052 156

IX_Table_1 15 267200052 157

IX_POLICY_POLICYNUM 15 267200052 1

IX_POLICY_POLICYNUM 15 267200052 150


Wednesday, August 15, 2018 - 11:29:34 AM - Will Back To Top (77198)

 One thing to note within the update statement under the else is that the dates may get turned to null if a user/system has not performed a seek, scan, lookup, or update since the last instance restart. To fix this, I implemented a case statement for this portion of the query. Example below:

 MDDIUS.[last_user_seek] = CASE WHEN SDDIUS.[last_user_seek] IS NULL AND MDDIUS.[last_user_seek] IS NOT NULL THEN MDDIUS.[last_user_seek] ELSE SDDIUS.[last_user_seek] END


Wednesday, December 14, 2016 - 5:04:58 AM - Koenraad Dendievel Back To Top (44967)

Tim,

Thanks very much for this.

I added a create_date column to register when an index was added.

I also added a cleanup script that removes indices that no longer exist from the persistence table.


Monday, January 4, 2016 - 6:25:09 AM - Chris Back To Top (40346)

Hi Tim,

 

trying to get index name from index id column that is retained in the tables but finding it difficult. Do you have queries to retrieve information from these tables? ex: table name, index name etc...

 


Thursday, August 20, 2015 - 6:35:05 AM - Marc Back To Top (38496)

Thank for this nice post. I just implemented it and it works nice.

My thoughts: Isn't the development of the figures over time a relevant / interesting information? I am thinking of creating a report that is able to display the usage as a line chart over time to point out for example that an index is not used anymore e.g. by another index change or something. In order to get this data, instead of updating the rows, whenever the procedure runs a new insert must be made. Is there anyone sharing this considerations?


Tuesday, December 11, 2012 - 10:51:53 AM - Tiago Back To Top (20876)

Hi there,

 

Thanks for posting this useful information.

Altough i have here a situation that i need your help. I am dealing with a situation where our developers load data in single row mode that cause my user_updates from sys.dm_db_index_usage_stats to be equal to the number of rows inserted.

With this approach i cannot say which index isn't being used and must be deleted.

When they choose the FastLoad option(BulkLoad) the counter increments by Number of Rows/Buffer Size and with that approach i can trust in kpis given by the DMV.

How do you deal with single row insertions(when thousands of rows are being inserted) that causes this behavior to the DMV data?

 

Thanks a lot

 


Tuesday, May 12, 2009 - 7:29:41 AM - admin Back To Top (3360)

Thanks for pointing this out. 

This has been fixed.

 


Tuesday, May 12, 2009 - 7:21:53 AM - dpsligar Back To Top (3359)

Nice tip!

 However, there is a syntax error on line 78 of the stored procedure in the download file. There is an extra "m":

MDDIUS.[system_seeks] = MDDIUS.[systemm_seeks]+ SDDIUS.[system_seeks],

Hope this helps.

 















get free sql tips
agree to terms