Managing Underused Indexes in SQL Server Availability Groups – Part 1

Problem

As a part of optimizing performance, I evaluate index usage across many instances and databases. I often find that some indexes aren’t used much or, at least on first glance, appear to go unused. Since we use availability groups (AG), different workloads run against replicas in different roles. All writes happen on the primary, obviously. However, some queries only happen on read-only secondaries (either because read-only routing is in use, or because some processes are manually directed at specific secondaries, or both). Unfortunately, index usage is not rolled up anywhere across all replicas. This means that looking at the primary alone gives an incomplete picture. How do I make sure I account for index activity everywhere, not just on the primary?

Solution

I would love to have a simple stored procedure that aggregates index usage across the primary and all secondaries. Ideally, I’d pass in a database name and a table name, and the procedure would determine the set of replicas, collect the index usage details from all of them, and report on the results.

Old Approach

An embarrassingly short time ago, I would have achieved this by:

  • Manually connecting to the primary and getting output from sys.dm_db_index_usage_stats,
  • Pasting that first set in a spreadsheet,
  • Manually collecting the same output from each replica,
  • Pasting that output alongside the existing data (and probably fidgeting with the rows, because not all indexes always return rows on all replicas), and
  • Manually comparing reads vs. writes.

The only benefit is that, once I had the list, I could subjectively remove rows from the spreadsheet without having to adjust query logic to filter. But aside from that? Torture. So, I set out to make this process faster and less manual.

Basically, the query I would run manually is:

USE [Database I'm after];
DECLARE @TableName sysname = N'dbo.BasketWeavingStats';
DECLARE @sql nvarchar(max) = N'SELECT @@SERVERNAME,
           i.name,
           s.user_seeks,
           s.user_scans,
           s.user_lookups,
           s.user_updates,
           s.last_user_scan,
           s.last_user_seek
      FROM sys.indexes AS i
      LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s
        ON i.object_id   = s.object_id
       AND i.index_id    = s.index_id
       AND s.database_id = DB_ID()
     WHERE i.object_id   = OBJECT_ID(@TableName)
       AND i.index_id NOT IN (0,1); /* not touching the PK */';
 
EXEC sys.sp_executesql @sql, N'@TableName sysname', @TableName;

From here, we want to insert the results of the query from each replica into a single #temp table.

CREATE TABLE #IndexUsage
(
  ServerName   nvarchar(128),
  IndexName    nvarchar(128),
  UserSeeks    bigint,
  UserScans    bigint,
  UserLookups  bigint,
  UserUpdates  bigint,
  LastScan     datetime,
  LastSeek     datetime 
);

We could do this manually using:

INSERT #IndexUsage 
  EXEC [EachReplica].[Database I'm after].sys.sp_executesql @sql, 
       N'@TableName sysname', 
         @TableName;

Table-Valued Function Approach

That’s still too many steps and can be error-prone. To run this query everywhere, let’s create an inline table-valued function (on one (or all) of the replicas) to return all of the other replicas for a given database. This will be used to pull this data from the other relevant nodes in the AG. So, it assumes we have linked servers for them (and if not, we can do similar things from PowerShell or elsewhere).

CREATE OR ALTER FUNCTION dbo.GetOtherReplicas
(
  @DatabaseName sysname
/* 
   What are all the non-local replicas for a given database?
   Are they all linked servers? 
   Notes: 
     - Linked servers may use FQDN/aliases (replica names should not)
     - @@SERVERNAME is unreliable (e.g. if a server is renamed)
     - named instances may also complicate things
*/
)
RETURNS TABLE
AS
  RETURN
  (
    SELECT ServerName = s.name
      FROM master.sys.availability_databases_cluster AS dc
     INNER JOIN master.sys.availability_replicas AS ar 
           ON dc.group_id = ar.group_id
      LEFT OUTER JOIN sys.servers AS s 
           ON ar.replica_server_name = s.name
           /* or ON s.name = a.replica_server_name + N'.your.FQDN.domain.suffix' */
     WHERE dc.database_name = @DatabaseName
       AND LOWER(ar.replica_server_name) NOT IN 
       (
           LOWER(@@SERVERNAME), 
           LOWER(CONVERT(sysname, SERVERPROPERTY('ServerName'))), 
           LOWER(CONVERT(sysname, SERVERPROPERTY('ComputerNamePhysicalNetBIOS')))
       )
  );

Then I can insert into the #temp table for the local instance, then loop through the other replicas with an additional linked server qualifier.

DECLARE @DatabaseName sysname = N'Database I''m After',
        @TableName    sysname = N'dbo.BasketWeavingStats';
DECLARE @Counter     tinyint = 1,
        @ServerCount tinyint,
        @RemoteExec  nvarchar(384),
        @LocalExec   nvarchar(256) = QUOTENAME(@DatabaseName) 
                                   + N'.sys.sp_executesql';
DROP TABLE IF EXISTS #servers;
CREATE TABLE #servers
(
  id         int identity(1,1), 
  ServerName sysname
);
INSERT #servers(ServerName) 
SELECT ServerName 
  FROM dbo.GetOtherReplicas(@DatabaseName);
SELECT @ServerCount = @@ROWCOUNT;
IF EXISTS (SELECT 1 FROM #servers WHERE ServerName IS NULL)
BEGIN
  /* 
      missing linked server
      you can decide here if you want to fix that or collect data without it 
  */
  RAISERROR(N'At least one replica is not linked.', 11, 1) WITH NOWAIT;
  /* RETURN; */
END
DECLARE @sql nvarchar(max) = N'SELECT @@SERVERNAME,
        i.name,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.user_updates,
        s.last_user_scan,
        s.last_user_seek
   FROM sys.indexes AS i
   LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s
     ON i.object_id   = s.object_id
    AND i.index_id    = s.index_id
    AND s.database_id = DB_ID()
  WHERE i.object_id   = OBJECT_ID(@TableName)
    AND i.index_id NOT IN (0,1); /* not touching the PK */';
INSERT #IndexUsage EXEC @LocalExec @sql, N'@TableName sysname', @TableName;
WHILE @Counter <= @ServerCount
BEGIN
  SELECT @RemoteExec = QUOTENAME(ServerName) + N'.' + @LocalExec
    FROM #servers
   WHERE id = @Counter 
     AND ServerName IS NOT NULL;
  INSERT #IndexUsage EXEC @RemoteExec @sql, N'@TableName sysname', @TableName;
  SET @Counter += 1;
END

That gets me all the index usage details, and I can run analysis from there.

Sample Index Evaluation

For example, this will tell me a simple sum of reads and writes, as well the last time the index was read, across all replicas:

SELECT IndexName,
       Reads    = SUM(COALESCE(UserSeeks, 0) 
                      + COALESCE(UserScans, 0) 
                      + COALESCE(UserLookups, 0)), 
       Writes   = SUM(COALESCE(UserUpdates, 0)), 
       LastRead = MAX(COALESCE(LastSeek, LastScan))
  FROM #IndexUsage
  GROUP BY IndexName;
Index usage results across all nodes

Writes always happen on the primary, so their source is less important. Reads, however, may come from any replica.

Identify Reads by Replica

We can use dynamic conditional aggregation to show which replica handled which reads.

DECLARE @OutputColumns nvarchar(max), 
        @sql nvarchar(max);
 
SELECT @OutputColumns = STRING_AGG
  (
    CONVERT(nvarchar(max), 
    CONCAT
    (
      QUOTENAME(N'Reads_' + ServerName), 
      N' = MAX(CASE ServerName WHEN N''' + ServerName + ''' THEN Reads END),', 
      QUOTENAME(N'LastRead_' + ServerName), 
      N' = MAX(CASE ServerName WHEN N''' + ServerName + ''' THEN LastRead END)'
    )), 
    N','
  )
  FROM
  (
    SELECT ServerName FROM #IndexUsage GROUP BY ServerName
  ) AS s;
  SELECT @sql = N'SELECT IndexName, ' + @OutputColumns + ', Writes = MAX(Writes)
    FROM 
    (
      SELECT IndexName, ServerName,
             Reads    = SUM(COALESCE(UserSeeks, 0) 
                        + COALESCE(UserScans, 0) 
                        + COALESCE(UserLookups, 0)), 
             Writes   = SUM(COALESCE(UserUpdates, 0)), 
             LastRead = MAX(COALESCE(LastSeek, LastScan))
        FROM #IndexUsage
       GROUP BY IndexName, ServerName
    ) AS x GROUP BY IndexName;';
PRINT @sql;
EXEC sys.sp_executesql @sql;

This produces a query like this:

SELECT IndexName, 
       [Reads_SrvA]    = MAX(CASE ServerName WHEN N'SrvA' THEN Reads END),
       [LastRead_SrvA] = MAX(CASE ServerName WHEN N'SrvA' THEN LastRead END) ,
       [Reads_SrvB]    = MAX(CASE ServerName WHEN N'SrvB' THEN Reads END),
       [LastRead_SrvB] = MAX(CASE ServerName WHEN N'SrvB' THEN LastRead END) ,
       [Reads_SrvC]    = MAX(CASE ServerName WHEN N'SrvC' THEN Reads END),
       [LastRead_SrvC] = MAX(CASE ServerName WHEN N'SrvC' THEN LastRead END),
       Writes = MAX(Writes)
  FROM 
  (
    SELECT IndexName, ServerName,
           Reads    = SUM(COALESCE(UserSeeks, 0) 
                      + COALESCE(UserScans, 0) 
                      + COALESCE(UserLookups, 0)), 
           Writes   = SUM(COALESCE(UserUpdates, 0)), 
           LastRead = MAX(COALESCE(LastSeek, LastScan))
      FROM #IndexUsage
     GROUP BY IndexName, ServerName
  ) AS x 
  GROUP BY IndexName;

When we run this query, we can see some of these indexes are barely used at all, depending on the server:

Index usage showing reads on individual nodes

Now we are armed with additional data about how these indexes are used across all nodes in the availability group, instead of just the one we happen to be querying against right now.

(And I could tie in whether any given replica is currently the primary, but the goal is really to make sure we capture the reads from everywhere, not only on the replica we’re connected to. So, while the per-replica information is available, the first query is actually a lot more useful to me. That may change in the future if we ever get the ability to create indexes only on a secondary or that don’t get replicated to secondaries.)

In the next tip, I’ll talk about how we pull in additional data, such as size and index columns, to make even better-informed decisions.

Next Steps

Review the following tips and other resources:

5 Comments

  1. Thanks Aaron. Since each replica in the Availability Group has different workload requirements. So, is it possible to have different set of indexes in a database for each replica? For example, can a table have an index only on secondary replica for SELECT queries but not on primary replica since primary is used only for DML operations?

    • Hi Noman, no, it is currently not possible to have different indexes on secondaries. But it has been a highly requested feature, so, there’s always a possibility we’ll get that in a future version. Today you can control various optimizer, statistics, and Query Store behaviors on the secondary, but you can’t mess directly with data structures.

  2. @SHAMS It applies to any secondary, regardless of its current state (it might have been readable 10 minutes ago). But if a secondary is *always* not readable, what information do you expect to find there?

Leave a Reply

Your email address will not be published. Required fields are marked *