Monitoring SQL Server Availability Group Direct Seeding

By: Mark Wilkinson | Comments | Related: > Availability Groups

Problem

Direct seeding is a new feature available for Availability Groups in SQL Server 2016 and later. Direct seeding allows you to bypass the usual backup/restore process associated with setting up Availability Groups and instead, add databases to an Availability Group with a single command. Direct seeding works by using the same technology some third-party backup tools use, VDI (Virtual Device Interface). Using VDI, the secondary node takes a backup of the database from the primary node, streams it across the network, and restores it to the secondary. Once enabled, adding a database to the Availability Group is as easy as issuing a single ALTER AVAILABILITY GROUP ADD DATABASE command on the primary.

While it is a great feature, if you work in a large environment with either very large databases or a high number of databases in your Availability Groups, you will likely run into problems the first time you attempt direct seeding. There are a lot of things that can go wrong like memory exhaustion, worker thread exhaustion, or even simple network problems. To help troubleshoot these issues, two new DMVs have been introduced: sys.dm_hadr_automatic_seeding and sys.dm_hadr_physical_seeding_stats. These both have a lot of information in them, but the data may not always be complete. If the seeding process hangs for some reason and never completes, these DMVs make not update as expected, leaving you wondering if the process is still running. Luckily there is a quick query you can run to see which secondary databases are still restoring backups from the primary.

Solution

As I said above, there are now two new DMVs available to query to get information on direct seeding:

sys.dm_hadr_automatic_seeding: This DMV shows general success/fail information for each database that was seeded, as well as retry attempts, and related error and failure codes if the seeding operations failed.
sys.dm_hadr_physical_seeding_stats: This DMV shows transient data related to the current state of seeding operations, including things like transfer rate, IO wait, source and target server names, estimated completion time, and a lot more. In my opinion this DMV would be a LOT more useful if the data was persisted. I have seen cases where data is either incorrect or missing, so always combine the information from this DMV with information from the error logs for a more complete picture of what is going on.

While the DMVs can be useful, the simplest way to get a look at direct seeding progress is using performance counters. Performance counters are great when you just need to quickly run a query to see what is happening. When combined with the new DMVs mentioned above, you can make better choices about how to proceed if it looks like direct seeding has stopped working.

Using the data in the `sys.dm_os_performance_counters` DMV we can capture a single counter over time and determine which databases are being seeded, and what the data transfer rate is for direct seeding operations. The counter we are interested in is `SQLServer:Databases\Backup/Restore Throughput/sec`. This counter tells you the combined backup and restore throughput per second for a given database. In the case of direct seeding, a restore operation is happening, so that traffic will register in this counter.

This query must be run on the secondary replica in the availability group, the one you are seeding to. Typically, I like to capture the counter data for a five second window of time:

IF OBJECT_ID('tempdb..#Seeding') IS NOT NULL DROP TABLE #Seeding;

SELECT  GETDATE() AS CollectionTime,
        instance_name,
        cntr_value
INTO    #Seeding
FROM    sys.dm_os_performance_counters
WHERE   counter_name = 'Backup/Restore Throughput/sec';

WAITFOR DELAY '00:00:05'

SELECT  LTRIM(RTRIM(p2.instance_name)) AS [DatabaseName],
        (p2.cntr_value - p1.cntr_value) / (DATEDIFF(SECOND,p1.CollectionTime,GETDATE())) AS ThroughputBytesSec
FROM    sys.dm_os_performance_counters AS p2
        INNER JOIN #Seeding AS p1
            ON p2.instance_name = p1.instance_name
WHERE   p2.counter_name LIKE 'Backup/Restore Throughput/sec%'
ORDER BY
        ThroughputBytesSec DESC;

This query will output a list of databases and the current throughput in bytes/second for restores or backups occurring on those databases. In cases where it looks like direct seeding has stalled, this query can quickly tell you if any of your secondary databases are being restored to.

If you are in a situation where it seems like seeding has stalled, or you have a few failed seeding attempts that aren't retying, there is a simple query you can execute on the primary to get things moving again:

ALTER AVAILABILITY GROUP <MyAG> MODIFY REPLICA ON '<MySecondary>' WITH ( SEEDING_MODE = AUTOMATIC );

This is the command you would normally use to enable direct seeding, but it also has the effect of retrying failed attempts, and restarting attempts that are stuck for any reason. It might take a minute or two for things to get started again, but you should start seeing new records in the seedings DMVs right away.

Next Steps

Read more about direct seeding
Try direct seeding in a test environment and take a look at the data in these DMVs and the performance counter mentioned
Experiment with the new set of Events for direct seeding available via Extended Events

About the author

Mark Wilkinson is a Database Wizard at ChannelAdvisor where he enjoys fixing problems, teaching others how to fix problems and learning new things.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Monitoring SQL Server Availability Group Direct Seeding

Problem

Solution

Next Steps

About the author

Comments For This Article