Monitoring SQL Server Availability Group Direct Seeding
Direct seeding is a new feature available for Availability Groups in SQL Server 2016 and later. Direct seeding allows you to bypass the usual backup/restore process associated with setting up Availability Groups and instead, add databases to an Availability Group with a single command. Direct seeding works by using the same technology some third-party backup tools use, VDI (Virtual Device Interface). Using VDI, the secondary node takes a backup of the database from the primary node, streams it across the network, and restores it to the secondary. Once enabled, adding a database to the Availability Group is as easy as issuing a single ALTER AVAILABILITY GROUP ADD DATABASE command on the primary.
While it is a great feature, if you work in a large environment with either very large databases or a high number of databases in your Availability Groups, you will likely run into problems the first time you attempt direct seeding. There are a lot of things that can go wrong like memory exhaustion, worker thread exhaustion, or even simple network problems. To help troubleshoot these issues, two new DMVs have been introduced: sys.dm_hadr_automatic_seeding and sys.dm_hadr_physical_seeding_stats. These both have a lot of information in them, but the data may not always be complete. If the seeding process hangs for some reason and never completes, these DMVs make not update as expected, leaving you wondering if the process is still running. Luckily there is a quick query you can run to see which secondary databases are still restoring backups from the primary.
As I said above, there are now two new DMVs available to query to get information on direct seeding:
- sys.dm_hadr_automatic_seeding: This DMV shows general success/fail
information for each database that was seeded, as well as retry attempts, and
related error and failure codes if the seeding operations failed.
- sys.dm_hadr_physical_seeding_stats: This DMV shows transient data related to the current state of seeding operations, including things like transfer rate, IO wait, source and target server names, estimated completion time, and a lot more. In my opinion this DMV would be a LOT more useful if the data was persisted. I have seen cases where data is either incorrect or missing, so always combine the information from this DMV with information from the error logs for a more complete picture of what is going on.
While the DMVs can be useful, the simplest way to get a look at direct seeding progress is using performance counters. Performance counters are great when you just need to quickly run a query to see what is happening. When combined with the new DMVs mentioned above, you can make better choices about how to proceed if it looks like direct seeding has stopped working.
Using the data in the `sys.dm_os_performance_counters` DMV we can capture a single counter over time and determine which databases are being seeded, and what the data transfer rate is for direct seeding operations. The counter we are interested in is `SQLServer:Databases\Backup/Restore Throughput/sec`. This counter tells you the combined backup and restore throughput per second for a given database. In the case of direct seeding, a restore operation is happening, so that traffic will register in this counter.
This query must be run on the secondary replica in the availability group, the one you are seeding to. Typically, I like to capture the counter data for a five second window of time:
IF OBJECT_ID('tempdb..#Seeding') IS NOT NULL DROP TABLE #Seeding; SELECT GETDATE() AS CollectionTime, instance_name, cntr_value INTO #Seeding FROM sys.dm_os_performance_counters WHERE counter_name = 'Backup/Restore Throughput/sec'; WAITFOR DELAY '00:00:05' SELECT LTRIM(RTRIM(p2.instance_name)) AS [DatabaseName], (p2.cntr_value - p1.cntr_value) / (DATEDIFF(SECOND,p1.CollectionTime,GETDATE())) AS ThroughputBytesSec FROM sys.dm_os_performance_counters AS p2 INNER JOIN #Seeding AS p1 ON p2.instance_name = p1.instance_name WHERE p2.counter_name LIKE 'Backup/Restore Throughput/sec%' ORDER BY ThroughputBytesSec DESC;
This query will output a list of databases and the current throughput in bytes/second for restores or backups occurring on those databases. In cases where it looks like direct seeding has stalled, this query can quickly tell you if any of your secondary databases are being restored to.
If you are in a situation where it seems like seeding has stalled, or you have a few failed seeding attempts that aren't retying, there is a simple query you can execute on the primary to get things moving again:
ALTER AVAILABILITY GROUP <MyAG> MODIFY REPLICA ON '<MySecondary>' WITH ( SEEDING_MODE = AUTOMATIC );
This is the command you would normally use to enable direct seeding, but it also has the effect of retrying failed attempts, and restarting attempts that are stuck for any reason. It might take a minute or two for things to get started again, but you should start seeing new records in the seedings DMVs right away.
- Read more about direct seeding
- Try direct seeding in a test environment and take a look at the data in these DMVs and the performance counter mentioned
- Experiment with the new set of Events for direct seeding available via Extended Events
About the author
View all my tips