By: Alejandro Cobar | Comments (7) | Related: > Availability Groups
Problem
In the part 1 of this tip series, I showed you how to build the inventory of replicas that form an Availability Group. However, that’s just half of the picture, because you must be able to check the status of the databases within Availability Groups and this has to be a constant check to make sure the databases are “available”.
Solution
I will be presenting a script that gathers basic, but very useful information for each Availability Group database. This information will be obtained from the performance counter objects and with PowerShell we are going to be able to gather this information from all the availability databases from the replicas inventory you already have.
SQL Server Availability Groups Monitoring Script
The Availability Groups inventory is built first, because that way you can run this second script against a smaller subset of SQL Server instances. You can probably do it within one run, but that wouldn’t be as efficient.
Also, there are some performance counter objects that must be obtained from the primary replica and others from the secondary ones.
Information from Primary Replica
- Log Bytes Flushed/sec: A log flush occurs when data is written from the log cache to the physical transaction log file on disk, every time a transaction is committed. In order to get the rate, you have to obtain a couple of samples and then subtract the values. That’s the reason why the “WAITFOR DELAY” is there, so you can have an accurate delta. This will give you an idea of the activity going on currently at the databases in the Primary Replica.
SELECT instance_name, cntr_value INTO #Logflushes1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Databases%' AND counter_name = 'Log Bytes Flushed/sec'; WAITFOR DELAY '00:00:01'; SELECT instance_name, cntr_value INTO #Logflushes2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Databases%' AND counter_name = 'Log Bytes Flushed/sec'; SELECT ag.name AS 'GroupName' ,db_name(drs.database_id) AS 'Database' ,(SELECT (mf.size * 8) / 1024.0 FROM sys.master_files mf WHERE mf.type = 0 AND mf.database_id = drs.database_id) AS 'MDFSize' ,(SELECT (mf.size * 8) / 1024.0 FROM sys.master_files mf WHERE mf.type = 1 AND mf.database_id = drs.database_id) AS 'LDFSize' ,drs.synchronization_state_desc AS 'State' ,drs.synchronization_health_desc 'Health' ,ags.primary_replica AS 'PrimaryReplica' ,CONVERT(DECIMAL(10,2), log_flushes / 1024.0 / 1024.0) AS 'LogFlushed' FROM sys.availability_groups ag JOIN sys.dm_hadr_availability_group_states ags ON ag.group_id = ags.group_id JOIN sys.dm_hadr_availability_replica_cluster_states cs ON ags.group_id = cs.group_id JOIN sys.dm_hadr_availability_replica_states rs ON rs.replica_id = cs.replica_id JOIN sys.dm_hadr_database_replica_states drs ON rs.replica_id=drs.replica_id JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value log_flushes FROM #Logflushes1 l1 JOIN #Logflushes2 l2 ON l2.instance_name = l1.instance_name ) log_flushes ON log_flushes.instance_name = DB_NAME(drs.database_id) WHERE rs.role_desc = 'PRIMARY'; DROP TABLE #Logflushes1; DROP TABLE #Logflushes2;
Information from Secondary Replica
- Redone Bytes/sec: Amount of log records redone on the secondary database in the last second.
- Log Bytes Received/sec: Amount of log records received by the secondary replica for the database in the last second.
- Log Send Queue Size: Amount of log records of the primary database that has not been sent to the secondary databases, in kilobytes (KB).
- Send Rate: Average rate at which primary replica instance sent data during last active period, in kilobytes (KB)/second.
- Send Latency: Time (in seconds) it will take the Primary Replica to fully send all the log records to the Secondary Replica.
- Redo Queue Size: Amount of log records that the Secondary Replica has to process after being received by the Primary Replica.
- Redo Rate: Rate at which the Secondary Replica is effectively processing the log records received from the Primary Replica.
SELECT instance_name, cntr_value INTO #redo1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Redone Bytes/sec'; SELECT instance_name, cntr_value INTO #send1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Log Bytes Received/sec'; WAITFOR DELAY '00:00:01'; SELECT instance_name, cntr_value INTO #redo2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Redone Bytes/sec'; SELECT instance_name, cntr_value INTO #send2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Log Bytes Received/sec'; SELECT DB_NAME(rs.database_id) AS 'Database' ,r.replica_server_name AS 'SecondaryReplica' ,CONVERT(DECIMAL(10,2), rs.log_send_queue_size / 1024.0) AS 'LogSendQueueSize' ,CONVERT(DECIMAL(10,2), send_rate / 1024.0 / 1024.0) AS 'LogSendRate' ,CONVERT(DECIMAL(10,2), log_send_queue_size / CASE WHEN send_rate = 0 THEN 1 ELSE send_rate / 1024.0 END) AS 'SendLatency' ,CONVERT(DECIMAL(10,2), rs.redo_queue_size / 1024.0) AS 'RedoQueueSize' ,CONVERT(DECIMAL(10,2), redo_rate.redo_rate / 1024.0 / 1024.0) AS 'RedoRate' ,CONVERT(DECIMAL(10,2), rs.redo_queue_size / CASE WHEN redo_rate.redo_rate = 0 THEN 1 ELSE redo_rate.redo_rate / 1024.0 END) AS 'RedoLatency' FROM sys.dm_hadr_database_replica_states rs JOIN sys.availability_replicas r ON r.group_id = rs.group_id AND r.replica_id = rs.replica_id JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value redo_rate FROM #redo1 l1 JOIN #redo2 l2 ON l2.instance_name = l1.instance_name ) redo_rate ON redo_rate.instance_name = DB_NAME(rs.database_id) JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value send_rate FROM #send1 l1 JOIN #send2 l2 ON l2.instance_name = l1.instance_name ) send_rate ON send_rate.instance_name = DB_NAME(rs.database_id); DROP TABLE #send1; DROP TABLE #send2; DROP TABLE #redo1; DROP TABLE #redo2;
PowerShell Script to Automate the Data Collection of SQL Server Availability Group Databases
Below is the PowerShell script that unifies all the code described above. This will let you grab all the information from your environment and store it in a central place for data collection, analysis and monitoring. Note: you will have to update some of the parameters in the PowerShell script below to match your environment settings.
$labServer = "XXX" $inventoryDB = "XXX" #Clean the table where the monitoring information will be stored Invoke-Sqlcmd -Query "TRUNCATE TABLE XXX" -Database $inventoryDB -ServerInstance $server Write-Host "Fetching all the Primary Replicas from the inventory" #Fetch all the instances that have a PRIMARY role in an Availability Group #This list of instances should ideally come from the list we already built in Part 1 of this series /* This is an example of the result set that your query must return ############################################################ # name # instance # group # ############################################################ # server1.domain.net,45000 # server1\MSSQLSERVER1 # AG1 # # server2.domain.net,45000 # server2 # server2 # AG1 # # server3.domain.net,45000 # server3 # server3 # AG2 # # server4.domain.net,45000 # server4\MSSQLSERVER2 # AG2 # ############################################################ */ $PrimaryReplicasLookupQuery = /*Put in here your query that returns the Primary Replicas only*/ $PrimaryReplicas = Invoke-Sqlcmd -ServerInstance $server -Database $inventoryDB -Query $PrimaryReplicasLookupQuery #Grab the current information from each Primary Replica foreach ($PrimaryReplica in $PrimaryReplicas){ #Fetch all the Secondary Replicas per Primary Replica Write-Host "Fetching all the Secondary Replicas of Availability Group" $PrimaryReplica.group $SecondaryReplicasLookupQuery = /*Pun in here your query that returns the Secondary Replica for the Primary Replica being evaluated*/ $SecondaryReplicas = Invoke-Sqlcmd -ServerInstance $server -Database $inventoryDB -Query $SecondaryReplicasLookupQuery $PrimaryReplicaInformationQuery = " SELECT instance_name, cntr_value INTO #Logflushes1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Databases%' AND counter_name = 'Log Bytes Flushed/sec'; WAITFOR DELAY '00:00:01'; SELECT instance_name, cntr_value INTO #Logflushes2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Databases%' AND counter_name = 'Log Bytes Flushed/sec'; SELECT ag.name AS 'GroupName' ,db_name(drs.database_id) AS 'Database' ,(SELECT (mf.size * 8) / 1024.0 FROM sys.master_files mf WHERE mf.type = 0 AND mf.database_id = drs.database_id) AS 'MDFSize' ,(SELECT (mf.size * 8) / 1024.0 FROM sys.master_files mf WHERE mf.type = 1 AND mf.database_id = drs.database_id) AS 'LDFSize' ,drs.synchronization_state_desc AS 'State' ,drs.synchronization_health_desc 'Health' ,ags.primary_replica AS 'PrimaryReplica' ,CONVERT(DECIMAL(10,2), log_flushes / 1024.0 / 1024.0) AS 'LogFlushed' FROM sys.availability_groups ag JOIN sys.dm_hadr_availability_group_states ags ON ag.group_id = ags.group_id JOIN sys.dm_hadr_availability_replica_cluster_states cs ON ags.group_id = cs.group_id JOIN sys.dm_hadr_availability_replica_states rs ON rs.replica_id = cs.replica_id JOIN sys.dm_hadr_database_replica_states drs ON rs.replica_id=drs.replica_id JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value log_flushes FROM #Logflushes1 l1 JOIN #Logflushes2 l2 ON l2.instance_name = l1.instance_name ) log_flushes ON log_flushes.instance_name = DB_NAME(drs.database_id) WHERE rs.role_desc = 'PRIMARY'; DROP TABLE #Logflushes1; DROP TABLE #Logflushes2; " Write-Host "Obtaining information from Primary Replica" $PrimaryReplica.instance $PrimaryReplicaResults = Invoke-Sqlcmd -Query $PrimaryReplicaInformationQuery -ServerInstance $PrimaryReplica.name -ErrorAction Stop -querytimeout 30 #Now that we have the information from the Primary Replica, let's complement it with the information captured for the Secondary Replica foreach ($SecondaryReplica in $SecondaryReplicas){ $SecondaryReplicaInformationQuery = " SELECT instance_name, cntr_value INTO #redo1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Redone Bytes/sec'; SELECT instance_name, cntr_value INTO #send1 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Log Bytes Received/sec'; WAITFOR DELAY '00:00:01'; SELECT instance_name, cntr_value INTO #redo2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Redone Bytes/sec'; SELECT instance_name, cntr_value INTO #send2 FROM sys.dm_os_performance_counters WHERE object_name LIKE '%:Database Replica%' AND counter_name = 'Log Bytes Received/sec'; SELECT DB_NAME(rs.database_id) AS 'Database' ,r.replica_server_name AS 'SecondaryReplica' ,CONVERT(DECIMAL(10,2), rs.log_send_queue_size / 1024.0) AS 'LogSendQueueSize' ,CONVERT(DECIMAL(10,2), send_rate / 1024.0 / 1024.0) AS 'LogSendRate' ,CONVERT(DECIMAL(10,2), log_send_queue_size / CASE WHEN send_rate = 0 THEN 1 ELSE send_rate / 1024.0 END) AS 'SendLatency' ,CONVERT(DECIMAL(10,2), rs.redo_queue_size / 1024.0) AS 'RedoQueueSize' ,CONVERT(DECIMAL(10,2), redo_rate.redo_rate / 1024.0 / 1024.0) AS 'RedoRate' ,CONVERT(DECIMAL(10,2), rs.redo_queue_size / CASE WHEN redo_rate.redo_rate = 0 THEN 1 ELSE redo_rate.redo_rate / 1024.0 END) AS 'RedoLatency' FROM sys.dm_hadr_database_replica_states rs JOIN sys.availability_replicas r ON r.group_id = rs.group_id AND r.replica_id = rs.replica_id JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value redo_rate FROM #redo1 l1 JOIN #redo2 l2 ON l2.instance_name = l1.instance_name ) redo_rate ON redo_rate.instance_name = DB_NAME(rs.database_id) JOIN (SELECT l1.instance_name, l2.cntr_value - l1.cntr_value send_rate FROM #send1 l1 JOIN #send2 l2 ON l2.instance_name = l1.instance_name ) send_rate ON send_rate.instance_name = DB_NAME(rs.database_id); DROP TABLE #send1; DROP TABLE #send2; DROP TABLE #redo1; DROP TABLE #redo2; " Write-Host "Obtaining information from Secondary Replica" $SecondaryReplica.instance $SecondaryReplicaResults = Invoke-Sqlcmd -Query $SecondaryReplicaInformationQuery -ServerInstance $SecondaryReplica.name -ErrorAction Stop -querytimeout 30 } #Fisrt let's insert the information of the Primary Replicas followed by their respective Secondary Replicas foreach($PrimaryReplicaResult in $PrimaryReplicaResults){ #Build the INSERT statement if it returned at least 1 row if($PrimaryReplicaResults.Length -ne 0){ #Build the insert statement $insert = "INSERT INTO XXX VALUES" $insert += " ( '"+$PrimaryReplicaResult.PrimaryReplica+"', '"+$PrimaryReplicaResult.Database+"', "+$PrimaryReplicaResult.MDFSize+", "+$PrimaryReplicaResult.LDFSize+", '"+$PrimaryReplicaResult.State+"', '"+$PrimaryReplicaResult.Health+"', '"+$PrimaryReplicaResult.PrimaryReplica+"', "+$PrimaryReplicaResult.LogFlushed+", 0, 0, 0, 0, 0, 0, 0 ), " #Store the Primary Replicas results in our centralized database Invoke-Sqlcmd -Query $insert.Substring(0,$insert.LastIndexOf(',')) -ServerInstance $server -Database $inventoryDB } } #Now let's update the information of each Primary Replica with the information of their respective Secondary Replica foreach($SecondaryReplicaResult in $SecondaryReplicaResults){ #Build the UPDATE statement if it returned at least 1 row if($SecondaryReplicaResults.Length -ne 0){ #Build the update statement $update = "UPDATE XXX" $update += " SET SecondaryReplica = '"+$SecondaryReplicaResult.SecondaryReplica+"', LogSendQueueSize = "+$SecondaryReplicaResult.LogSendQueueSize+", LogSendRate = "+$SecondaryReplicaResult.LogSendRate+", SendLatency = "+$SecondaryReplicaResult.SendLatency+", RedoQueueSize = "+$SecondaryReplicaResult.RedoQueueSize+", RedoRate = "+$SecondaryReplicaResult.RedoRate+", RedoLatency = "+$SecondaryReplicaResult.RedoLatency+" WHERE PrimaryReplica = '"+$PrimaryReplicaResult.PrimaryReplica+"' AND [Database] = '"+$SecondaryReplicaResult.Database+"'; " #Execute the UPDATE statement Invoke-Sqlcmd -Query $update -ServerInstance $server -Database $inventoryDB } } } Write-Host "Done!"
After executing the PowerShell script, you should be able to have something that looks lot like the following:
With this final output, you can choose to do what you want:
- Build a report, through Reporting Services and automate it as needed.
- Build an HTML table and have it sent to you.
- Build any kind of alerting mechanism you feel comfortable with.
Next Steps
- You can add as many performance counters as you like. There’s a very useful list at Microsoft’s website, where you can even see from what replica you have to capture the value for a particular counter.
- The scripts presented in this tip are focused towards Availability Groups that have 1 secondary replica per Availability Group. A future version will be able to handle any number of replicas available, although the presentation of the results will have to vary a bit (at least this can be a very good starting point if you have nothing at all).
- Download the PowerShell script.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips