SQL Server Monitoring for all Availability Groups


By:   |   Updated: 2020-12-24   |   Comments   |   Related: More > Availability Groups


Problem

Some organizations rely on SQL Server AlwaysOn Availability Groups (AG) and knowing the status for all Availability Groups in your organization can be quite helpful, especially if there are any issues.  During my time as a SQL Server DBA, I have faced rounds of patching sessions that require back and forth failovers of the AGs to reduce the downtime as much as possible. During these sessions, I've always wanted an easy way to know if all the AGs are healthy so that the overall work can be done as quickly as possible. To achieve that, I have come up with a solution that can be easily implemented in your own environment to have vital information about all the Availability Groups under your supervision.

Solution

I will present a PowerShell script that will collect information about all the Availability Groups, Availability Replicas, and Availability Databases spread across the entire database infrastructure.

Prerequisites

In order to use this module, you'll need the following:

  • Create the core objects found here.
  • In addition to the core objects from the link above, this particular module will require the objects deployed by the module called Get-MSSQL-Instance-Values. Why? The core database objects revolve around a central master server list that can be used to promote several other modules which have also been published on MSSQLTips.com. So, with this in mind, the central solution must consider all the instances under your support. This can go from SQL Server 2000 all the way to SQL Server 2019; however, let's remember that Availability Groups is a technology that was introduced in SQL Server 2012. If you have a bunch of instances below 2012, it would be a complete waste of time to try to build an Availability Groups inventory by attempting to find any information from those instances. This is where the Get-MSSQL-Instance-Values module comes into play. I can limit the subset of instances to traverse by selecting only those >= 2012.
  • Here's a screenshot of what the execution of the script looks like if you don't have the objects loaded from the Get-MSSQL-Instance-Values module:
instance values

Database Objects

For this particular module, several tables will be created.

Notice the structure of the tables so you that can get an idea of what data will be stored in each.

If you want to add/remove things, make sure to adjust the respective structures within the PowerShell script and adapt the respective logic that will handle what you want to modify.

This solution was tested in a traditional on-premises Windows Server setup. If you'd like to add things like Availability Groups in Linux using Pacemaker, Availability Groups without a Failover Cluster (SQL 2017 and beyond), or something built in Azure, you might need to adapt the overall logic of the script to include those cases.

Tables

  • inventory.AvailabilityGroups
    • id: internal id used as Primary Key for the table.
    • group_name: the name of the Availability Group.
    • seeding_mode: the seeding mode for the Availability Group.
    • listener: the listener created for the Availability Group (if any).
    • listener_port: the port assigned to the listener (if any).
    • wsfc: the name of the Windows Failover Cluster housing the Availability Group.
    • health: the text description of the current health status of the Availability Group.
    • data_collection_timestamp: the timestamp value when the record was fetched from the instance.
  • inventory.AvailabilityGroupReplicas
    • id: internal id used as Primary Key for the table.
    • group_id: the id that ties to the table inventory.AvailabilityGroups
    • replica: the name of the SQL Server instance acting as replica of the Availability Group.
    • role: the current role of the replica (PRIMARY, SECONDARY).
    • replica_health: the current health status of the replica.
    • availability_mode: the current availability mode (SYNCHRONOUS/ASYNCHRONOUS COMMIT).
    • failover_mode: the current failover mode for the replica (AUTOMATIC/MANUAL).
    • connections_in_primary_role: states what types of connections are allowed to this replica when it is acting as primary.
    • connections_in_secondary_role: states what types of connections are allowed to this replica when it is acting as secondary.
    • seeding_mode: the current seeding mode for the replica.
    • endpoint_url: the name of the endpoint used to establish communication.
    • data_collection_timestamp: the timestamp when the information was fetched from the SQL Server instance.
  • inventory.AvailabilityGroupDatabases
    • group_id: the id that ties to the table inventory.AvailabilityGroups
    • database: the name of the Availability Database in the Group.
    • mdf_size: the current size (in MB) of the data file size.
    • ldf_size: the current size (in MB) of the log file size.
    • state: the current state of the database (SYNCHRONIZED, SYNCHRONIZING, etc.).
    • health: the current health status of the database.
    • primary_replica_id: the id that ties to the table inventory.AvailabilityGroupReplicas, to know which is the current replica acting as Primary.
    • data_collection_timestamp: the timestamp when the information was fetched from the SQL Server instance.

Important Note:

There will be times when you simply don't want to collect this information from all the registered instances you have in your Master Server List. To address this, simply modify the query text inside the variable $instanceLookupQuery within the PowerShell script and you should be good to go.

PowerShell Script

The PowerShell script that collects the information from each instance is called:

  • Get-MSSQL-AvailabilityGroups.ps1.

The script has some validations that will help you check if some key elements are missing for the script to run successfully. For instance, it will confirm that the inventory.MasterServerList table exists and that it has at least 1 active instance registered to be able to have something to work with.

If you have followed along with the other modules, you will notice that we have been storing all objects in "C:\temp", but you can use any folder you want.  If you make a change to the central folder location you will need to edit the first line in the following PowerShell script to specify the new folder location.

How to Use

Navigate to the folder where you created the files and run the PowerShell script as follows:

Option 1

  • Right click on Get-MSSQL-AvailabilityGroups.ps1 and select Run with PowerShell

Option 2

  • Open a command window and navigate to the folder where you saved the above files and run
powershell "C:\temp\Get-MSSQL-AvailabilityGroups.ps1"

Database Objects Deployment

Here are the database objects that you should see in your central database after executing the PowerShell script:

powershell script

Query Availability Group Tables

To demonstrate the output that the script produces, I have setup a test lab environment to have a 2 node Windows Failover Cluster that will house 2 Availability Groups (1 SQL Server 2012 instance and 1 in SQL Server 2016 instance).

Here's the output of the following queries, respectively:

SELECT * FROM inventory.AvailabilityGroups;
SELECT * FROM inventory.AvailabilityGroupReplicas;
SELECT * FROM inventory.AvailabilityGroupDatabases;

For the Availability Group "TestAG", I created a database listener (purely for demonstration purposes). I created only a database called "test" in each Availability Group. These particular results show that everything looks perfectly fine for all the deployed Availability Groups.

group name

What does it look like if something goes wrong with any of the replicas? To demonstrate that, I have manually stopped instance "DB2" and this is what the output looks like after re-executing the script above:

availability groups

From the table inventory.AvailabilityGroups, you can already see that the group itself is "NOT_HEALTHY". But with just that information alone you still have more digging to do because you don't know exactly which replica the issue actually is. That's where the inventory.AvailabilityGroupReplicas table comes in. Here you can see that replica called "DB2" is the one marked as "NOT_HEALTHY", precisely the one I manually stopped.

For this particular example, the inventory.AvailabilityGroupDatabases table won't show anything outstanding because the Primary Replica for the "TestAG" group is still up and serving incoming connections.

Important Note:

Every time the script is executed, the information from the 3 inventory Availability Group tables is wiped and only the most up-to-date information is stored. If you'd like to preserve the previous information, you have to adapt that logic within the script.

Checking for Errors

To check for errors query the monitoring.ErrorLog table using the following query:

SELECT *
FROM monitoring.ErrorLog
WHERE script = 'Get-MSSQL-AvailabilityGroups'

If you'd like to know the SQL Server instance that got the errors, issue the query like this:

SELECT 
  CASE WHEN msl.instance = 'MSSQLSERVER' THEN msl.server_name 
       ELSE CONCAT(msl.server_name,'\',msl.instance) 
  END AS instance,
  e.script,
  e.message,
  e.error_timestamp
FROM monitoring.ErrorLog e
JOIN inventory.MasterServerList msl ON msl.serverId = e.serverId
WHERE e.script = 'Get-MSSQL-AvailabilityGroups'

Useful queries

Check if there's at least one Availability Group that's not in a HEALTHY state:

SELECT *
FROM inventory.VW_AvailabilityGroupsHealth
WHERE Health <> 'HEALTHY'

Check if there's at least one Availability Group Replica that's not in a HEALTHY state:

SELECT *
FROM VW_AvailabilityGroupReplicasHealth
WHERE Health <> 'HEALTHY'

Check if there's an Availability Group Database that's either not synchronizing data or its status is not HEALTHY:

SELECT *
FROM inventory.AvailabilityGroupDatabases
WHERE state NOT IN ('SYNCHRONIZING','SYNCHRONIZED') OR health <> 'HEALTHY'

Download Scripts

Next Steps


Last Updated: 2020-12-24


get scripts

next tip button



About the author
MSSQLTips author Alejandro Cobar Alejandro Cobar is an MCP, SQL Server DBA and Developer with 10+ years of experience working in a variety of environments.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Connect to SQL Server Availability Group replica with SSMS when Readable Secondary is Read-intent only

Configure SQL Server AlwaysOn Availability Group on a Multi-Subnet Cluster

Fix SQL Server AlwaysOn Availability Group Error: 1408 Joining database on secondary replica resulted in an error

What is SQL Server AlwaysOn?

SQL Server AlwaysOn Availability Group Backup Preference Setting














get free sql tips
agree to terms