By: Tracy Boggiano | Comments | Related: > Availability Groups
Problem
In some environments, DBAs like to use to a third-party tool to monitor their servers especially when they have a lot of servers. When you have a small number of servers having all your alerts come from SQL Server Agent is not so bad, but when you get to where you are managing more, it is nice to be able to have a dashboard to see what is happening across your environment. Also, for alert 1480 which is the alert for the Availability Group is changing roles, you get multiple alerts per server for every failover. If you use the AlwaysOn_health session, you can write a custom alert that only alerts once per failover.
Solution
There are just a few alerts we want to be alerted on when it comes to Availability Groups, the main ones are:
- When the role as changed to the primary alert number 1480
- When the availability group is not ready for automatic failover if it is set up for automatic failover alert number 41405
Introduction to the AlwaysOn_health Extended Events Session
Let’s take a look at the AlwaysOn_health extended events session and how you would go about query it for the relevant error numbers.
First, let’s take a look at where the AlwaysOn_health session is. In SSMS, under Management you will find Extended Events, then under Sessions you will see the AlwaysOn_health session.
If you expand it, you can right-click on package0.eventfile and click View Target Data to get an idea of the type of data that is stored in this extended event.
If you script out the extended event session, you get the following code and see that they track more error messages than just the availability group error messages. Notice the 823 and 824 error numbers telling us about database corruption.
CREATE EVENT SESSION [AlwaysOn_health] ON SERVER ADD EVENT sqlserver.alwayson_ddl_executed , ADD EVENT sqlserver.availability_group_lease_expired , ADD EVENT sqlserver.availability_replica_automatic_failover_validation , ADD EVENT sqlserver.availability_replica_manager_state_change , ADD EVENT sqlserver.availability_replica_state , ADD EVENT sqlserver.availability_replica_state_change , ADD EVENT sqlserver.error_reported ( WHERE ( [error_number] = ( 9691 ) OR [error_number] = ( 35204 ) OR [error_number] = ( 9693 ) OR [error_number] = ( 26024 ) OR [error_number] = ( 28047 ) OR [error_number] = ( 26023 ) OR [error_number] = ( 9692 ) OR [error_number] = ( 28034 ) OR [error_number] = ( 28036 ) OR [error_number] = ( 28048 ) OR [error_number] = ( 28080 ) OR [error_number] = ( 28091 ) OR [error_number] = ( 26022 ) OR [error_number] = ( 9642 ) OR [error_number] = ( 35201 ) OR [error_number] = ( 35202 ) OR [error_number] = ( 35206 ) OR [error_number] = ( 35207 ) OR [error_number] = ( 26069 ) OR [error_number] = ( 26070 ) OR [error_number] > ( 41047 ) AND [error_number] < ( 41056 ) OR [error_number] = ( 41142 ) OR [error_number] = ( 41144 ) OR [error_number] = ( 1480 ) OR [error_number] = ( 823 ) OR [error_number] = ( 824 ) OR [error_number] = ( 829 ) OR [error_number] = ( 35264 ) OR [error_number] = ( 35265 ) ) ) , ADD EVENT sqlserver.hadr_db_partner_set_sync_state , ADD EVENT sqlserver.lock_redo_blocked ADD TARGET package0.event_file ( SET filename = N'AlwaysOn_health.xel', max_file_size = ( 5 ), max_rollover_files = ( 4 )) WITH ( MAX_MEMORY = 4096KB , EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS , MAX_DISPATCH_LATENCY = 30 SECONDS , MAX_EVENT_SIZE = 0KB , MEMORY_PARTITION_MODE = NONE , TRACK_CAUSALITY = OFF , STARTUP_STATE = ON ); GO
How to Query the Extended Events Session for When a Server Becomes the Primary
Now you can write some queries against the extended event and pull out the data you need for alerts. For example, for Role Change alert number 1480 here is the query to pull out only when a server becomes the primary so you only get one alert. The first part of the code goes and finds the current file on the operating system that holds the extended event session data. Then we SELECT from that WHERE the availability_replica_state_change change became the PRIMARY_NORMAL state.
DECLARE @FileName NVARCHAR(4000) SELECT @FileName = target_data.value('(EventFileTarget/File/@name)[1]','nvarchar(4000)') FROM ( SELECT CAST(target_data AS XML) target_data FROM sys.dm_xe_sessions s JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address WHERE s.name = N'AlwaysOn_health' ) ft SELECT XEData.value('(event/@timestamp)[1]','datetime2(3)') AS event_timestamp, XEData.value('(event/data[@name="previous_state"]/text)[1]', 'varchar(255)') AS previous_state, XEData.value('(event/data[@name="current_state"]/text)[1]', 'varchar(255)') AS current_state, XEData.value('(event/data[@name="availability_replica_name"]/value)[1]', 'varchar(255)') AS availability_replica_name, XEData.value('(event/data[@name="availability_group_name"]/value)[1]', 'varchar(255)') AS availability_group_name FROM ( SELECT CAST(event_data AS XML) XEData, * FROM sys.fn_xe_file_target_read_file(@FileName, NULL, NULL, NULL) WHERE object_name = 'availability_replica_state_change' ) event_data WHERE XEData.value('(event/data[@name="current_state"]/text)[1]', 'varchar(255)') = 'PRIMARY_NORMAL' ORDER BY event_timestamp DESC;
How to Query the Extended Events Session for Error Numbers
Next we can query for specific error numbers that are that referenced in the extended events session. In the query below we are looking for the error numbers related to possible database corruption issues.
DECLARE @FileName NVARCHAR(4000) SELECT @FileName = target_data.value('(EventFileTarget/File/@name)[1]','nvarchar(4000)') FROM ( SELECT CAST(target_data AS XML) target_data FROM sys.dm_xe_sessions s JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address WHERE s.name = N'AlwaysOn_health' ) ft SELECT XEData.value('(event/@timestamp)[1]','datetime2(3)') AS event_timestamp, XEData.value('(event/data[@name="error_number"]/value)[1]', 'int') AS error_number, XEData.value('(event/data[@name="severity"]/value)[1]', 'int') AS severity, XEData.value('(event/data[@name="message"]/value)[1]', 'varchar(max)') AS message FROM ( SELECT CAST(event_data AS XML) XEData, * FROM sys.fn_xe_file_target_read_file(@FileName, NULL, NULL, NULL) WHERE object_name = 'error_reported' ) event_data WHERE XEData.value('(event/data[@name="error_number"]/value)[1]', 'int') IN (823, 824, 829) ORDER BY event_timestamp DESC;
Next Steps
- Setup these queries in your third-party monitoring tool and add the event_timestamp to the WHERE clause.
- Check out these additional resources.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips