Using Extended Events to troubleshoot SQL Server issues
As a DBA, you'll encounter elusive performance, connectivity and locking problems that you'll need to troubleshoot. There are many tools that you can use such as Profiler. In addition to these tools, SQL Server 2008 offers extended events ("XEvents"), which you can use as a powerful tracing system. By default, the "system_health" extended events session is always on, and can provide you help to locate the source of trouble much faster.
Extended Events ("XEvents") is one of the new SQL 2008 features that I like most. XEvents provide a high performing and configurable mechanism for capturing information about events inside the Database Engine. Like the SQL Server default trace, the system_health XEvents session is always on and consumes few resources. Taking time to check its output when you experience certain problems can be really helpful.
You can see that the system_health session is working on your SQL machine by running this code:
SELECT * FROM sys.dm_xe_sessions
Even if you did not create and start any XEvents session, this query will return one record for the system_health session.
SQL Server 2008 does not include a visual interface to manage the XEvents sessions, but you can download the SQL Server 2008 Extended Events SSMS Addin. This tool, offered by Jonathan Kehayias, will help you visualize and edit the XEvents sessions on your machine, as well as create new XEvents sessions. Download the .msi file and run it on your machine. In SSMS open the View menu and choose "Show Extended Event Session Explorer" as shown below.
You'll see the system_health session and all its details
Let's explore the content of this session. Right click on "system_health" and choose Script session as ---> Create. You'll obtain this script:
CREATE EVENT SESSION [system_health] ON SERVER ADD EVENT sqlserver.error_reported( ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text, sqlserver.tsql_stack) WHERE (([severity]>=(20) OR ([error]=(17803) OR [error]=(701) OR [error]=(802) OR [error]=(8645) OR [error]=(8651) OR [error]=(8657) OR [error]=(8902))))), ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded, ADD EVENT sqlserver.xml_deadlock_report, ADD EVENT sqlos.wait_info( ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text) WHERE (([duration]>(15000) AND ([wait_type]>(31) AND ([wait_type]>(47) AND [wait_type]<(54) OR [wait_type]<(38) OR [wait_type]>(63) AND [wait_type]<(70) OR [wait_type]>(96) AND [wait_type]<(100) OR [wait_type]=(107) OR [wait_type]=(113) OR [wait_type]>(174) AND [wait_type]<(179) OR [wait_type]=(186) OR [wait_type]=(207) OR [wait_type]=(269) OR [wait_type]=(283) OR [wait_type]=(284)) OR [duration]>(30000) AND [wait_type]<(22))))), ADD EVENT sqlos.wait_info_external( ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text) WHERE (([duration]>(5000) AND ([wait_type]>(365) AND [wait_type]<(372) OR [wait_type]>(372) AND [wait_type]<(377) OR [wait_type]>(377) AND [wait_type]<(383) OR [wait_type]>(420) AND [wait_type]<(424) OR [wait_type]>(426) AND [wait_type]<(432) OR [wait_type]>(432) AND [wait_type]<(435) OR [duration]>(45000) AND ([wait_type]>(382) AND [wait_type]<(386) OR [wait_type]>(423) AND [wait_type]<(427) OR [wait_type]>(434) AND [wait_type]<(437) OR [wait_type]>(442) AND [wait_type]<(451) OR [wait_type]>(451) AND [wait_type]<(473) OR [wait_type]>(484) AND [wait_type]<(499) OR [wait_type]=(365) OR [wait_type]=(372) OR [wait_type]=(377) OR [wait_type]=(387) OR [wait_type]=(432) OR [wait_type]=(502)))))) ADD TARGET package0.ring_buffer( SET max_memory=4096) WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF, STARTUP_STATE = ON)
Alternatively you can find this script as part of the u_tables.sql file, located in the Install directory of your SQL Server installation.
As you can see from the definition, this session's output will hold the callstack, sessionID, TSQL and TSQL call stack for errors with severity > 20, other errors (17803, etc.) which are memory pressure related, non-yielding scheduler problems, deadlocks and several types of waits. You can find more details on these waits here.
The session's output is captured in a "ring_buffer" target observing a "first event in, first out" rule. The ring_buffer is memory resident and it captures the data in raw (binary) format. The data is held while the session is "on". Once you stop the session, the memory allocated to the ring_buffer is freed and the data dissapear.
You can see the content of the ring_buffer by joining the dm_xe_session_targets and sys.dm_xe_sessions DMVs and casting the binary data as XML.
SELECT CAST(xet.target_data AS XML) FROM sys.dm_xe_session_targets xet JOIN sys.dm_xe_sessions xe ON (xe.address = xet.event_session_address) WHERE xe.name = 'system_health'
Each "event" element has a set of "data" children and a set of "action" children. The actions are specified in the session's definition. The "data" elements contain the values of the columns of type "data" for each event. Those columns are listed by the sys.dm_xe_object_columns DMV. Let's parse this XML and see the content in tabular form. Since each event returns a different set of data columns, I'll give below an example for the "error_reported" events.
DECLARE @x XML = (select CAST(xet.target_data as xml) from sys.dm_xe_session_targets xet join sys.dm_xe_sessions xe on (xe.address = xet.event_session_address) where xe.name = 'system_health') SELECT t.e.value('@name', 'varchar(50)') AS EventName ,t.e.value('@timestamp', 'datetime') AS DateAndTime ,t.e.value('(data[@name="error"]/value)', 'int') AS ErrNo ,t.e.value('(data[@name="severity"]/value)', 'int') AS Severity ,t.e.value('(data[@name="message"]/value)', 'varchar(max)') AS ErrMsg ,t.e.value('(action[@name="sql_text"]/value)', 'varchar(max)') AS sql_text FROM @x.nodes('//RingBufferTarget/event') AS t(e) WHERE t.e.value('@name', 'varchar(50)') = 'error_reported'
One of the most helpful features of the system_health monitoring is the ability of tracking deadlocks back in time. Since the target is a ring buffer, how much data is stored depends on the target capacity, the activity on the machine, and on other settings such as the maximum number of occurrences that will be retained for each event defined in the event session. But you have the chance to find past deadlocks recorded in the system_health session output.
Run the below code in 2 separate SSMS windows.
--Window1 USE AdventureWorks BEGIN TRAN UPDATE Person.Address SET AddressLine1 = 'New address' WHERE AddressID = 20 WAITFOR DELAY '0:0:10' SELECT * FROM Person.Address WHERE AddressID = 25 --Window 2 USE AdventureWorks BEGIN TRAN UPDATE Person.Address SET AddressLine1 = 'New address' WHERE AddressID = 25 WAITFOR DELAY '0:0:10' SELECT * FROM Person.Address WHERE AddressID = 20
Eventually one of the processes will be chosen as victim. Both queries will appear in the system_health output, in a deadlock report that you can obtain by running the code below.
SELECT CAST( REPLACE( REPLACE(XEventData.XEvent.value('(data/value)', 'varchar(max)'), '
', ' '), ' ',' ') AS XML) AS DeadlockGraph FROM (SELECT CAST(target_data AS XML) AS TargetData from sys.dm_xe_session_targets st join sys.dm_xe_sessions s on s.address = st.event_session_address where name = 'system_health') AS Data CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent) where XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report'
I've ran this code on SQL 2008 SP1. I used REPLACE because of a minor SQL 2008 bug described in this Microsoft KB article. To fix this, either apply the "Cumulative Update 6 for SQL Server 2008 Service Pack 1" or upgrade to the SP2. Click here to view the XML fragment I obtained on my machine.. The "victim-list" element contains the ID of the process chosen as victim. In our case there is only one victim, however multiple victims can be listed here.
Both queries appear in the XML fragment (see the "inputbuf" child of the "process-list" element), and you can see immediately which pieces of code caused the deadlock. The "process-list" element contains both processes ("spids") which participate in the deadlock. The "process" element offers data such as the spid (here 53 and 54), the database id (here currentdb = 18, i.e. AdventureWorks on my machine), the login name, the isolation level, the client application name. The "resource-list" element contains the resources involved in the deadlock. Look at the "owner-list" and "waiter-list" elements and see how the 2 processes blocked each other. The "associatedObjectId" attribute of the "keylock" element will help you identify the table name and the index name. Because this is a partition ID, you'll have to run this piece of code:
SELECT OBJECT_NAME(i.object_id), i.name FROM sys.partitions AS p INNER JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id WHERE p.partition_id = 72057594043432960
Replace the partition_id with the one you've obtained when you reproduced the deadlock on your machine. You'll obtain "Address" as the table name and "PK_Address_AddressID" as the index name. The partition id can be found also in the "waitresource" attribute of the "process" element.
Unfortunately this XML output cannot be graphically visualized in Management Studio, like the output produced by a SQL Trace, because the Extended Events deadlock graph uses a different schema, displaying multiple victims.
- Make sure to remember about the default trace and the system_health extended events session at the right moment. They're always on, and they can offer valuable help in troubleshooting.
- It is a good idea to get used to the extended events feature, since it seems that it will be used more and more for future SQL Server version diagnostic and troubleshooting capabilities.
About the author
View all my tips