solving sql server problems for millions of dbas and developers since 2006
join MSSQLTips for free SQL Server tips














































SQL Server sys.dm_os_wait_stats DMV Queries

MSSQLTips author Tim Ford By:   |   Read Comments (10)   |   Related Tips: More > Dynamic Management Views and Functions

Problem
I get so frustrated when trying to determine where to start when a user comes to me and tells me that "the database is slow".  Launching Performance Monitor and running a trace in Profiler is inefficient.  Unless I know what I'm looking for to begin with it's hard to narrow things down using those tools!  Do you have any suggestions for quickly determining if I should be looking at CPU, Memory, or I/O issues without the needle v. haystack dilemma that comes with diving right in to Perfmon and Profiler?

Solution
There is definitely a better option to quickly determining where you should spend your performance tuning resources on when encountering a "slow database".  It's also associated with my favorite SQL Server subject: Dynamic Management Views.  There is a specific DMV that you can quickly query to determine if you're dealing with CPU, Memory, or Disk (I/O) issues: sys.dm_os_wait_stats.  Every time SQL Server is forced to wait for a resource it records the wait.  Yes, just like all your exes, SQL Server holds a grudge as well!  You can query this listing of wait instance via the sys.dm_os_wait_stats DMV.  The column list for that DMV is shown below:

  • wait_type - the type of wait being encountered, the current list of wait types is available from Microsoft here. There are currently 201 wait types in SQL Server 2005; 485 in SQL Server 2008 as of SP1.
  • waiting_tasks_count - the cumulative number of tasks that have registered this wait since the last time SQL Server services were restarted.
  • wait_time_ms - the cumulative amount of wait time registered for all tasks encountering this type of wait since the SQL Server services were restarted.
  • max_wait_time_ms - the maximum wait time for any of the tasks encountering this wait since last SQL Server service restart.
  • signal_wait_time_ms - the amount of time, in milliseconds, any request waited after being signaled that the resource was freed, until a worker was assigned to handle the request. A high signal wait time is indicative of CPU issues in that the thread still had to wait for CPU assignment even after the resource being waited on was freed up.

I made mention above that the information is cumulative for this DMV.  This information is held in cache.  It will be lost/reset upon services restart.  If you want to maintain history for this metadata you may want to consider persisting the information in a physical table.  More about that later in this tip though.

So, what does the data from this DMV look like?  Let's see some of the results based upon a simple SELECT *... query:

SELECT FROM sys.dm_os_wait_stats;

Looking at the sample output you'll see that results are returned for each wait_type, regardless of whether or not there were any cumulative tasks that waited on resources for that specific type.  You can also see that lock types are included in the output of sys.dm_os_wait_stats.  A general query of this type doesn't really tell us much though.  It's the other things we can do with this DMV that are telling.

Glenn Berry, in his chapter on DMVs in the recent MVP Deep Dive Book, had a great query that utilizes sys.dm_os_wait_stats.  I've taken that query and altered it slightly for my use to get a snapshot of waits in percentage form at the current point in time.  By creating a common table expression to build a Waits table you can filter out irrelevant wait types and then look at a listing of just those waits that comprise the top N% (in this case 95%) of the waits on the SQL Server instance:

WITH Waits AS
 
(
 
SELECT 
   
wait_type
   
wait_time_ms 1000. AS wait_time_s,
   
100. wait_time_ms SUM(wait_time_msOVER() AS pct,
   
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESCAS rn
 
FROM sys.dm_os_wait_stats
 
WHERE wait_type 
   
NOT IN
     
('CLR_SEMAPHORE''LAZYWRITER_SLEEP''RESOURCE_QUEUE',
   
'SLEEP_TASK''SLEEP_SYSTEMTASK''SQLTRACE_BUFFER_FLUSH''WAITFOR',
   
'CLR_AUTO_EVENT''CLR_MANUAL_EVENT')
   ) 
-- filter out additional irrelevant waits
   
SELECT W1.wait_type,
 
CAST(W1.wait_time_s AS DECIMAL(122)) AS wait_time_s,
 
CAST(W1.pct AS DECIMAL(122)) AS pct,
 
CAST(SUM(W2.pctAS DECIMAL(122)) AS running_pct
FROM Waits AS W1
 
INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
GROUP BY W1.rn
 
W1.wait_type
 
W1.wait_time_s
 
W1.pct
HAVING SUM(W2.pct) - W1.pct 95-- percentage threshold;

In this case you'll see that since the last SQL Server services restart, I'm primarily dealing with ASYNC_NETWORK_IO and OLEDB waits on my SQL Server instance.  I know there to be issues with a specific application on my server that causes these waits due to performing row-by-row processing of SQL batch results sets returned to the application.  The SOS_SCHEDULER_YIELD wait occurs whenever a task offers up its place in queue to allow another process to run in its stead.  It is indicative of CPU issues that may need to be addressed.  Specific waits will point you in the direction of where to focus your tuning resources because the cause for certain waits are attributed directly to CPU, memory, or I/O.  The PAGEIOLATCH_... waits are indicative of I/O issues, as is WRITELOG.  CXPACKET is an indicator of waits due to queries going parallel and running across a number of CPU cores.  I strongly suggest following the Microsoft CSS Team's blog and reading through the number of entries associated with wait types for the continual evolution of precision tuning using sys.dm_os_wait_stats.

The important thing to remember is that the metrics for the columns returned when querying sys.dm_os_wait_stats is cumulative.  Like all Dynamic Management Object information these values accumulate over time and are wiped-clean at a SQL Server service restart.  This is why I mentioned persisting the information in a static table on a dedicated administrative database located on each managed SQL Server instance.  You can then query the information just for the period between the last two stats collections.  I use a SQL Server Agent job that runs hourly to persist the information to the static table.  You could either run the code manually when desired or schedule as works best in your organization.  The query is lightweight, not really too noticeable on a decently-sized server.  The template script below will allow you to do just that.  Note that the remaining scripts in this post use template notation.  CTRL+SHIFT+M in SQL Server Management Studio will provide a form to enter parameter values that are correct for your environment

USE [<database_name,,Foo>];
--Create table to persist wait stats information:

CREATE TABLE <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
(
 
[wait_type] [nvarchar](60) NOT NULL,
 
[waiting_tasks_count] [bigint] NOT NULL,
 
[wait_time_ms] [bigint] NOT NULL,
 
[max_wait_time_ms] [bigint] NOT NULL,
 
[signal_wait_time_ms] [bigint] NOT NULL,
 
[capture_time] [datetime] NOT NULL,
 
[increment_id] [int] NOT NULL
);

ALTER TABLE <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats
 
ADD  DEFAULT (GETDATE()) FOR [capture_time];

--Insert wait stats info in a datestamped format for later querying:
DECLARE @DT DATETIME ;
SET @DT GETDATE() ;
DECLARE @increment_id INT;

SELECT @increment_id = MAX(increment_id) + FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>;
SELECT @increment_id ISNULL(@increment_id, 1)
 

INSERT INTO <database_name,,Foo>.<table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
 (
[wait_type][waiting_tasks_count][wait_time_ms][max_wait_time_ms],
 
[signal_wait_time_ms][capture_time][increment_id])
SELECT [wait_type][waiting_tasks_count][wait_time_ms][max_wait_time_ms]
 
[signal_wait_time_ms]@DT@increment_id
FROM sys.dm_os_wait_stats;

The following information would be persisted to the physical table of your choosing.  If using the defaults in the template it would be Foo.dbo.dm_os_wait_stats.  Note how it creates and increment_id and datestamp for the data collected:

Then you can run a query similar to this and only view counts resulting from recent activity - not activity over days, months, weeks, etc (trailing columns removed from screen shot for space concerns):

--Return persisted information from table
USE [<database_name,,Foo>];

DECLARE @max_increment_id INT

------------------------------------------------------------------
--Determine most-recent increment_id
------------------------------------------------------------------
SELECT @max_increment_id = MAX(increment_id)
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
   
------------------------------------------------------------------
--Present Waits results for period
------------------------------------------------------------------
SELECT DOWS1.wait_type
 (
DOWS1.waiting_tasks_count DOWS2.waiting_tasks_countAS [waiting_tasks_count],
 (
DOWS1.wait_time_ms DOWS2.wait_time_msAS [wait_time_ms],
 
DOWS1.max_wait_time_ms
 (
DOWS1.signal_wait_time_ms DOWS2.signal_wait_time_msAS [signal_wait_time_ms],
 
DATEDIFF(msDOWS2.capture_timeDOWS1.capture_timeAS [elapsed_time_ms],
 
DOWS1.capture_time AS [last_time_stamp]DOWS2.capture_time AS [previous_time_stamp]
FROM 
 
(
 
SELECT  wait_typewaiting_tasks_countwait_time_msmax_wait_time_ms,
         
signal_wait_time_mscapture_timeincrement_id
 
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
 
WHERE increment_id @max_increment_id
 
)AS DOWS1 
 
INNER JOIN 
 
(
 
SELECT  wait_typewaiting_tasks_countwait_time_msmax_wait_time_ms,
         
signal_wait_time_mscapture_timeincrement_id
 
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
 
WHERE increment_id (@max_increment_id 1)
 )
AS DOWS2 ON DOWS1.wait_type DOWS2.wait_type
WHERE (DOWS1.wait_time_ms DOWS2.wait_time_ms) > 
 
/*
 This can technically be eliminated because we're not persisting these waits:
 AND DOWS1.wait_type NOT IN 
   ('CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
   'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR',
   'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT')
 */
ORDER BY (DOWS1.wait_time_ms DOWS2.wait_time_msDESC;

Finally, you could re-tool the earlier top N% waits query to examine just the most recent wait stats activity from the persisted table as demonstrated below.  You can utilize this process (building a CTE comparing deltas for just the activity collected since the last two polling periods) to retrofit any query you find online for analyzing results from sys.dm_os_wait_stats.

--wait_stats as percentage just for current collection period:

USE [<database_name,,Foo>];
DECLARE @max_increment_id INT

------------------------------------------------------------------
--Determine most-recent increment_id
------------------------------------------------------------------
SELECT @max_increment_id = MAX(increment_id)
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>;
   
------------------------------------------------------------------
--Present Waits results for period
------------------------------------------------------------------
WITH Waits AS
 
(
 
SELECT DOWS1.wait_type
   ((
DOWS1.wait_time_ms DOWS2.wait_time_ms)/1000AS [wait_time_s],
   
100. * (DOWS1.wait_time_ms DOWS2.wait_time_ms) / SUM(DOWS1.wait_time_ms DOWS2.wait_time_msOVER() AS pct,
     
ROW_NUMBER() OVER(ORDER BY (DOWS1.wait_time_ms DOWS2.wait_time_msDESCAS rn
 
FROM 
   
(
   
SELECT  wait_typewaiting_tasks_countwait_time_msmax_wait_time_ms,
      
signal_wait_time_mscapture_timeincrement_id
   
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
   
WHERE increment_id @max_increment_id
   
)AS DOWS1 
   
INNER JOIN 
   
(
   
SELECT  wait_typewaiting_tasks_countwait_time_msmax_wait_time_ms,
      
signal_wait_time_mscapture_timeincrement_id
   
FROM <table_schema_name,,dbo>.<table_name,,dm_os_wait_stats>
   
WHERE increment_id (@max_increment_id 1)
   )
AS DOWS2 ON DOWS1.wait_type DOWS2.wait_type
 
WHERE (DOWS1.wait_time_ms DOWS2.wait_time_ms) > 0
 
)
   
SELECT W1.wait_type,
 
CAST(W1.wait_time_s AS DECIMAL(122)) AS wait_time_s,
 
CAST(W1.pct AS DECIMAL(122)) AS pct,
 
CAST(SUM(W2.pctAS DECIMAL(122)) AS running_pct
FROM Waits AS W1
 
INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
GROUP BY W1.rn
 
W1.wait_type
 
W1.wait_time_s
 
W1.pct
HAVING SUM(W2.pct) - W1.pct 95-- percentage threshold;

Next Steps



Last Update: 2/24/2010


About the author
MSSQLTips author Tim Ford
Tim Ford is a Senior Database Administrator and SQL Server MVP. He has been working with SQL Server since 1999.

View all my tips


print tip Print  
Become a paid author




Recommended For You








Comments and Feedback:
Wednesday, February 24, 2010 - 8:29:57 AM - unclebiguns Read The Tip
Nice post. Since the stats are reset at a restart wouldn't you need to modify the queries that report on your persisted data to NOT subtract the previous entries if there was a server restart between the times you collected the data? You could use the start time for the default trace in sys.traces or the creation time for tempdb from sys.databases.

Wednesday, February 24, 2010 - 4:57:45 PM - gmartinez Read The Tip

Have you tried  this command: DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);?

 

 


Thursday, February 25, 2010 - 8:49:34 AM - timmer26 Read The Tip

Doh Jack!

I do that with the other persisted metadata I've written about here. I'll change the script this weekend and repost.  Thanks!


Thursday, February 25, 2010 - 8:53:12 AM - timmer26 Read The Tip

 [quote user="gmartinez"]

Have you tried  this command: DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);?

[/quote]

Yes, that is how your clear the wait stats DMV, however, typically you want to retain this info.  I do use it though after making changes/tuning and cutting a final, persisted store of the metadata from the DMV to the physical table.  Then, when reviewing the results from this DMV I am only looking at results since the tuning changes that have been made.


Thursday, February 25, 2010 - 4:04:44 PM - unclebiguns Read The Tip
Not a big deal. I think I saw that in one of your earlier posts. Still a great post and something I need to start moving toward persisting this data and analyzing it. Need to become more proactive. Enjoying the Tips.

Tuesday, March 02, 2010 - 5:09:30 PM - Ranga Read The Tip

Does adding this to the insert script ignore wait_types that do not have any impact ? 

where wait_time_ms > 0


Wednesday, October 12, 2011 - 5:31:01 PM - Bennett Read The Tip

Awsome post!!!  I love the idea of implementing a permanent table to capture and persist the wait statistics.  I'm implementing it right now.  One thing has me scratching my head.  Is there a reason that increment_id is *not* an identity column? 


Tuesday, May 14, 2013 - 11:38:06 AM - Laura Read The Tip

I have ran this

WITH Waits AS
 (
 SELECT 
   wait_type, 
   wait_time_ms / 1000. AS wait_time_s,
   100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
   ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
 FROM sys.dm_os_wait_stats
 WHERE wait_type 
   NOT IN
     ('CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
   'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR',
   'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT')
   ) -- filter out additional irrelevant waits
   
SELECT W1.wait_type,
 CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
 CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
 CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
GROUP BY W1.rn, 
 W1.wait_type, 
 W1.wait_time_s, 
 W1.pct
HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold;

 

The output looks like this

BROKER_TASK_STOP           439656.09      59.11      59.11
BROKER_RECEIVE_WAITFOR 177571.00      23.87      82.98
WRITELOG                           37817.08       5.08       88.06
SOS_SCHEDULER_YIELD         34304.94       4.61       92.67
BACKUPIO                           12172.66       1.64        94.31
PAGEIOLATCH_SH                  7717.22        1.04       95.35

 

I do not know how to interpret this.  Can you help me?

 

 

 

 


Thursday, December 19, 2013 - 8:00:40 PM - Bill Read The Tip

 "I do not know how to interpret this.  Can you help me?"

My reply is 7 months late, but I will take a stab at answering (for the sake of the blog).

We do not know whether the presented metrics reflect a specific period of time during which a problem was known to be occurring (over the majority of that time period), whether the metrics reflect a period of time during which a much briefer problem period may have occurred, or whether the above metrics simply reflect has happened since SQL Server was last restarted or since these metrics were last cleared (AKA the metrics reflect no noticeable problem).

Wait types are accumulated by hundreds (if not a thousand) worker threads running in parallel. There are both interdependencies and independencies between wait types. What those threads are doing depends upon what T-SQL clients are submitting (and when), at any given moment.

It is important to concentrate upon the dm_os_wait_stats while a problem is happening, but it is equally important to avoid considering dm_os_wait_stats when no problem is happening. The caveat: It is nice to have baseline data (captured during a time when no problem was happening) for comparison purposes. No system (SQL Server, mechanical, or electrical) is infinitely fast - every system is always waiting on some wait_type. You will face a Herculaen task if your goal is to “eliminate all waits.” Invariably, it is a human judgment that decides when a piece of string (a wait type) is “too long”. Choose your battle wisely :).

Using http://msdn.microsoft.com/en-us/library/ms179984.aspx:

{

BROKER_TASK_STOP == "Occurs when the Service Broker queue task handler tries to shut down the task. The state check is serialized and must be in a running state beforehand." Also see http://blogs.msdn.com/b/sql_service_broker/archive/2008/12/01/service-broker-wait-types.aspx.

BROKER_RECEIVE_WAITFOR == "Occurs when the RECEIVE WAITFOR is waiting. This is typical if no messages are ready to be received." Also see http://blogs.msdn.com/b/sql_service_broker/archive/2008/12/01/service-broker-wait-types.aspx.

WRITELOG == "Occurs while waiting for a log flush to complete. Common operations that cause log flushes are checkpoints and transaction commits."

SOS_SCHEDULER_YIELD == "Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed."

BACKUPIO == "Occurs when a backup task is waiting for data, or is waiting for a buffer in which to store data. This type is not typical, except when a task is waiting for a tape mount."

PAGEIOLATCH_SH== "Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem."

}

The Broker% wait_time_s indicate this system's Service Broker should largely be idle. You should compare the above time period's data with another time period (by following "Present Waits results for period" in above code example.)

The WRITELOG wait_time_s indicates that frequent log flushes are overwhelming (queuing) the logs' disks. See  http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx (you will need to find that blog's referenced an d important SQLCAT document, because the SQLCAT team has moved their URLs to msdn).

The SOS_SCHEDULER_YIELD wait_time_s indicates threads must yield their scheduled run, so that other threads can take their turn at being run. You should compare the above time period's data with another time period (by following "Present Waits results for period" in above code example.)

The BACKUPIO 12,172.66 wait_time_s is about 1/3rd of SOS_SCHEDULER_YIELD's 34,304.94 wait_time_s. I believe there is qualitative point where a DBA should not worry about "diminishing returns". Worrying about BACKUPIO can be "making a mountain out of a mole hill", especially when "there are bigger fish to fry". The same applies to PAGEIOLATCH_SH.

Assuming the dm_os_wait_stats metrics reflect a specific period of time during which a problem was persistently occurring (for the majority of the time period), I think you should solely concentrate upon the causes of WRITELOG. You can and should revisit the other wait types, with new data in hand, later (i.e. their ranked importance can change, after you address WRITELOG). For WRITELOG, the SQLCAT reference covers inadequate disk performance, while the above msdn blog covers the impact of excessively chatty singleton updates, deletes, or inserts.  Use the perfmon counters identified within the SQLCAT reference to investigate inadequate disk performance. More broadly: Use sqlio to determine your disks' throughput capacities, and consider the system's dm_io_virtual_file_stats. Also for WRITELOG, the msdn blog's cause can be identified by looking for excessively "chatty" DML, using a trace that was caught *while* the WRITELOG waits were accruing (and concentrating upon the number of executions per tokenized DML statement).

Of course, if the data files, the log files, and the backups were all going to the same physical disk (perhaps a RAID 5 disk, with its parity disk write penalty), WRITELOG waits should not be surprising (and neither would be the SOS_SCHEDULER_YIELD, BACKUPIO, and PAGEIOLATCH_SH).

While it is possible to look at DMVs captures at time A, perfmon captures at time B, and traces at time C, the odds are heavily against your uncovering the true cause of a performance problem There is an old story about many blind men who were feeling an elephant and describing entirely different beasts. Similarly, many DBAs can describe many beasts, when those DBAs do not collect DMVs, perfmon, and traces all at the same time. Juggling those data collection tasks is no small feat :)


Tuesday, March 11, 2014 - 5:34:57 PM - Joseph LaBonde Read The Tip

Can you explain the pct and running_pct columns, I am having a dificult time figuring it out.

 

Thanks for the great queries.



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email Notify for updates

Signup for our newsletter


Comments
*Enter Code refresh code


 
Sponsor Information







Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups | community | events | first timer?
Some names and products listed are the registered trademarks of their respective owners.