Perfmon Counters to Identify SQL Server Disk Bottlenecks
There are several articles about identifying I/O issues related to SQL Server. There are various methods for finding I/O bottlenecks, so which performance counters are needed for a quick evaluation of a possible disk bottleneck?
There is plenty information on the web about how to determine possible I/O bottlenecks. The information available is so vast that it is easy to get lost. A quick determination of a possible disk bottleneck is not an easy task to achieve and mostly depends on the type of storage such as: direct attached disks, SAN, NAS, iSCISI, Virtual Disk and so forth. In most enterprises, the disk volumes are presented to the SQL team by the storage team as a black box which makes it even harder to know to where the data actually sits.
During the 2010 PASS Summit I spent time at the SQL Clinic and had a good talk with an expert about disk analysis and how to quickly identify a possible disk issue. At the end of the conversation, the Premier Field Engineer was able to provide me with a simple, but effective series of counters to be use with Perfmon.
The basic idea is to use Perfmon counters that do not need additional information to be interpreted. Perfmon has several good counters, but only a select set of these counters helps to measure absolute values in the sense that they do not need additional counters or information to interpret them.
So the two primary counters used for quickly analyzing disk performance are:
- Avg. Disk sec/Read - is the average time, in seconds, of a read of data from the disk.
- Avg. Disk sec/Write - is the average time, in seconds, of a write of data to the disk.
These two counters measure latency directly at the layer just above where the storage device drivers hook into the operating system. These help to accurately measure how long the drivers and hardware took to service an I/O request no matter what kind of hardware you have.
When selecting these counters if you have multiple disks you want to get each instance (each individual disk) so you can see if a bottleneck exists on any particular disk.
As a rule of the thumb, if we are running an OLTP system the average should be under 15ms with maximums up to 25ms. If your numbers are below these values then your I/O is in good standing and no additional disk bottleneck analysis is required. This rule of thumb can be used for any type of system. The less time it takes to read or write data the faster your system will be.
Following is output for Avg. Disk sec/Read for the C: drive. We can see that the Average is 0.130 which is 130ms and the Maximum is 1.089 which is 1089ms. So this system has an I/O bottleneck based on our target numbers. You would interpret the Avg. Disk sec/Write counters the same way.
If the above counter values are above our optimal threshold, we can deepen our analysis by considering the following secondary counters:
- Disk Transfers/sec - is the rate of read and write operations on the disk.
- Disk Reads/sec - is the rate of read operations on the disk.
- Disk Writes/sec - is the rate of write operations on the disk.
- Avg. Disk Queue Length - is the average number of both read and write requests that were queued for the selected disk during the sample interval.
- Current Disk Queue Length - is the number of requests outstanding on the disk at the time the performance data is collected.
This tip is only aimed to suggest and recommend which Perfmon counters are need to quickly verify if we are truly experiencing a disk bottleneck. Once you see numbers above the threshold then you need to spend more time digging in deeper.
- How to Identify I/O Bottlenecks in SQL Server
- Free Microsoft Tools to Help Setup and Maintain Perfmon for SQL Server
- System Monitor (Perfmon) Counters for SQL Server 2005
- Setting up Performance Monitor to always collect SQL Server
- Collecting performance counters and using SQL Server to analyze
About the author
View all my tips