Troubleshooting SQL Server Transaction Log Related Wait Types
In my previous tip, First Steps on Performance Troubleshooting, I suggested that when facing a performance problem you should take a look at the transaction log. In this tip I will show you how you can use transaction log related wait types to perform a diagnosis.
Troubleshooting transaction log issues can be difficult due to the fact that there are several different aspects that affect the transaction log and therefore can downgrade its performance. The causes vary from a hardware issue to an improper configuration, which is the most frequent cause. In both cases, the problem will be reflected in the wait types. But by looking at the wait types alone wonít give you the answer per se. As a suggestion, I would recommend that in addition to looking at the wait types you should also look at the workload that generates the waits.
Letís suppose two different scenarios; the first, a database where itís running a bulk insert and the second, a database where it is running an OLTP workload. In both cases the possible transaction log related wait types could be the same. What I mean is that in some scenarios the presence of these wait types are most likely to occur and wonít be a major problem.
The Starting Point
Before digging into the transaction log wait types we should start with the basics. The transaction log files should be on a separate disk other than the data files for dedicated SQL Servers. I have seen some SQL Server installations that use a partitioned disk, but this is the same as having the data and log files on the same drive (in fact they are!) with the addition that the disk head has to move more because of the distance amongst the partitions. The recommendation is to keep database log files on a separate RAID 10 drive, when possible.
LOGMGR_QUEUE wait type
According to Microsoft documentation, this wait type occurs when a task is waiting for any outstanding log I/Os to finish. If you see this wait type you should not be worried because it indicates that the log writer is waiting to be invoked by a session or the checkpoint process. In other words, it is telling you that there is no current log activity. Itís most unlikely to see this wait type on user sessions because the log writer is a background process.
LOGMGR wait type
This wait type is seen when a database is being closed, either when a database is set offline or read only or when the instance is shutting down. In the last case you will only see this wait type if you connect through a Dedicated Administrator Connection (DAC).
LOGMGR_RESERVE_APPEND wait type
This wait type occurs when a task is waiting to see if log truncation frees up log space to allow the task to write a new log record. If you see this wait type you should consider increasing the log size and if the log file has the auto grow property enabled you can increase its value.
If you opt for increasing the auto grow value and not increase the log file size, you have to consider that the instant file initialization feature does not work for log files, so the best you can do is to calculate the required log space and allocate it at once.
After taking these actions, if the problem persists you may have to take a look at the log_reuse_wait and log_reuse_wait_desc columns of the sys.databases catalog view to check what is delaying the log truncation process.
LOGBUFFER and WRITELOG wait types
Now I will explain that the LOGBUFFER and WRITELOG wait types that are very similar, but not equal. Although in the practice to troubleshoot them the approach is mostly the same, there is a marked conceptual difference amongst them.
According to Microsoft, the LOGBUFFER wait type occurs when a task is waiting for space in the log buffer to store a log record, and the WRITELOG wait type occurs while waiting for a log flush to complete. In order to understand the previous definitions, first we need to know what the log buffer is.
The Log Buffer is a small contiguous area of memory where SQL Server stores the log records before writing them into the transaction log file in a process called Log Flush. When SQL Server is processing a transaction that inserts, updates or deletes data, or changes the physical structure of the database, it checks if the data page that is going to be modified is already in memory, otherwise the engine fetches the page from disk into memory. When the page is in memory, SQL Server changes its content in memory and marks the page as ďDirtyĒ. At the same time the database engine creates a new log record in the Log Buffer. At this point is when the LOGBUFFER wait type takes leadership, because it measures the time it takes to create a new log record into the log buffer.
In order to comply with the ACID (Atomicity, Consistency, Isolation, and Durability) principle, SQL Server needs to flush the log buffer (write the log buffer contents into the log file) before writing the dirty data pages into the datafiles. And here is when the WRITELOG wait type arises. It measures the amount of time being spent writing the log buffer contents into the transaction log.
Now that we know the difference between these two wait types we are ready to deduce what may cause them.
Due to the fact that the log buffer is a small portion of memory, a transaction that modifies a big amount of data and fills the log buffer, it will need to be flushed in chunks into to the log file. Also a long running transaction that has log records in the log buffer will release less log buffer space to other transactions that must wait for log buffer space in order to keep going. Here we can see a correlation between these two wait types, because if we think about it the log buffers will be waiting to be flushed (LOGBUFFER wait type) if and only if the log records cannot be written to the transaction log (WRITELOG wait type).
You may think why didn't the Microsoft engineers make the log buffer bigger, imagine the following scenario. Your company deposits your salary into your bank account, the transaction is committed but not stored in the log file and the bank server has a power failure. Since the transaction was committed, you have been paid, but when the bank database is recovered without that transaction in the log file, your deposit will be lost.
The question is what can cause these wait types and how to troubleshoot them. The first thought that comes to mind is to check the disk response time. You can do so by querying the IO related Dynamic Management View sys.dm_io_virtual_file_stats and check the values of io_stall_write_ms and io_stall_queued_write_ms for the log file. Also you can check the performance counters PhysicalDisk:Avg.Disk Queue Length and PhysicalDisk :Disk Write Bytes/sec.
If you donít see any lag on disk activity, you can check the number of virtual log files (VLF). The higher the VLF count the slower the log performance will be. You can check the number of VLFs with the DBCC LOGINFO database control command.
- If you donít know what Virtual Log Files are check out this tutorial: What is a Virtual Log File VLF?
- You can learn more about the problems with many Virtual Log files here: Performance Issue with large number of Virtual Log Files in SQL Server Transaction Log.
- To monitor the number of VLFs you can follow read tip: Monitor Your SQL Server Virtual Log Files with Policy Based Management.
- If you need help identifying IO bottlenecks check out this tip: How to Identify IO Bottlenecks in MS SQL Server. Also you can read the following tutorial: Investigating I/O bottlenecks.
- If you need help configuring your hard disk for SQL Server, check out this tip: Hard Drive Configurations for SQL Server.
- When you are troubleshooting wait types you should look at sys.dm_os_waiting_task as well as sys.dm_os_wait_stats Dynamic Management Views. The following tip will give you an insight: SQL Server sys.dm_os_wait_stats DMV Queries.
- You can learn more about Performance Counters in this tutorial: Performance Monitor Tutorial.
- If you are interested in CPU related wait types you can read this article: A closer look at CXPACKET wait type in SQL Server.
- If you want to avoid problems in your instance you should do maintenance work. If you donít now how or need to go further, check out the Maintenance Tips Category.
- Also check out the Performance Tuning Tips Category.
About the author
View all my tips