Solutions for Azure SQL Database Throttle Log Wait of HADR_THROTTLE_LOG_RATE

Problem

We recently experienced errors with an ETL load into an Azure SQL database and identified the wait as HADR_THROTTLE_LOG_RATE_GOVERNOR, but couldn’t find any information from Microsoft about this, or any possible work-arounds. What’s the best way to handle an ETL load if experiencing this wait?

Solution

Let’s look at what might be causing this wait. Microsoft does not provide documentation on it, but in coming across it multiple times there is a strong correlation with the log usage.

We can use the below query to confirm that this is the wait:

SELECT 
 wait_type
 , last_wait_type
 , *
FROM sys.dm_exec_requests  
    CROSS APPLY sys.dm_exec_sql_text(sql_handle)

A couple of quick points to make is that while Azure-SQL sits on SQL Server, it does not always behave the same way and you will find the nuances as you work with it. When we run across the HADR_THROTTLE_LOG_RATE_GOVERNOR wait, we want to look at our current log usage using sys.dm_db_resource_stats:

SELECT TOP 10 
 end_time
 , avg_log_write_percent
 , *
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC

According to multiple Azure-SQL databases that I reviewed, the end_time field is reported every minute and Microsoft states that this reports the UTC time of the minute interval and the avg_log_write_percent column is what I’ve found correlated with this wait (the usage is over 98% when the wait appears). More than likely, the log use is being throttled by our Azure service tier. I have not seen a situation where this wait arises and the log usage is low; we should make sure that querying the above dm_db_resource_stats is near the time we received the wait.

We can also look at the average DTU use from the query that Microsoft provides on the same page – I’ve added some columns to it to make it more meaningful, such as our DTU limit and average DTUs actually used:

SELECT 
 end_time AS [EndTime]
  , (SELECT Max(v) FROM (VALUES (avg_cpu_percent), (avg_data_io_percent), (avg_log_write_percent)) AS value(v)) AS [AvgDTU_Percent]  
  , ((dtu_limit)*((SELECT Max(v) FROM (VALUES (avg_cpu_percent), (avg_data_io_percent), (avg_log_write_percent)) AS value(v))/100.00)) AS [AvgDTUsUsed]
  , dtu_limit AS [DTULimit]
FROM sys.dm_db_resource_stats

In the odd situation where the log usage is low (which I have not seen), we would want to know what our overall DTU usage is. What’s especially helpful about the above query is that if we know the times we see close to the limit (in general), we can develop around these possible bottlenecks. Using the example of an ETL process, we might want to scale up the tier before the load, then scale down after the load. Likewise, if we’re running ETLs throughout the day from APIs, we might want to consider delineating some of them from each other.

These are some other situations to consider as well:

In OLTP use cases we will want to tune queries further, or scale them for the database tier and needs.
Consider how we batch our ETL\replication\data writing and loading, along with how it’s scheduled.
We can move around Azure-SQL service tiers where we start at S3 and scale up to P2, then scale back down to S3.
Related to the above item, it may be more efficient to scale out with smaller Azure-SQL databases at lower service tiers that can handle micro-loads over for a macro-load. For example, current costs at the time of this article, P2 is over five multiples more expensive than S3 ($150 a month vs. $930 a month), so scaling out three S3 instances might be more cost efficient if the loads are throughout the day and cannot be scheduled.

Next Steps

When running into this wait, check the current avg_log_write_percent.
I would suggest monitoring the DTU usage and waits to determine if increasing scale on a schedule reduces issues.
Consider that in some cases, tuning queries further, adding indexes, etc. may be helpful.

Tim Smith

Tim consults for FinTek Development and teaches the course Automating ETL on Udemy. He’s worked with technology since high school, helping his school win its first TCEA award and continues to work in automation, data architecture, back-end development, and smart contract architecture. Tim enjoys testing new technologies early in the diffusion of innovation curve and was an early adopter of NoSQL and smart contract development. He has a blog at http://www.fintekdev.com/ and helps contribute to local technical and financial events in Texas.

Solutions for Azure SQL Database Throttle Log Wait of HADR_THROTTLE_LOG_RATE_GOVERNOR

Leave a ReplyCancel Reply