Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Transaction Log Grows And Fills Up Drive


By:   |   Last Updated: 2010-09-17   |   Comments (11)   |   Related Tips: More > Database Administration

Problem

I had a production problem recently, the transaction log grew 1 GB every minute and filled up the entire drive. The error message I got was: "The transaction log for database 'MyDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases. My SQL Server is clustered and the issue happened after a node failover occurred. In this tip I cover steps that I took to resolve this problem.

Solution

The message I got was the transaction log drive is full and users are unable to access the application, DBAs never want to hear this! The usual DBA question is "what was changed"?

My first priority was to fix the problem immediately, so users can access the application and then do root cause analysis. For the database in question, I changed the database recovery model to SIMPLE. Then I shrank the transaction log file to a reasonable size (note: you can't shrink it below its original size). After these changes, the log growth stopped, but this still did not tell me what the issue was. Something to note is that changing the recovery model to SIMPLE is not possible if Mirroring or Log Shipping is setup unless you remove these first, but these can be reconfigured later, the important thing is to stop the bleeding ASAP.

Other options would have been to create another transaction log file on a different drive that had plenty of space or to clean up files on the drive that was running out of space. But in my case the file was growing so rapidly that I needed to change the recovery model so the growth would stop until this issue was resolved.

Below are the steps that I took to check the size, change the recovery model and shrink the log file.

SELECT recovery_model_desc FROM sys.databases WHERE name = 'MyDB' --Before

ALTER DATABASE MyDB SET recovery simple

SELECT recovery_model_desc FROM sys.databases WHERE name = 'MyDB' --After

EXEC xp_fixeddrives --Check free drive space 

EXEC sp_helpdb MyDB -- Note the size of the log before shrink

DBCC shrinkfile(MyDB_log, 1024) -- shrink log to 1 GB

EXEC sp_helpdb MyDB -- Note the size of the log after shrink

EXEC xp_fixeddrives -- Check free drive space 

First aid is complete, database is up and users are able to use the application again.

Now I need to find out what was causing the log to grow so fast, fix the issue, put the database back in FULL recovery mode, do a full backup and make sure my log backups are working correctly.

The drive now has enough space. I put the database in Full recovery mode again to see if I could catch the offending transaction and I monitored drive space and transaction log size. I immediately noticed the transaction log growing about 1 GB per minute. I then executed the block of code below to find out what may be causing the log growth.

DBCC opentran --get the spid and feed it into the next query

DECLARE @handle BINARY(20)
DECLARE @SPID INT
SET @SPID = 100 -- 100 was the spid from the dbcc opentran

SELECT @handle = sql_handle
FROM MASTER..sysprocesses
WHERE spid = @SPID
SELECT [text] FROM ::fn_get_sql(@handle)
GO 

The culprit was a stored procedure that was showing up every time I executed the above block of code, although the SPID changed each time. I looked into the stored procedure which was not very complicated, but does a bunch of deletes and inserts based on some condition. This procedure was called by an automated process every minute. What happens when a cluster node fail-over happens? The cache is flushed and all the query plans are gone. Since there has not been any issue with this procedure in the past, my first option was to recompile the procedure. Viola, it worked! The transaction log stopped growing so fast and I didn't see this procedure in the open transactions from the above code. So this simple stored procedure that worked fine suddenly caused a major production issue causing the application to shut down and the fix was as simple as a recompile. Quite interesting.

My next step was to start a full database backup ASAP. While the full database backup was running I kicked off transaction log backups every 5 minutes and they seemed to work fine. I changed the transaction log backup to its normal schedule. After the full backup was completed, I reviewed the sql log, drive space, transaction log size, open transactions and everything seemed fine. If it is a cluster, you could try failing back to the other node and see if that fixes the problem, which in my case was the need for a recompile. In a cluster, the storage is shared by the nodes, so there is no benefit from a storage point of view.

After this, I researched what I could have done better in this situation. I knew that the scripts I used were from SQL Server 2000. SQL Server 2005/2008 DMVs would have helped me to find the offending transaction. Another eventful day in a DBAs life. Feels good to have resolved the issue and at the same time, I wanted to find out if I could have handled this any better. So, I armed myself with the latest tools to handle a similar situation. Adam Mechanic has a script to find active transactions here, another good script to find transaction locks is from Mladen Prajdic here.

Next Steps
  • Have scripts/utilities organized so you can get to the issue ASAP when needed.
  • Read more about DBCC SHRINKFILE
  • Read more about DBCC OPENTRAN


Last Updated: 2010-09-17


get scripts

next tip button



About the author
MSSQLTips author Ranga Narasimhan Ranga Narasimhan has been in IT for over 10 years working on Siebel, Visual Basic, Crystal Reports and SQL Server.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Tuesday, May 07, 2013 - 1:47:29 PM - Shile Back To Top

Hi, 

 

If I have 100 different application databases configured for log shipping, how do I set up maintenance to clean up the sql server transaction log backup files on a regular schedule?

This is because I used the sql default settings(run backup and restore every 15mins) and the amount of files just keeps growing even for databases whose applications gets rarely used.

 

Thanks


Monday, August 20, 2012 - 9:40:23 AM - Dyandev Back To Top

hi Ranga ,

its grate solution and I tried the same as you mentioned above.

Thanks for this solution :)

 

Regards,

Dyandev


Monday, September 20, 2010 - 10:59:33 AM - Ranga Back To Top
Interesting observations and good points. The procedure does the inserts/deletes based on the new data that gets added. Maybe, new data made existing plan to go out of whack! Why did I recomplied first, because, that has helped in the past. This same proc is out there at several clients and there are no complaints of performance issues. But, that is the only fix I did to get the things in order.


Monday, September 20, 2010 - 10:27:52 AM - Bill Back To Top
I concur with Steve's comment - why would recompliling a stored procedure affect the number of transactions written to the transaction log?  Recompliing the procedure may lead to a more efficient plan in terms of IO, but the end result (the number of DELETES or UPDATES) will be the same with an efficient plan or an inefficient plan.

 

Also, it is true that the procedure cache is flushed in the event of a failover (or SQL Server instance restart, which takes place as part of a cluster failover).  However, that means that the procedure will be recomplied the first time it is executed after the SQL Server instance starts back up.  Parameter sniffing taking place in that first compile could have led to an inefficient plan being generated.  However, the author's forced recomplie could just as easily led to an inefficient plan as well (no mention was made of query tuning, query hints, etc). 

 

I would appreciate the author elaborating on why recompiling the procedure led to less transactions being written to the transaction log.


Sunday, September 19, 2010 - 10:44:08 AM - Steve Back To Top
Maybe it's just me, but I fail to see the connection between recompiling a stored procedure and the growth in the transaction log.


Friday, September 17, 2010 - 5:35:16 PM - JohnF Back To Top
I felt that was probably the case Ranga. While many dbas know better than to just change to simple without a good reason, there are also many who do not. Just wanted to mention it in case someone got the wild idea that it was just OK to do this without thinking about it first.

Friday, September 17, 2010 - 4:13:48 PM - Praneeth Back To Top
Well the DB is 1.3 TB  now with 2 large tables of about 200 GB each. The Log disk is 250 GB.  I set it to simple now. Data loss is not an issue. As most important data is replicated to another server and this is more a warehouse. My manger says we will get a new box soon. I got to make it cripple till then I guess. 

 

The other thing that happend is I restarted the SQL service to kill the sql once for all (couldn't wait for it to roll back) the DB started going into recovery every 5 minutes . For next 2 hours.

 

What a day it was.

 

The temp db log file got full the other day when I was out. One of the guys restarted and it worked. 

 

Praneeth


Friday, September 17, 2010 - 3:08:07 PM - Ranga Back To Top
JohnF,

I agree, but in this situation, the T.log grew large enough that backup didn't complete, the app was down! We were basically under water!

 

 


Friday, September 17, 2010 - 3:05:01 PM - Ranga Back To Top
Praneeth,

I understand your situation, it is a large transaction that even rollback takes a long time. May be you need a large Tlog drive. In your case how big is the DB/ Tlog file size etc ?


Friday, September 17, 2010 - 12:43:55 PM - JohnF Back To Top
I would be very leery of changing any database to simple without taking a few steps first, most notably at least getting a log backup. By changing an active db to simple, any recover/roll forward capabilities are now gone and if something hoses the database (quite possible in these stressful types of situations), users are not only going to be without a database while it is restored, they are also going to have to redo all the work they had done for the day. Sometimes in an emergency there may be no choice. The definition of 'emergency' and 'acceptable risks' are varied between companies and databases within companies. But I have yet come into a situation where I couldn't get a log backup in a few minutes while I'm trying to figure out what to do next. And then copied it to a couple of places so I have a backup.

Friday, September 17, 2010 - 10:58:18 AM - Praneeth Back To Top
Hi

Good you were able to slove it.  My situation was more tricky.

 

The log file filled up the disk space between two full back ups (Mine is a warehouse so I  only do nightly full back ups and recovery model simple.) .

 

I identified the culprit  insert statement and killed it. But it didnt die. I got a message that its recovering.

 

DBCC shrink file  did'nt give  much disk space  back as there were no mark up to shrink  . Log back up wouldnt work as the disk is full.

Does you/any one see a way out from here.

Praneeth   

 


Learn more about SQL Server tools