Disaster Recovery Procedures in SQL Server 2005

By: Edwin Sarmiento | Comments (2) | Related: 1 | 2 | 3 | 4 | > Disaster Recovery

Problem

SQL Server 2005 has given us a lot of options on implementing high availability and disaster recovery. More than the technologies themselves, it is important to come up with the proper procedures as we manage different disaster recovery scenarios. How do we come up with procedures for various SQL Server disaster recovery scenarios?

Solution

This series of articles will look at different disaster recovery scenarios and the procedures involved in your recovery plan. When considering disaster recovery options for your SQL Server 2005 database, you should include as many technologies as you can so you'll have a pool of options to choose from if a disaster arises. While having these technologies prove to be important, it is the process that goes with it that makes it effective. For this article, let's take a simple scenario where a user accidentally dropped or truncated a table about 5 hours after a database backup was taken. Restoring from a database backup would mean losing 5 hours worth of data. And for most companies, they would rather opt for loss of time than loss of data. Plus, if this was a very large database, it would take quite some time to recover and bring it online. We'll take this scenario to build a procedural approach to recover the database as quickly as possible while minimizing data loss. We will use the Northwind database to demonstrate the process. Remember to change Northwind's database recovery model to FULL before working through the steps below.

1) Make sure you have a very good backup

For the scenario above, let's say you have a daily backup running at 6:00 AM and there are no database snapshots created on a regular basis. Your database is configured to use a single MDF and LDF file, which is not quite right for disaster recovery. Let's generate a full database backup for our Northwind database which will be the starting point for database recovery. Here is that code:

USE master 
GO 
BACKUP DATABASE Northwind  
   TO DISK = N'D:\DBBackup\NorthwindBackup.bak' 
   WITH NAME = N'Full Database Backup', DESCRIPTION = 'Starting point for recovery', 
    INIT, STATS = 10 
GO

Looking at the Northwind database schema, you won't easily be able to drop the Products, Orders and Customers table due to foreign key constraints defined by other tables like the Order Details table. But I bet you can easily drop the Order Details table. Let's simulate that disaster by dropping the table at around 11:00AM.

DROP TABLE  [Order Details]

2) Contain the problem

Since the database only has a single MDF and LDF file, we couldn't do much. All we can do is take the database offline by setting it to restricted user mode

USE master 
GO 

ALTER DATABASE Northwind 
   SET RESTRICTED_USER 
   WITH ROLLBACK IMMEDIATE  
GO

This will effectively take the database offline, terminating all active connections. This is where the clock starts ticking and we need to take action. Keep in mind your RPO and RTO while proceeding with the next steps.

3) Backup the transaction log

A good DBA would know that the first thing to do when disaster strikes is to backup the transaction log - assuming that your database is set to FULL recovery model. This is to make sure you still have all the active transactions in the log since your last backup. In our scenario, since the last backup - FULL backup, in this case - occurred at 6:00AM.

BACKUP LOG Northwind  
   TO DISK = N'D:\DBBackup\NorthwindBackupLog.trn' 
   WITH NAME = N'Transaction Log Backup' 
   , DESCRIPTION = 'Getting everything to current point in time.',  
       STATS = 10 
GO

4) Restore the database to a known good point-in-time

Now, any user who accidentally dropped a table or caused any damage on a database will not tell you immediately. Sometimes, you may have to dig it up for yourself but that would take quite a lot of time. Since we wanted to bring the database back online as fast as we can, let's just assume a "known good" point-in-time and leave the digging for a later time. In the script below, I have chosen to use 10:42AM in my STOPAT parameter as my "known good" point-in-time for demonstration purposes.

RESTORE DATABASE Northwind 
FROM DISK = N'D:\DBBackup\NorthwindBackup.bak' 
WITH NORECOVERY, RESTRICTED_USER
GO 

RESTORE LOG Northwind 
FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn' 
WITH RESTRICTED_USER,  
STOPAT = '2008-09-23 10:42:44.00', RECOVERY 
-- use a "known good" point in time 
GO

Although we have restored the database to a "known good" point-in-time, we don't really know how much data we've lost. We need to find out the exact time when the very last INSERT statement occurred prior to the DROP TABLE statement so that we can recover as much data as we possibly can. But we don't want to do this directly on the server as we need to get the database back online as quickly as we can. This is where the succeeding steps would prove to be very valuable

You can validate whether the dropped table has been restored by running a query against it.

SELECT * 
FROM Northwind.dbo.[Order Details]
GO

5) Create a snapshot of the restored point

We will create a database snapshot of the restored database for further processing. This database snapshot will be our reference in recovering data to the exact time prior to the DROP TABLE statement.

USE master 
GO 

CREATE DATABASE Northwind_RestorePointSnapshot 
ON 
( NAME = N'Northwind',  
  FILENAME = N'D:\DBBackup\NorthwindData_RestorePontSnapshot.snap') 
AS SNAPSHOT OF [Northwind]
GO

Depending on the table schema, we can opt to leave it as it is, as what we will do for the Order Details table, or do a few more tasks. If the table has an existing IDENTITY column, we need to create a gap between the maximum value for our IDENTITY column and assumed number of rows which we need to recover. This, of course, will depend on the number of transactions that occur on the server. To identify the maximum value of the IDENTITY column, you can run the DBCC CHECKIDENT command as shown below:

DBCC CHECKIDENT ('tableName')
--Displays the number of rows you have for the restored table 
GO

This will return the maximum value of the IDENTITY column. Let's say that the number of transactions on this table per day is around 4000 records, we can create a gap between the maximum value and the next value. If the maximum value for the IDENTITY column is 25000, we need to add 4000 to it and run the DBCC CHECKIDENT command again with the RESEED parameter (we are simply assuming that you can recover the lost data within a day, thus, the value 400):

DBCC CHECKIDENT ('tableName', RESEED, 29000 )
--Creates a gap of for the IDENTITY column to start the next value at 29000 
GO

6) Bring the database back online

Once you have managed to do that, change the database option to bring it online and allow users to connect back and run their applications.

USE master 
GO 

ALTER DATABASE Northwind 
   SET MULTI_USER
GO

Now the database is back online and the dropped table has been restored. Although everyone is happy by now, our job as a DBA does not stop here. Remember that we still need to recover the lost data from the "known good" point-in-time to the time before the DROP TABLE command was executed. That is the only way we can recover as much data as we can. Though there are a few third-party tools we can use to read the transaction log and recover the data by replaying those transactions back, most of us do not have the luxury of playing around with those tools. So our next best bet would be to use the RESTORE with STOPAT option. It's a bit tedious and sometimes very stressful as one mistake would plunge you back into repeating the entire process. All we need here is to find out the times we did our backups until the end of the tail (transaction log) backup. In our scenario, the last backup occurred at 6:00AM and your "known good" point-in-time is at 10:42.44AM. Therefore, you can start doing a RESTORE with the STOPAT option from 10:42.44AM and change the STOPAT time value to maybe a second. If you are not quite sure when the last backup occurred, you can always query the MSDB database.

SELECT     * 
FROM msdb.dbo.backupset AS a 
  INNER  JOIN msdb.dbo.backupmediafamily AS b 
        ON a.media_set_id = b.media_set_id 
WHERE database_name = 'Northwind' 
ORDER BY backup_finish_date

Note the backup_finish_date and the type columns as this will give you an idea on the time you need to consider for your STOPAT value in your RESTORE command.

7) Restore another copy of the damaged database with a different name for investigation

Restoring another copy of the damaged database with a different name will allow us to work on the restoration of the data without worrying about the availability as we've already managed to bring up the production database. Just make sure you select a different name and database file location for the restored database or you'll end up damaging the already brought up database. We will just repeat what we did in step #4 but with a twist - different name and database file location.

RESTORE DATABASE Northwind_recover  
FROM DISK = N'D:\DBBackup\NorthwindBackup.bak'  
WITH MOVE N'Northwind' TO N'D:\DBBackup\NorthwindData_recover.mdf',   
           MOVE N'Northwind_Log' TO N'D:\DBBackup\NorthwindLog_recover.ldf',   
           STANDBY = N'D:\DBBackup\Northwind_UNDO.bak', 
           STATS = 10 
GO 
RESTORE LOG Northwind_recover  
FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn'  
WITH STANDBY = N'D:\DBBackup\Northwind_UNDO.bak', 
           STATS = 10, STOPAT = '2008-09-23 10:42:44.00'
GO

Document the value of your STOPAT parameter as this will be the most critical parameter you'll ever work with during this process. Since we just repeated the process in step#4, we know for a fact that the DROP TABLE command has not been executed at this time.

8) Restore the transaction log by increasing the value of the STOPAT parameter

We run another RESTORE LOG command, increasing the STOPAT parameter value by a minute - from 10:42:44.00 to 10:43:44.00

RESTORE LOG Northwind_recover  
FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn'  
WITH STANDBY = N'D:\DBBackup\Northwind_UNDO.bak', 
           STATS = 10, STOPAT = '2008-09-23 10:43:44.00'
GO

This is the part where it becomes iterative. Don't be frustrated at this point as it will be really tedious. You can increase the value by a minute, 5 minutes, 10 minutes and document the time. Remember to run a test query on the dropped object after running the RESTORE LOG command. I would recommend creating a table for this activity that would look something like this.

TIME	OBJECT EXISTED?
10:43:44.00	YES
10:48:44.00	YES
10:58:44.00	YES
11:03:44.00	NO

With this information, you know for a fact that the table was dropped between 10:58:44.00 to 11:03:44.00. You can repeat step #8 and increase the value of the STOPAT parameter by a minute or even a second if you may since you already have a smaller interval to work with. If you find yourself overshooting the time value of the STOPAT parameter, go back to step #7 armed with the tabular information you've documented in step #8, making the restore process a bit faster. Just remember to use the WITH RECOVERY option at the last RESTORE LOG statement like this

RESTORE LOG Northwind_recover  
FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn'  
WITH STATS = 10, STOPAT = '2008-09-23 11:01:44.00', RECOVERY 
GO

Once you've managed to restore the database to the time before the DROP TABLE command was executed, you can now do a comparison between what was restored on the production database and what was recovered. You can do this in a number of different ways. Since we already have a database snapshot created earlier, we will use that together with the TableDiff utility. Although the tool was designed for replication, we can use it for disaster recovery as well. A previous tip on SQL Server 2005 tablediff command line utility can give you an overview on how to use this tool but just to highlight that your source database will be the one that you recovered and the destination database will be your database snapshot. This is where your database snapshot would prove to be very important especially if you are dealing with more than one object which is normally the case. If you are not comfortable with command-line utilities, a GUI version was created by the guys from SQLTeam.com. You might want to check that out as well and include it in your DBA toolbox

You can also do an INSERT/SELECT where you insert records on the production database based on a query on the recovered database. Since our Order Details table does not have an IDENTITY column, we can create our own by inserting the records in a temporary table and using the ROW_NUMBER() function:

--This inserts records in a temporary table and assigns a dummy identity value for reference 
SELECT  ROW_NUMBER() OVER (ORDER BY OrderID) AS ROWID, *  
INTO Northwind_recover.dbo.OrderDetailsRecover 
FROM [Order Details] 

--This inserts recovered records from the recovered database into the production database based on
--the dummy identity valuewe have assigned for reference
INSERT INTO Northwind.dbo.[Order Details] (OrderID,ProductId,UnitPrice,Quantity,Discount) 
SELECT OrderID,ProductId,UnitPrice,Quantity,Discount  
FROM Northwind_recover.dbo.OrderDetailsRecover 
WHERE ROWID > 
               ( 
                   SELECT COUNT(*)  
                   FROM Northwind_RestorePointSnapShot.dbo.[Order Details] 
               )

Notice that we used our database snapshot to identify the difference between what we managed to restore and what we have recovered.

Next Steps

It is important to have a disaster recovery plan in place and the procedures necessary for recovery. While it is impossible to come up with procedures for almost every type of disaster, it would help if you start listing the possible disasters that may happen and prepare a disaster recovery plan with procedures and document accordingly.
Simulate this particular process by going thru the steps outlined above.
You can download the Northwind database used in the sample here.

About the author

Edwin M Sarmiento is a Microsoft SQL Server MVP and Microsoft Certified Master from Ottawa, Canada specializing in high availability, disaster recovery and system infrastructures.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips