SQL Server 2005 has given us a lot of options on implementing high availability and disaster recovery. More than the technologies themselves, it is important to come up with the proper procedures as we manage different disaster recovery scenarios. How do we come up with procedures for various SQL Server disaster recovery scenarios?
This series of articles will look at different disaster recovery scenarios and the procedures involved in your recovery plan. When considering disaster recovery options for your SQL Server 2005 database, you should include as many technologies as you can so you'll have a pool of options to choose from if a disaster arises. While having these technologies prove to be important, it is the process that goes with it that makes it effective. For this article, let's take a simple scenario where a user accidentally dropped or truncated a table about 5 hours after a database backup was taken. Restoring from a database backup would mean losing 5 hours worth of data. And for most companies, they would rather opt for loss of time than loss of data. Plus, if this was a very large database, it would take quite some time to recover and bring it online. We'll take this scenario to build a procedural approach to recover the database as quickly as possible while minimizing data loss. We will use the Northwind database to demonstrate the process. Remember to change Northwind's database recovery model to FULL before working through the steps below.
1) Make sure you have a very good backup
For the scenario above, let's say you have a daily backup running at 6:00 AM and there are no database snapshots created on a regular basis. Your database is configured to use a single MDF and LDF file, which is not quite right for disaster recovery. Let's generate a full database backup for our Northwind database which will be the starting point for database recovery. Here is that code:
USE master GO BACKUP DATABASE Northwind TO DISK = N'D:\DBBackup\NorthwindBackup.bak' WITH NAME = N'Full Database Backup', DESCRIPTION= 'Starting point for recovery', INIT, STATS= 10 GO
Looking at the Northwind database schema, you won't easily be able to drop the Products, Orders and Customers table due to foreign key constraints defined by other tables like the Order Details table. But I bet you can easily drop the Order Details table. Let's simulate that disaster by dropping the table at around 11:00AM.
DROP TABLE [Order Details]
2) Contain the problem
Since the database only has a single MDF and LDF file, we couldn't do much. All we can do is take the database offline by setting it to restricted user mode
USE master GO
ALTER DATABASE Northwind SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE GO
This will effectively take the database offline, terminating all active connections. This is where the clock starts ticking and we need to take action. Keep in mind your RPO and RTO while proceeding with the next steps.
3) Backup the transaction log
A good DBA would know that the first thing to do when disaster strikes is to backup the transaction log - assuming that your database is set to FULL recovery model. This is to make sure you still have all the active transactions in the log since your last backup. In our scenario, since the last backup - FULL backup, in this case - occurred at 6:00AM.
BACKUP LOG Northwind TO DISK = N'D:\DBBackup\NorthwindBackupLog.trn' WITH NAME= N'Transaction Log Backup' , DESCRIPTION= 'Getting everything to current point in time.', STATS= 10 GO
4) Restore the database to a known good point-in-time
Now, any user who accidentally dropped a table or caused any damage on a database will not tell you immediately. Sometimes, you may have to dig it up for yourself but that would take quite a lot of time. Since we wanted to bring the database back online as fast as we can, let's just assume a "known good" point-in-time and leave the digging for a later time. In the script below, I have chosen to use 10:42AM in my STOPAT parameter as my "known good" point-in-time for demonstration purposes.
RESTORE DATABASE Northwind FROM DISK = N'D:\DBBackup\NorthwindBackup.bak' WITH NORECOVERY, RESTRICTED_USER GO
RESTORE LOG Northwind FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn' WITH RESTRICTED_USER, STOPAT= '2008-09-23 10:42:44.00', RECOVERY -- use a "known good" point in time GO
Although we have restored the database to a "known good" point-in-time, we don't really know how much data we've lost. We need to find out the exact time when the very last INSERT statement occurred prior to the DROPTABLE statement so that we can recover as much data as we possibly can. But we don't want to do this directly on the server as we need to get the database back online as quickly as we can. This is where the succeeding steps would prove to be very valuable
You can validate whether the dropped table has been restored by running a query against it.
SELECT * FROM Northwind.dbo.[Order Details] GO
5) Create a snapshot of the restored point
We will create a database snapshot of the restored database for further processing. This database snapshot will be our reference in recovering data to the exact time prior to the DROP TABLE statement.
USE master GO
CREATE DATABASE Northwind_RestorePointSnapshot ON ( NAME = N'Northwind', FILENAME = N'D:\DBBackup\NorthwindData_RestorePontSnapshot.snap') AS SNAPSHOT OF [Northwind] GO
Depending on the table schema, we can opt to leave it as it is, as what we will do for the Order Details table, or do a few more tasks. If the table has an existing IDENTITY column, we need to create a gap between the maximum value for our IDENTITY column and assumed number of rows which we need to recover. This, of course, will depend on the number of transactions that occur on the server. To identify the maximum value of the IDENTITY column, you can run the DBCC CHECKIDENT command as shown below:
DBCC CHECKIDENT ('tableName') --Displays the number of rows you have for the restored table GO
This will return the maximum value of the IDENTITY column. Let's say that the number of transactions on this table per day is around 4000 records, we can create a gap between the maximum value and the next value. If the maximum value for the IDENTITY column is 25000, we need to add 4000 to it and run the DBCC CHECKIDENT command again with the RESEED parameter (we are simply assuming that you can recover the lost data within a day, thus, the value 400):
DBCC CHECKIDENT ('tableName',RESEED,29000) --Creates a gap of for the IDENTITY column to start the next value at 29000 GO
6) Bring the database back online
Once you have managed to do that, change the database option to bring it online and allow users to connect back and run their applications.
USE master GO
ALTER DATABASE Northwind SET MULTI_USER GO
Now the database is back online and the dropped table has been restored. Although everyone is happy by now, our job as a DBA does not stop here. Remember that we still need to recover the lost data from the "known good" point-in-time to the time before the DROP TABLE command was executed. That is the only way we can recover as much data as we can. Though there are a few third-party tools we can use to read the transaction log and recover the data by replaying those transactions back, most of us do not have the luxury of playing around with those tools. So our next best bet would be to use the RESTORE with STOPAT option. It's a bit tedious and sometimes very stressful as one mistake would plunge you back into repeating the entire process. All we need here is to find out the times we did our backups until the end of the tail (transaction log) backup. In our scenario, the last backup occurred at 6:00AM and your "known good" point-in-time is at 10:42.44AM. Therefore, you can start doing a RESTORE with the STOPAT option from 10:42.44AM and change the STOPAT time value to maybe a second. If you are not quite sure when the last backup occurred, you can always query the MSDB database.
SELECT * FROM msdb.dbo.backupset AS a INNER JOINmsdb.dbo.backupmediafamily AS b ON a.media_set_id = b.media_set_id WHERE database_name = 'Northwind' ORDER BY backup_finish_date
Note the backup_finish_date and the type columns as this will give you an idea on the time you need to consider for your STOPAT value in your RESTORE command.
7) Restore another copy of the damaged database with a different name for investigation
Restoring another copy of the damaged database with a different name will allow us to work on the restoration of the data without worrying about the availability as we've already managed to bring up the production database. Just make sure you select a different name and database file location for the restored database or you'll end up damaging the already brought up database. We will just repeat what we did in step #4 but with a twist - different name and database file location.
RESTORE DATABASE Northwind_recover FROM DISK = N'D:\DBBackup\NorthwindBackup.bak' WITH MOVEN'Northwind' TO N'D:\DBBackup\NorthwindData_recover.mdf', MOVEN'Northwind_Log' TO N'D:\DBBackup\NorthwindLog_recover.ldf', STANDBY= N'D:\DBBackup\Northwind_UNDO.bak', STATS= 10 GO
RESTORE LOG Northwind_recover FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn' WITH STANDBY= N'D:\DBBackup\Northwind_UNDO.bak', STATS= 10, STOPAT= '2008-09-23 10:42:44.00' GO
Document the value of your STOPAT parameter as this will be the most critical parameter you'll ever work with during this process. Since we just repeated the process in step#4, we know for a fact that the DROP TABLE command has not been executed at this time.
8) Restore the transaction log by increasing the value of the STOPAT parameter
We run another RESTORE LOG command, increasing the STOPAT parameter value by a minute - from 10:42:44.00 to 10:43:44.00
RESTORE LOG Northwind_recover FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn' WITH STANDBY= N'D:\DBBackup\Northwind_UNDO.bak', STATS= 10, STOPAT= '2008-09-23 10:43:44.00' GO
This is the part where it becomes iterative. Don't be frustrated at this point as it will be really tedious. You can increase the value by a minute, 5 minutes, 10 minutes and document the time. Remember to run a test query on the dropped object after running the RESTORE LOG command. I would recommend creating a table for this activity that would look something like this.
With this information, you know for a fact that the table was dropped between 10:58:44.00 to 11:03:44.00. You can repeat step #8 and increase the value of the STOPAT parameter by a minute or even a second if you may since you already have a smaller interval to work with. If you find yourself overshooting the time value of the STOPAT parameter, go back to step #7 armed with the tabular information you've documented in step #8, making the restore process a bit faster. Just remember to use the WITH RECOVERY option at the last RESTORE LOG statement like this
RESTORE LOG Northwind_recover FROM DISK = N'D:\DBBackup\NorthwindBackupLog.trn' WITHSTATS= 10, STOPAT= '2008-09-23 11:01:44.00', RECOVERY GO
Once you've managed to restore the database to the time before the DROP TABLE command was executed, you can now do a comparison between what was restored on the production database and what was recovered. You can do this in a number of different ways. Since we already have a database snapshot created earlier, we will use that together with the TableDiff utility. Although the tool was designed for replication, we can use it for disaster recovery as well. A previous tip on SQL Server 2005 tablediff command line utility can give you an overview on how to use this tool but just to highlight that your source database will be the one that you recovered and the destination database will be your database snapshot. This is where your database snapshot would prove to be very important especially if you are dealing with more than one object which is normally the case. If you are not comfortable with command-line utilities, a GUI version was created by the guys from SQLTeam.com. You might want to check that out as well and include it in your DBA toolbox
You can also do an INSERT/SELECT where you insert records on the production database based on a query on the recovered database. Since our Order Details table does not have an IDENTITY column, we can create our own by inserting the records in a temporary table and using the ROW_NUMBER() function:
--This inserts records in a temporary table and assigns a dummy identity value for reference SELECT ROW_NUMBER() OVER (ORDER BY OrderID) AS ROWID, * INTO Northwind_recover.dbo.OrderDetailsRecover FROM [Order Details]
--This inserts recovered records from the recovered database into the production database based on --the dummy identity valuewe have assigned for reference INSERT INTO Northwind.dbo.[Order Details] (OrderID,ProductId,UnitPrice,Quantity,Discount) SELECT OrderID,ProductId,UnitPrice,Quantity,Discount FROM Northwind_recover.dbo.OrderDetailsRecover WHERE ROWID> ( SELECT COUNT(*) FROM Northwind_RestorePointSnapShot.dbo.[Order Details] )
Notice that we used our database snapshot to identify the difference between what we managed to restore and what we have recovered.
It is important to have a disaster recovery plan in place and the procedures necessary for recovery. While it is impossible to come up with procedures for almost every type of disaster, it would help if you start listing the possible disasters that may happen and prepare a disaster recovery plan with procedures and document accordingly.
Simulate this particular process by going thru the steps outlined above.
You can download the Northwind database used in the sample here.
Last Update: 10/1/2008
About the author
Edwin Sarmiento works as a SQL Server DBA for The Pythian Group in Ottawa and is a SQL Server MVP.
You are absolutely correct about other tables being updated while this specific table has been dropped. I just used this example for simplicity's sake but you hit the nail right on the head. The restoration of the transaction log, though, will restore all the transaction in the database - including the ones in the other tables. Let's say Table1 receives like 500 new records at the same time the Order Details table got dropped. Restoring the database transaction log to a known good point in time would restore those new records in Table1 as well. What we are not sure of is if there are other things that happened to Table1 between the known good point in time and the time just before the disaster happened. What you can do is to reseed all other tables with IDENTITY columns in them prior to bringing the database back online. While it may sound like a lot of work, that is what needs to be done as mentioned in Step#5. Then you can use the TableDiff utility to identify the missing records and re-insert them back into the production database
Tuesday, October 28, 2008 - 11:09:12 AM - dschaeffer
In step 6 the database is made available again. It seems to me that reseeding just the table that got dropped is not sufficient. While we're trying to get back to the point in time when the table was dropped, users are probably inserting records into other tables. When we apply those further log transactions I would expect conflicts with identity values in the other tables.
Perhaps in this case the production database should not be overwritten by the restore; instead restore to a copy as shown and when the point in time just before the table drop is reached (found by trial and error if need be), copy the table from that copy to the production database.