Best Practices For Deleting SQL Server Data

By: Ben Snaidero | Comments (5) | Related: > TSQL

Problem

Deleting data from a SQL Server database is something I really don't like to do as it's always better if it can be done through an application that has been thoroughly tested. However, inevitably there comes a time where some backend data cleanup needs to be done. Below are some best practices I like to follow when deleting data.

Solution

Before we get into the details of this tip I just want to mention that you should ALWAYS HAVE A BACKUP when you are going to be deleting data. I know everyone knows how important backups are, but before you delete anything you probably want to verify you have a good one by restoring it somewhere first. You can never be too careful when it comes to removing data.

With that said, let's setup a couple sample tables for us to run our example queries against. The first table below is the table that contains our actual data and the second one contains a list of records to be deleted from the main table. Here is the T-SQL to create and load these tables.

--Table Setup
CREATE TABLE Main (col1 INT, col2 INT);
CREATE TABLE ToDelete (col3 INT);

DECLARE @val INT
SELECT @val=1
WHILE @val < 50000
BEGIN  
   INSERT INTO Main VALUES (@val,round(rand()*100000,0));
   IF (@val % 1000) = 0
      INSERT INTO ToDelete VALUES (@val);
   SELECT @val=@val+1;
END;

Now let's remove the records from the Main table based on the records in the ToDelete table using the following simple query.

DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete);

(49999 row(s) affected)

Oops. It looks like a few too many rows were deleted. What happened? If we look closely at my query I put the wrong column name in the subquery by accident and because this column name exists in the primary table I removed everything. Let's look at a few different ways that we could have avoided this issue.

For our first solution had I used table aliases for each table this would not have happened as the query would error out as follows:

DELETE FROM Main WHERE col1 IN (SELECT td.col1 FROM ToDelete td);

Msg 207, Level 16, State 1, Line 2
Invalid column name 'col1'.

In this case I could then fix the query to use the correct column name and it would remove the records that were supposed to be removed in the first place.

DELETE FROM Main WHERE col1 IN (SELECT td.col3 FROM ToDelete td)
 
(49 row(s) affected)

Another solution to this issue and one I use quite often is to always check how many records are going to be deleted before running the delete. If we created and ran the script that follows when we needed to delete our records we would have noticed that it was going to delete too many records when the SELECT returned 49999. I always comment out the DELETE part of the script so it does not get executed by accident.

SELECT COUNT(1)
--DELETE 
FROM Main WHERE col1 IN (SELECT col1 From ToDelete);

(No column name)
49999

Once I see that it has returned too many records I can fix my query and then run them successfully as shown below.

SELECT COUNT(1)
--DELETE 
FROM Main WHERE col1 IN (SELECT col3 From ToDelete);

-- SELECT output
(No column name)
49

-- DELETE output
(49 row(s) affected)

One final good practice to use when deleting data from a database is to always wrap your DELETE statement in a BEGIN TRAN - COMMIT/ROLLBACK TRAN code block. With the method outlined below you can run the BEGIN TRAN and your DELETE, then verify how many records were affected before you COMMIT your changes. As I did in the previous solution I've commented out the COMMIT part of my script so it does not get run by accident.

BEGIN TRAN
DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete);
--ROLLBACK TRAN
--COMMIT TRAN

-- DELETE output
(49999 row(s) affected)

As with the previous example once I see that too many rows were delete I can rollback the transaction and fix the script as shown below. In this case it returned the correct number and then I can run my commit statement.

BEGIN TRAN
DELETE FROM Main WHERE col1 IN (SELECT col3 From ToDelete);
--ROLLBACK TRAN
--COMMIT TRAN

-- DELETE output
(49 row(s) affected)

I understand this is a trivial example, but in the real world where related or foreign key columns are not always named to follow a standard or there is no good environment to test your scripts in, issues like this can pop up. It's always best to make sure you have a backup of your data and if possible test your scripts on a copy of the production database before running any updates or deletes. Even what you think might be a small update could end up being a big issue with the wrong syntax.

Next Steps

Other tips on table aliases:
- Aliasing Table in a SELECT statement Example
- How to Join to the Same Table Multiple Time for a SQL Server Query
What Does BEGIN TRAN - ROLLBACK TRAN - COMMIT TRAN Mean
Look for a future tip on a way to protect yourself when deleting data

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Comments For This Article

Tuesday, January 5, 2016 - 6:54:32 AM - benson sibanda

What I normally do is to create a rollback script as we have a rollback database that is purged every other week once the deployments have gone in smoothly. So in this case I will create a rollback table, populate it with the data to be deleted first before deleting anything. Then I will create rollback script and package it as part of deployment.

Wednesday, December 30, 2015 - 4:27:59 PM - Matt

One thing about the ROLLBACK/COMMIT options. I ask my team to uncomment one or the other (preferably the ROLLBACK, initially) to avoid inadvertently leaving a transaction open too long and blocking other code.

Also, I will often add an INTO clause into the SELECT statement used for the rowcounts prior to deleting and save the rows to be deleted to a table with the same name as the table I'm deleting from, but with "_bak_mmddyy" appended. Then I usually wait a week in case someone panics at unforeseen consequences of deleting the data.

Wednesday, December 30, 2015 - 10:15:26 AM - Karen

Good tips.

I'm incorporating your code in a very hairy delete process I inherited.

Wednesday, December 2, 2015 - 11:22:59 AM - Thomas Franz

Regarding "I always comment out the DELETE part of the script so it does not get executed by accident.":

If you would write the SELECT below the DELETE, you would get an additional security net (you would get an syntax error if you forgot to comment out the delete):

--DELETE
SELECT COUNT(1)
FROM Main
WHERE ...

Wednesday, December 2, 2015 - 8:10:09 AM - Jared Karney

Excellent article. In addition, I prefer to batch any deletes or purges that I am doing to avoid potential log growth:

DECLARE @BatchCount INT;

DECLARE @BatchSize INT;

SET @BatchSize = 10000;

SET @BatchCount = @BatchSize;

WHILE @BatchSize = @BatchCount

    BEGIN

        DELETE TOP ( @BatchSize )

        FROM    TableA

        WHERE   Col1 < Something;

        SET @BatchCount = @@ROWCOUNT;

    END;

Or, if I am archiving I can to a DELETE...OUTPUT