By: Ben Snaidero | Last Updated: 2015-12-02 | Comments (5) | T-SQL
Deleting data from a SQL Server database is something I really don't like to do as it's always better if it can be done through an application that has been thoroughly tested. However, inevitably there comes a time where some backend data cleanup needs to be done. Below are some best practices I like to follow when deleting data.
Before we get into the details of this tip I just want to mention that you should ALWAYS HAVE A BACKUP when you are going to be deleting data. I know everyone knows how important backups are, but before you delete anything you probably want to verify you have a good one by restoring it somewhere first. You can never be too careful when it comes to removing data.
With that said, let's setup a couple sample tables for us to run our example queries against. The first table below is the table that contains our actual data and the second one contains a list of records to be deleted from the main table. Here is the T-SQL to create and load these tables.
--Table Setup CREATE TABLE Main (col1 INT, col2 INT); CREATE TABLE ToDelete (col3 INT); DECLARE @val INT SELECT @val=1 WHILE @val < 50000 BEGIN INSERT INTO Main VALUES (@val,round(rand()*100000,0)); IF (@val % 1000) = 0 INSERT INTO ToDelete VALUES (@val); SELECT @[email protected]+1; END;
Now let's remove the records from the Main table based on the records in the ToDelete table using the following simple query.
DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); (49999 row(s) affected)
Oops. It looks like a few too many rows were deleted. What happened? If we look closely at my query I put the wrong column name in the subquery by accident and because this column name exists in the primary table I removed everything. Let's look at a few different ways that we could have avoided this issue.
For our first solution had I used table aliases for each table this would not have happened as the query would error out as follows:
DELETE FROM Main WHERE col1 IN (SELECT td.col1 FROM ToDelete td); Msg 207, Level 16, State 1, Line 2 Invalid column name 'col1'.
In this case I could then fix the query to use the correct column name and it would remove the records that were supposed to be removed in the first place.
DELETE FROM Main WHERE col1 IN (SELECT td.col3 FROM ToDelete td) (49 row(s) affected)
Another solution to this issue and one I use quite often is to always check how many records are going to be deleted before running the delete. If we created and ran the script that follows when we needed to delete our records we would have noticed that it was going to delete too many records when the SELECT returned 49999. I always comment out the DELETE part of the script so it does not get executed by accident.
SELECT COUNT(1) --DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); (No column name) 49999
Once I see that it has returned too many records I can fix my query and then run them successfully as shown below.
SELECT COUNT(1) --DELETE FROM Main WHERE col1 IN (SELECT col3 From ToDelete); -- SELECT output (No column name) 49 -- DELETE output (49 row(s) affected)
One final good practice to use when deleting data from a database is to always wrap your DELETE statement in a BEGIN TRAN - COMMIT/ROLLBACK TRAN code block. With the method outlined below you can run the BEGIN TRAN and your DELETE, then verify how many records were affected before you COMMIT your changes. As I did in the previous solution I've commented out the COMMIT part of my script so it does not get run by accident.
BEGIN TRAN DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); --ROLLBACK TRAN --COMMIT TRAN -- DELETE output (49999 row(s) affected)
As with the previous example once I see that too many rows were delete I can rollback the transaction and fix the script as shown below. In this case it returned the correct number and then I can run my commit statement.
BEGIN TRAN DELETE FROM Main WHERE col1 IN (SELECT col3 From ToDelete); --ROLLBACK TRAN --COMMIT TRAN -- DELETE output (49 row(s) affected)
I understand this is a trivial example, but in the real world where related or foreign key columns are not always named to follow a standard or there is no good environment to test your scripts in, issues like this can pop up. It's always best to make sure you have a backup of your data and if possible test your scripts on a copy of the production database before running any updates or deletes. Even what you think might be a small update could end up being a big issue with the wrong syntax.
- Other tips on table aliases:
- What Does BEGIN TRAN - ROLLBACK TRAN - COMMIT TRAN Mean
- Look for a future tip on a way to protect yourself when deleting data
Last Updated: 2015-12-02
About the author
View all my tips