Script to Delete SQL Server Data in Batches


By:   |   Updated: 2020-03-17   |   Comments (6)   |   Related: More > T-SQL

Problem

Sometimes in large SQL Server OLTP systems, tables can grow very big over time. When that happens, queries against such tables might begin to suffer performance issues because those tables were never designed to host that volume of data. Even with the correct indexes, the performance might not be as good as expected, and you might be forced to give some extra thought to purging some old data.

Solution

In this tip I’m going to present a few T-SQL scripts that can help you copy information to a historical table (in case the historical information is required in the future, just not in the principal table) and set into motion a loop that deletes records in batches of a size you specify, between a given start date and end date.

During the process, you can use another T-SQL script (provided in this solution) to monitor the progress of the deletion. Perhaps you need to know how much time it takes to delete a day of information, or simply how long it takes to delete X number of records, along with the overall progress.

Code Explained

The process will be handled through a stored procedure, so it can fit any case you might have.

  • I will use tempdb as my chosen database to work with through the entire demonstration.
  • I have created a test table called "big_table" and have populated it with 500,000 rows distributed across 5 days.
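
If you’d like to reproduce the test setup, here’s a minimal sketch of how such a table could be created and populated (the "id" and "payload" columns are assumptions of mine; only "created_date" is relevant for the stored procedure):

```sql
USE tempdb;
GO
CREATE TABLE dbo.big_table(
   id           INT IDENTITY(1,1) NOT NULL,
   payload      VARCHAR(32) NOT NULL,
   created_date DATE NOT NULL
);
GO
-- 500,000 rows evenly distributed across 5 days (2015-01-01 through 2015-01-05)
;WITH n AS (
   SELECT TOP (500000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS rn
   FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b
)
INSERT INTO dbo.big_table(payload, created_date)
SELECT 'row #' + CONVERT(VARCHAR(10), rn), DATEADD(DAY, rn % 5, '2015-01-01')
FROM n;
```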

Here’s how you are going to call the stored procedure:

EXECUTE [dbo].[Batch_Delete]    @startDate          = '2015-01-01'
  ,@endDate            = '2015-01-06'
  ,@dbName             = 'tempdb'
  ,@schemaName         = 'dbo'
  ,@tableName          = 'big_table'
  ,@dateFieldName      = 'created_date'
  ,@saveToHistoryTable = 1
  ,@batch              = 1000

The names of the parameters are pretty much self-explanatory, but here’s their purpose:

  • @startDate: The start date from when to start working with the records.
  • @endDate: A non-inclusive date to limit when the records will be taken into account. In my example it means that I’m just going to be working with records from >= ‘2015-01-01’ and date < ‘2015-01-06’.
  • @dbName: The name of the database that hosts the table you want to work with. Note that the historic and metrics tables are created in this same database.
    • If you’d like to have these 2 tables in separate databases, a few tweaks would have to be made to the stored procedure, and an additional parameter would have to be passed to specify the name of the other database.
  • @schemaName: The name of the schema of the table.
  • @tableName: The name of the table to be processed.
  • @dateFieldName: The name of the column that contains the date values to work with. Remember that this logic is written toward a date-based approach; if you want a different one, you’d have to adjust the stored procedure to your particular case.
  • @saveToHistoryTable: If 1, an identical empty table is created based on @tableName, with the suffix "_historic" appended to distinguish it. If 0, no data movement is performed during the execution of the script (use this very carefully).
  • @batch: The number of records deleted per DELETE statement inside each day’s loop.

The stored procedure contains a set of initial validations to make sure that you have entered the correct information to proceed.

Here are all the validations considered:

  • If @startDate is greater than or equal to @endDate, then notify the user.
  • If the @dbName isn’t specified or it is left blank then notify the user.
  • If the @schemaName isn’t specified or it is left blank then notify the user.
  • If the @tableName isn’t specified or it is left blank then notify the user.
  • If the @dateFieldName isn’t specified or it is left blank then notify the user.
  • If the table targeted by @dbName.@schemaName.@tableName doesn’t exist, then notify the user.
  • If the table exists but the field @dateFieldName doesn’t, then notify the user.
  • If the parameter @saveToHistoryTable is set to 0 (1 being the default), a warning will be shown to the user.

Below are a couple of examples.

In this example, the table "big_table_X" does not exist, so we get an error.

In this example, the column "create_date2" does not exist, so we get an error.
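
To see the validations in action yourself, a call like this one (using the non-existent table name from the example) should fail fast with the corresponding message:

```sql
-- Expected to fail the table-existence validation
EXECUTE [dbo].[Batch_Delete]
    @startDate          = '2015-01-01'
   ,@endDate            = '2015-01-06'
   ,@dbName             = 'tempdb'
   ,@schemaName         = 'dbo'
   ,@tableName          = 'big_table_X'
   ,@dateFieldName      = 'created_date';
-- Msg 50000: The specified table can't be located, please check and try again!!!
```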

T-SQL Code for Stored Procedure Creation

At a high-level, this is what the stored procedure does:

  1. If the stored procedure already exists, drop it and re-create it from scratch.
  2. All the set validations described above are performed initially and the execution stops if any of them fails.
  3. If all the validations pass, then create the "Delete_Metrics" table to keep track of how much time (in seconds) it takes for each day’s records to be wiped out from the main table.
    • A non-clustered index is placed on the StartDate and EndDate, in case you end up with a lot of records and want to speed up your lookups. If you think there are not that many, then you can consider removing this.
  4. If the @saveToHistoryTable is set to 1, then all the records that will be affected are copied to a newly created table that has the exact same structure as the original table (without constraints and indexes), and will have the name "_historic" added to it.
  5. After all the records have been saved to the history table, a while loop that goes from @startDate to @endDate-1 will kick in, and the processing will take place inside a transaction. Why? Because if anything goes wrong at any given time, it can be rolled back.
  6. Per day, the deletes will take place by batches specified by the parameter @batch, so use it and test it carefully.
  7. After all the records for a given day have been deleted, then the respective record is inserted in the Delete_Metrics table, to show how many seconds it took to delete X amount of records.
  8. As a bonus, I included a piece of code that constantly displays (after each batch of records is deleted) the current size of the transaction log (in MB) of the database where the activity is taking place, along with the free space in the file (in MB).

With all this information, not only can you create a baseline of delete operations for daily records, but you also can know the impact this will have on the transaction log file.

Here is the entire script.

IF EXISTS (
   SELECT type_desc, type
    FROM sys.procedures WITH(NOLOCK)
    WHERE NAME = 'Batch_Delete'
            AND type = 'P'
)
DROP PROCEDURE Batch_Delete
GO
 
CREATE PROCEDURE Batch_Delete
@startDate          DATE,
@endDate            DATE,
@dbName             VARCHAR(64) = NULL,
@schemaName         VARCHAR(64) = NULL,
@tableName          VARCHAR(64) = NULL,
@dateFieldName      VARCHAR(64) = NULL,
@saveToHistoryTable BIT = 1,
@batch              INT = 1000
AS
  SET NOCOUNT ON
 
  DECLARE @tableExists BIT = 0
  DECLARE @fieldExists BIT = 0
  DECLARE @sqlCommand NVARCHAR(2048)
 
  IF(@startDate > @endDate OR @startDate = @endDate)
  BEGIN
   RAISERROR('startDate can''t be greater than or equal to endDate!!!', 16, -1)
   RETURN
  END
 
  IF(@dbName IS NULL OR TRIM(@dbname) = '')
  BEGIN
   RAISERROR('You must specify the source database where the table is hosted!!!', 16, -1)
   RETURN
  END
 
  IF(@schemaName IS NULL OR TRIM(@schemaName) = '')
  BEGIN
   RAISERROR('You must specify the schema of the table!!!', 16, -1)
   RETURN
  END
 
  IF(@tableName IS NULL OR TRIM(@tableName) = '')
  BEGIN
   RAISERROR('You must specify the name of the table!!!', 16, -1)
   RETURN
  END
 
  IF(@dateFieldName IS NULL OR TRIM(@dateFieldName) = '')
  BEGIN
   RAISERROR('You must specify the name of the column that contains the dates for the lookups!!!', 16, -1)
   RETURN
  END
 
  DECLARE @e AS TABLE([objectID] BIGINT)
  SET @sqlCommand = '
  DECLARE @objectID BIGINT = 0
  SELECT @objectID = OBJECT_ID ('+CHAR(39)+'['+@dbName+'].['+@schemaName+'].['+@tableName+']'+CHAR(39)+',''U'')
  SELECT ISNULL(@objectID,-1)
  '
 
  INSERT INTO @e
  EXEC sp_executesql @sqlCommand
  SET @tableExists = (SELECT CASE [objectID] WHEN -1 THEN 0 ELSE 1 END FROM @e)
  DELETE FROM @e
 
  IF(@tableExists <> 1)
  BEGIN
   RAISERROR('The specified table can''t be located, please check and try again!!!', 16, -1)
   RETURN
  END
 
  DECLARE @f AS TABLE([size] SMALLINT)
  SET @sqlCommand = '
  DECLARE @colSize SMALLINT = 0
  SELECT @colSize = COL_LENGTH ('+CHAR(39)+'['+@dbName+'].['+@schemaName+'].['+@tableName+']'+CHAR(39)+','+CHAR(39)+@dateFieldName+CHAR(39)+')
  SELECT ISNULL(@colSize,-1)
  '
  INSERT INTO @f
  EXEC sp_executesql @sqlCommand
  SET @fieldExists = (SELECT CASE [size] WHEN -1 THEN 0 ELSE 1 END FROM @f)
  DELETE FROM @f
 
  IF(@fieldExists = 0)
  BEGIN
   RAISERROR('The specified field can''t be located, please check and try again!!!', 16, -1)
   RETURN
  END
 
  IF(@saveToHistoryTable = 0)
  PRINT 'Be aware that you have invoked the execution of this SP with historical data transfer turned off!!!'
 
  -- Per Day logic
  DECLARE @currentDate DATE
  DECLARE @startTime   DATETIME
  DECLARE @endTime     DATETIME
  DECLARE @rows        INT
  DECLARE @totalRows   INT
  DECLARE @deletedRows INT
 
  SET @currentDate = @startDate
 
  SET @sqlCommand = '
  USE '+'['+@dbName+']
  '
  EXEC(@sqlCommand)
 
  IF OBJECT_ID ('Delete_Metrics','U') IS NULL
  BEGIN
    CREATE TABLE Delete_Metrics(
      StartDate      DATE NOT NULL,
      EndDate        DATE NOT NULL,
      Records        INT NOT NULL,
      CompletionTime INT NOT NULL
   )
    CREATE NONCLUSTERED INDEX IX_StartDate ON Delete_Metrics(StartDate)
    CREATE NONCLUSTERED INDEX IX_EndDate ON Delete_Metrics(EndDate)
  END
 
  IF(@saveToHistoryTable = 1)
  BEGIN
   DECLARE @h AS TABLE([rows] INT)
    SET @sqlCommand = '
    SET NOCOUNT ON
 
    IF OBJECT_ID ('+CHAR(39)+'['+@dbName+'].['+@schemaName+'].['+@tableName+'_historic]'+CHAR(39)+',''U'') IS NULL
    BEGIN
        SELECT TOP 0 * INTO ['+@dbName+'].['+@schemaName+'].['+@tableName+'_historic] FROM ['+@dbName+'].['+@schemaName+'].['+@tableName+']
    END
 
    INSERT INTO ['+@dbName+'].['+@schemaName+'].['+@tableName+'_historic]
    SELECT * FROM ['+@dbName+'].['+@schemaName+'].['+@tableName+'] WHERE ['+@dateFieldName+'] >= '+CHAR(39)+CONVERT(VARCHAR(20),@startDate)+CHAR(39)+' AND ['+@dateFieldName+'] < '+CHAR(39)+CONVERT(VARCHAR(20),@endDate)+CHAR(39)+'
   
   SELECT @@ROWCOUNT
   '
   INSERT INTO @h
    EXEC sp_executesql @sqlCommand
    SET @totalRows = (SELECT [rows] FROM @h)
    DELETE FROM @h
 
   IF(@totalRows > 0)
   RAISERROR ('#Finished transferring records to historic table#', 0, 1) WITH NOWAIT
  END
 
  WHILE(@currentDate < @endDate)
  BEGIN
   BEGIN TRANSACTION
      BEGIN TRY
            DECLARE @t AS TABLE([rows] INT)
            SET @sqlCommand = '
            DECLARE @tempTotalRows INT = 0
            SELECT @tempTotalRows = COUNT(*) FROM ['+@dbName+'].['+@schemaName+'].['+@tableName+'] WHERE ['+@dateFieldName+'] >= '+CHAR(39)+CONVERT(VARCHAR(20),@currentDate)+CHAR(39)+' AND ['+@dateFieldName+'] < DATEADD(DAY,1,'+CHAR(39)+CONVERT(VARCHAR(20),@currentDate)+CHAR(39)+')
            SELECT @tempTotalRows
            '
            INSERT INTO @t
            EXEC sp_executesql @sqlCommand
            SET @totalRows = (SELECT [rows] FROM @t)
            DELETE FROM @t
 
         SET @deletedRows = 0
         SET @startTime = GETDATE()
         DECLARE @d AS TABLE([rows] INT)
 
         WHILE @deletedRows < @totalRows 
         BEGIN
             SET @sqlCommand = '            
            DELETE TOP ('+CONVERT(VARCHAR(16),@batch)+')
            FROM ['+@dbName+'].['+@schemaName+'].['+@tableName+'] WHERE ['+@dateFieldName+'] >= '+CHAR(39)+CONVERT(VARCHAR(20),@currentDate)+CHAR(39)+' AND ['+@dateFieldName+'] < DATEADD(DAY,1,'+CHAR(39)+CONVERT(VARCHAR(20),@currentDate)+CHAR(39)+')
 
            SELECT @@ROWCOUNT
                '
 
                INSERT INTO @d
            EXEC sp_executesql @sqlCommand
            SET @deletedRows += (SELECT [rows] FROM @d)
            DELETE FROM @d
         
            SELECT l.total_size AS TotalSize,f.free_space AS FreeSpace
            FROM(
               SELECT CONVERT(DECIMAL(10,2),(total_log_size_in_bytes - used_log_space_in_bytes)/1024.0/1024.0) AS [free_space]  
               FROM sys.dm_db_log_space_usage
            )AS f,
            (
               SELECT CONVERT(DECIMAL(10,2),size*8.0/1024.0) AS [total_size]
               FROM sys.database_files
               WHERE type_desc = 'LOG'
            )AS l
 
         END
 
         IF(@deletedRows > 0)
         BEGIN
            DECLARE @stringDate VARCHAR(10) = CONVERT(VARCHAR(10),@currentDate)
            RAISERROR('Finished deleting records for date: %s',0,1,@stringDate) WITH NOWAIT
            INSERT INTO Delete_Metrics VALUES(@currentDate, DATEADD(DAY,1,@currentDate),@deletedRows,DATEDIFF(SECOND,@startTime,GETDATE()))
         END
 
         SET @currentDate = DATEADD(DAY,1,@currentDate)
 
         COMMIT TRANSACTION
      END TRY
      BEGIN CATCH
         ROLLBACK TRANSACTION
         SELECT ERROR_MESSAGE() AS ErrorMessage;
      END CATCH
  END

Running the Delete Stored Procedure

Upon a successful execution, you’ll see a "Finished deleting records for date: X" message for each completed day, a result set after each batch showing the transaction log’s total size and free space, and one row per day recorded in the Delete_Metrics table.

Check Progress of Delete Processing

  • This T-SQL code can be used to constantly check the progress of the rows deleted.
  • You can see how much time each individual day took to get deleted and the total time the entire process has consumed so far. The very last column, named "Progress", shows the % of records deleted relative to the total copied to the historic table (make sure to use the correct table name for your case).
  • Keep in mind that this query only works on instances running SQL Server 2012 and above, due to the use of the windowed SUM(...) OVER (ORDER BY ...) syntax.

Here is the query you can use to get information about the delete process:

SELECT 
   StartDate,
   EndDate,
   Records,
   [Total Records] = SUM(Records) OVER (ORDER BY StartDate),
   CompletionTime,
   [Total Time] = SUM(CompletionTime) OVER (ORDER BY StartDate),
   CONVERT(DECIMAL(10,2),(SUM(Records) OVER (ORDER BY StartDate) / (SELECT CONVERT(DECIMAL(10,2),COUNT(*)) FROM big_table_historic) ) * 100) AS 'Progress'
FROM Delete_Metrics

Here’s what the result set looks like: one row per deleted day, with running totals for records and time, and the overall progress percentage in the last column.
Next Steps
  • You should consider that if you delete a large amount of records, the respective indexes and statistics may need to be rebuilt.
  • Depending on the volume of information, you might have to schedule an eventual shrink of the files to reclaim disk space.
  • If you need to address a problem using this approach, make sure to run tests in development and test environments to get an idea of how long the process takes and its possible impact.
  • Feel free to customize the stored procedure in any way you like, as long as it is useful for your particular use case.
  • Download the T-SQL code for this tip.
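
As a sketch of the index, statistics, and disk space maintenance mentioned above (the logical log file name below is a placeholder; look up yours first, and pick a target size appropriate for your workload):

```sql
-- Rebuild the indexes and refresh the statistics of the trimmed table
ALTER INDEX ALL ON dbo.big_table REBUILD;
UPDATE STATISTICS dbo.big_table WITH FULLSCAN;

-- Find the logical name of the transaction log file before shrinking
SELECT name, type_desc FROM sys.database_files WHERE type_desc = 'LOG';

-- 'MyDb_log' is a placeholder; 1024 is the target size in MB
-- DBCC SHRINKFILE (N'MyDb_log', 1024);
```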





About the author
MSSQLTips author Alejandro Cobar Alejandro Cobar is an MCP, SQL Server DBA and Developer with 10+ years of experience working in a variety of environments.






Comments For This Article




Wednesday, April 15, 2020 - 1:31:11 AM - World Teaches

Nice post for SQL server


Wednesday, April 01, 2020 - 3:09:07 AM - Alexander Huberti

I had some performance issues when I used DELETE TOP() statements in batches on huge tables. The main problem was increasing of logical reads over time. I prefer a solution like this: https://michaeljswart.com/2014/09/take-care-when-scripting-batches/


Tuesday, March 31, 2020 - 2:08:03 PM - Alejandro Cobar

@Anne, you're most welcome!

@Broniz, thanks a lot for that insight. Yes, a v2 of what I did here can potentially address such cases where large tables are linked together.

@Jamie Ingram, yes it would also work well. For me it is very cool to have multiple ways to accomplish the same thing.

Take care and stay safe!!!


Tuesday, March 31, 2020 - 1:23:52 PM - Anne

Thanks for the stored procedure.

It is very helpful


Tuesday, March 31, 2020 - 4:09:31 AM - Broniz

Thank you !

This is very useful for isolated cases, or temporary tables when you want to keep minimal data retention. I especially like the ability to see the progress. However this technique can't work for critical systems where you get several large tables linked together. You need to segment the deletes to stay within a reasonable journal size (log file will explode otherwise), and make sure all records for the linked tables are processed at the same time. You may also need to drop the indexes that are not useful for better performances. We're talking here for tables with dozens of millions of records of course.


Wednesday, March 18, 2020 - 1:52:47 AM - Jamie Ingram
This also works well:
 
delete z from TableA z
inner join (Select Top 500 * from TableA)d
	on z.ID = d.ID
go 100 

