Archive SQL Server Data using the SQL OUTPUT Clause

Updated: 2024-03-06


Problem

Is there a more direct way to move or archive data than the traditional Insert and Delete method? Done correctly, the traditional method requires transaction control and error checking to ensure that records are not deleted unless they are first copied. What if this could be done in one step?

Solution

This tip explores a better, more direct, and more optimized method for moving or archiving data in SQL. This method does not require transaction control as it consolidates the move and deletion of records to a single SQL statement using the OUTPUT clause.

This tip will show a data move technique using a Delete with an OUTPUT clause redirecting output directly into an archive table. This is done in one step, so transaction control wrapping an insert and delete is not needed to ensure integrity. Also, if you have massive amounts of data to move, this tip will show how this can be done in controlled batches.

Massive amounts of data can mean many records or big values such as images or XML. For this, I will use Microsoft's sample database, WideWorldImporters. Its [Warehouse].[VehicleTemperatures] table has approximately 600,000 records, which is large enough for this demo. I will first create new tables to preserve the integrity of the sample database, then review the distribution of the data to determine how to batch the records for a controlled move. Finally, I will show the traditional archive method and a few examples using the OUTPUT clause method.

Setup SQL Server Archiving Example

Using the WideWorldImporters database, I will copy the table [Warehouse].[VehicleTemperatures] into a new schema "mssqltips" as my source table, [mssqltips].[VehicleTemperatures]. Then I will create the empty archive table [mssqltips].[VehicleTemperaturesArchive]. The table [Warehouse].[VehicleTemperatures] has an Identity property on the primary key column VehicleTemperatureID. This Identity property will be on all the tables in this example and will be handled appropriately in the code snippets. I use the simple SELECT ... INTO method to create both tables under the new schema "mssqltips".

Note: The tables in this example have a Primary Key column with an Identity property, so you will see the additional code to handle the Identity inserts throughout the code snippets. It is important to maintain those values in the archive tables. Exclude the Identity Insert lines of code if your tables do not have an identity property.

--1) How many rows are we starting out with?
SELECT COUNT(*) FROM [Warehouse].[VehicleTemperatures]; --598286
 
--2) Make a copy of this table
CREATE SCHEMA [mssqltips];
GO
SELECT *
INTO [mssqltips].[VehicleTemperatures]
FROM [Warehouse].[VehicleTemperatures];
GO
 
--3) Create the empty Archive table that we will be loading into.
SELECT TOP (0) *
INTO [mssqltips].[VehicleTemperaturesArchive]
FROM [Warehouse].[VehicleTemperatures];
 
--Check row counts:
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperatures]; --598286 rows
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperaturesArchive]; --0 rows
GO

The counts confirm that the tables are created and show the initial row counts of each.

Cleanup to Re-Run

Here is the cleanup script, so the setup can be dropped and re-run for later re-testing.

--CleanUp: to re-run the setup to re-test.
DROP TABLE IF EXISTS [mssqltips].[VehicleTemperatures]; --make this re-runnable
DROP TABLE IF EXISTS [mssqltips].[VehicleTemperaturesArchive]; --make this re-runnable
DROP SCHEMA IF EXISTS [mssqltips]; --make this re-runnable
GO

Review Data to Determine Batch Size

The [Warehouse].[VehicleTemperatures] table in this example has approximately 600,000 rows, which could easily be handled in a single operation in a reasonably short time. However, for this demo, I will do it in smaller chunks to demonstrate the technique. If you run into a data move that runs for a long period of time, this example shows one way to perform the move in controlled batches. To archive a very large table, you will likely work in batches rather than all at once in a single transaction.

To help determine potential batch sizes, you will need to learn the distribution of your data. Typically, you will want to pick a batch size based on a time interval. This works if you have a date column in the table indicating when the records were created. If you do not have a date column in your table, you could use the total row count and divide that based on a reasonable batch size. This will help determine how many batches it will take to complete the archive.
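When dividing the total row count by a batch size, the number of batches can be computed directly in T-SQL. A minimal sketch, where the 3,000 batch size is simply the value chosen later in this tip:

```sql
-- Estimate the number of batches for a fixed batch size (3000 is an assumed value).
DECLARE @batchsize int = 3000;

SELECT COUNT(*) AS TotalRows,
       CEILING(COUNT(*) / CAST(@batchsize AS float)) AS NumberOfBatches
FROM [mssqltips].[VehicleTemperatures];
```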

In this example, I have settled on a batch size based on the number of records inserted per week. Run the query below to see the record counts per week. The number of records inserted weekly in the sample database is approximately 3,000, which I will use as the archive batch size. Generally, pick a batch size that lets the process run in relatively quick chunks. For very large tables, moving or deleting data in reasonable batch sizes minimizes database contention, keeps the transaction log and log backups in check, and avoids long-running transactions.

-- Review the distribution of the data over time.
-- We will review this result to help determine how to control the Archive/Deletes.
SELECT DATEADD(week, DATEDIFF(week, 0, [RecordedWhen]), 0) AS WeekStart
      ,COUNT(*) AS RecordCount
FROM [mssqltips].[VehicleTemperatures]
GROUP BY DATEADD(week, DATEDIFF(week, 0, [RecordedWhen]), 0)
ORDER BY 1;
GO

This query returns 196 rows. Using a batch size of 3000 rows, I will iterate through approximately 196 batches.

Traditional SQL Server Archive – Insert and Delete with Transaction Control

The traditional method for moving data includes 1) copying the data to a new table and 2) deleting the copied rows. This should be done in a controlled transaction with error handling to ensure both steps succeed before committing the transaction.

--I. Traditional old way: Insert first, then Delete, with transaction control
DECLARE @rowcount int = 3000;
 
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;
BEGIN TRANSACTION
INSERT INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
SELECT TOP (@rowcount) [VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData]
FROM [mssqltips].[VehicleTemperatures]
WHERE [RecordedWhen] < '2019-01-01 00:00:00.000'
ORDER BY [RecordedWhen] DESC;
--Note: TOP (@rowcount) limits the number of rows copied; the ORDER BY is your preference if you want them copied in order.
 
IF @@ERROR = 0
   DELETE [mssqltips].[VehicleTemperatures]
   FROM [mssqltips].[VehicleTemperatures] V
   WHERE EXISTS (SELECT * FROM [mssqltips].[VehicleTemperaturesArchive] A
      WHERE A.VehicleTemperatureID = V.VehicleTemperatureID)
   AND V.[RecordedWhen] < '2019-01-01 00:00:00.000';
--The delete joins back to the Archive table to ensure the records exist there before deleting them from the source table.
 
IF @@ERROR = 0
  COMMIT TRANSACTION
ELSE
  ROLLBACK TRANSACTION
GO
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
 
--Show record count of data that moved.
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperatures];
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperaturesArchive];
GO

Note: The runtimes for the Insert & Delete method for moving data:

  • Insert elapsed time: 2110 ms
  • Delete elapsed time: 389 ms
  • Total: 2499 ms

Move Data with SQL Server OUTPUT Clause

This example introduces the data move with the OUTPUT clause. Here, we do not include a sort option to control which records get deleted first. However, with less overhead, this option is more efficient than the Traditional Insert and Delete method. This code may be added to a stored procedure and scheduled in a SQL Agent Job for ongoing archive maintenance, or, if you have many deletes to do, scheduled to run after peak hours.

If your destination archive table has a clustered key, loading into it may cause page splits and slow the inserts down. You may consider creating the archive table as a heap and adding the clustered key later, or skip ahead to the next section.
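A minimal sketch of that heap-first approach: the SELECT TOP (0) ... INTO used in the setup already creates the archive table as a heap, so only the step of adding the clustered key afterward is needed. The constraint name below is my own (hypothetical) choice:

```sql
-- SELECT ... INTO creates the archive table as a heap (no clustered index),
-- so the bulk load avoids page splits. After the archive load completes,
-- add the clustered primary key:
ALTER TABLE [mssqltips].[VehicleTemperaturesArchive]
ADD CONSTRAINT [PK_VehicleTemperaturesArchive]
    PRIMARY KEY CLUSTERED ([VehicleTemperatureID]);
```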

--5) New way with the OUTPUT clause
--Here TOP (x) just controls the number of rows, not the order.
DECLARE @rowcount int = 3000;
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;
 
DELETE TOP (@rowcount) [mssqltips].[VehicleTemperatures]
OUTPUT DELETED.[VehicleTemperatureID], DELETED.[VehicleRegistration], DELETED.[ChillerSensorNumber], DELETED.[RecordedWhen], DELETED.[Temperature], DELETED.[FullSensorData], DELETED.[IsCompressed], DELETED.[CompressedSensorData]
INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
WHERE RecordedWhen < '2019-01-01 00:00:00.000';  --semicolon required before the THROW below
 
IF @@ERROR <> 0
   THROW 911000, 'ERROR Archiving data!', 1;
 
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
GO
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperatures];
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperaturesArchive];
GO

Note: The elapsed time for the move with the OUTPUT clause is 122 ms, compared to 2499 ms for the Traditional Insert and Delete.
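To schedule this in a SQL Agent Job, as suggested above, the statement can be wrapped in a stored procedure. A minimal sketch, where the procedure name, default batch size, and cutoff-date parameter are my own (hypothetical) choices:

```sql
CREATE OR ALTER PROCEDURE [mssqltips].[usp_ArchiveVehicleTemperatures]
   @rowcount int = 3000,               -- batch size (assumed default)
   @cutoff   datetime2 = '2019-01-01'  -- archive records older than this date
AS
BEGIN
   SET NOCOUNT ON;
   SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;

   -- Move one batch: delete from the source, redirecting the deleted
   -- rows into the archive table in a single statement.
   DELETE TOP (@rowcount) [mssqltips].[VehicleTemperatures]
   OUTPUT DELETED.[VehicleTemperatureID], DELETED.[VehicleRegistration], DELETED.[ChillerSensorNumber], DELETED.[RecordedWhen], DELETED.[Temperature], DELETED.[FullSensorData], DELETED.[IsCompressed], DELETED.[CompressedSensorData]
   INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
   WHERE RecordedWhen < @cutoff;

   SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
END
GO
```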

Move SQL Server Data with OUTPUT Clause and Control

This example is similar to the previous method, but we add control to process the older records first. Though the older records are archived first, the sequence in which the records are inserted is not guaranteed to be sequential. The table ([Warehouse].[VehicleTemperatures]) this example is based on has a nonclustered primary key; you may want to test additional scenarios with tables that have clustered keys. This method is still more efficient than the Traditional Insert and Delete method, but the added control adds a little more overhead.

--6) TOP with ORDER BY to archive oldest records first
SET NOCOUNT OFF;
DECLARE @rowcount int = 3000;
 
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;
 
DELETE [mssqltips].[VehicleTemperatures]
OUTPUT DELETED.[VehicleTemperatureID], DELETED.[VehicleRegistration], DELETED.[ChillerSensorNumber], DELETED.[RecordedWhen], DELETED.[Temperature], DELETED.[FullSensorData], DELETED.[IsCompressed], DELETED.[CompressedSensorData]
INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
WHERE VehicleTemperatureID IN
      (SELECT TOP (@rowcount) VehicleTemperatureID FROM [mssqltips].[VehicleTemperatures]
      WHERE RecordedWhen < '2019-01-01 00:00:00.000'
      ORDER BY VehicleTemperatureID);
 
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
GO
--Run to check progress:
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperatures];
SELECT COUNT(*) FROM [mssqltips].[VehicleTemperaturesArchive];
GO

The additional code to move the oldest records first results in 774 ms, which is much faster than 2499 ms for the Traditional Insert and Delete.

Move SQL Server Data with OUTPUT Clause Ad Hoc Loop

In this example, I show how to run the code easily in a loop, moving 5,000 rows at a time. This might be used if you have a reasonable amount of data to archive and require a one-time cleanup. Note that Microsoft has deprecated the effect of SET ROWCOUNT on INSERT, UPDATE, and DELETE statements; TOP is the recommended alternative going forward.

--7) Run a manual ad hoc controlled Delete with SET ROWCOUNT
SET NOCOUNT ON;
DECLARE @rowcount int = 5000;
SET ROWCOUNT @rowcount;
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;
GO
 
DELETE [mssqltips].[VehicleTemperatures]
OUTPUT DELETED.[VehicleTemperatureID], DELETED.[VehicleRegistration], DELETED.[ChillerSensorNumber], DELETED.[RecordedWhen], DELETED.[Temperature], DELETED.[FullSensorData], DELETED.[IsCompressed], DELETED.[CompressedSensorData]
INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
WHERE RecordedWhen < '2019-01-01 00:00:00.000';
 
--Optionally add a checkpoint if the DB is in Simple recovery
CHECKPOINT;
GO 100  --The batch above will run 100 times
 
SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
SET ROWCOUNT 0;  --reset the row count to unlimited
GO

The 100 loops in this test ran in approximately 2.5 minutes. In this scenario, do the math to determine how many loops to run! (Total Records / Batch Size = Number of Loops)
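If you prefer not to rely on GO n and SET ROWCOUNT, a WHILE loop driven by @@ROWCOUNT can batch the same OUTPUT-clause delete and stop on its own when the data runs out. A sketch; the batch size and loop structure here are my own choices, not part of the examples above:

```sql
DECLARE @batchsize int = 5000;  -- assumed batch size

SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] ON;

WHILE 1 = 1
BEGIN
   -- Move one batch from the source table into the archive.
   DELETE TOP (@batchsize) [mssqltips].[VehicleTemperatures]
   OUTPUT DELETED.[VehicleTemperatureID], DELETED.[VehicleRegistration], DELETED.[ChillerSensorNumber], DELETED.[RecordedWhen], DELETED.[Temperature], DELETED.[FullSensorData], DELETED.[IsCompressed], DELETED.[CompressedSensorData]
   INTO [mssqltips].[VehicleTemperaturesArchive] ([VehicleTemperatureID],[VehicleRegistration],[ChillerSensorNumber],[RecordedWhen],[Temperature],[FullSensorData],[IsCompressed],[CompressedSensorData])
   WHERE RecordedWhen < '2019-01-01 00:00:00.000';

   -- Stop when the last batch moved no rows.
   IF @@ROWCOUNT = 0 BREAK;
END

SET IDENTITY_INSERT [mssqltips].[VehicleTemperaturesArchive] OFF;
```

Because each DELETE is its own transaction, the loop keeps individual transactions short while still draining the table completely.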

Conclusion

This tip shows more direct ways to move data using the OUTPUT clause, and the technique even works across databases. This example uses a relatively small table that could have been archived in a single batch. However, in real life, I have seen archive processes run for many hours against production systems that could have benefited from some control over the batch size. Typical production scenarios involve tables that are perpetually growing and need to be trimmed for performance or archived for retention reasons. You may also want to set up an archive job to automate the process and keep source tables trimmed to target sizes. Other methods include moving all the records to a new table first and then doing controlled deletes. And if you must delete the data outright rather than move a copy to another table, you can still do it in controlled batches, as demonstrated in this tip.

About the author
Jim Evans is an IT Manager, currently for Crowe, who has managed DBA, Developer, BI, and Data Management teams for over 20 years.

This author pledges the content of this article is based on professional experience and not AI generated.



