SQL Temporal Tables vs Change Data Capture vs Change Tracking

Problem

In Part 1, we talked about how the Change Tracking feature of SQL Server works. Change Tracking only stores the last change made to the row. It does not keep the history of previous changes. Although its uses are limited, some applications only need this simple refresh functionality and do not require temporal tables or CDC for data tracking purposes.

In this tip we are going to see how another SQL Server feature, Change Data Capture, works. In Part 3, we will talk about the new Temporal Table feature and compare these three features side by side.

Solution

Change Data Capture (CDC) was introduced in SQL Server 2008. Like Change Tracking (CT), CDC records DML changes made to a table, and like CT it needs to be enabled at the database level first and then at the table level. Unlike CT, however, CDC comes with far more complex features out of the box and has a lot of moving pieces. Nevertheless, CDC is a great feature with its own use cases, and Temporal Tables are not a replacement for it.

Once a table in a database is enabled for change data capture, all changes to that table are tracked by storing them in a change table. The change table contains one record for every inserted row, which can be used to identify the column values of the inserted record. Each deleted row also produces one record, showing the values in each column prior to the DELETE. When an UPDATE is performed against a change data capture enabled table, two records are created in the change table for each updated row: one with the original column values and one with the updated column values. By using change data capture, you can track changes that have occurred over time to your table. This kind of functionality is useful for applications such as a data warehouse load process, which need to identify changes so they can correctly apply updates and track historical changes over time.

How does the SQL Server Change Data Capture feature work?

Let’s create a database called DataCapture and a table called Customer, then insert a few rows into the Customer table. After that, we enable the CDC feature at the database level.

 USE master
GO
CREATE DATABASE DataCapture
GO
USE DataCapture
GO
CREATE TABLE Customer (
CustomerId INT  PRIMARY KEY
,FirstName VARCHAR(30)
,LastName VARCHAR(30)
,Amount_purchased DECIMAL
)
GO
 
INSERT INTO dbo.Customer( CustomerId, FirstName,    LastName,    Amount_Purchased)
VALUES
(1, 'Frank', 'Sinatra',20000.00),( 2,'Shawn', 'McGuire',30000.00),( 3,'Amy', 'Carlson',40000.00)
GO
 
SELECT * FROM dbo.Customer
 
-- Now enable CDC at the Database Level
EXEC sys.sp_cdc_enable_db  
GO

I received a pretty detailed error message when I tried to enable the CDC feature on database DataCapture.

Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line
198 [Batch Start Line 31] Could not update the metadata that indicates database
DataCapture is enabled for Change Data Capture. The failure occurred when executing
the command ‘SetCDCTracked(Value = 1)’. The error returned was 15404: ‘Could
not obtain information about Windows NT group/user ‘NC\alalani’, error
code 0x54b.’. Use the action and error to determine the cause of the failure
and resubmit the request.

The underlying error, 15404, shows that SQL Server could not look up the Windows account that owns the database. By default the user who creates the database becomes its owner, so in my case the owner was a domain account that could not be resolved. Changing the owner to ‘sa’ resolves the above error. Adding the above user to the sysadmin role also worked, but that gives the user a lot of permissions that you might not want to grant.

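-- sp_changedbowner is deprecated; ALTER AUTHORIZATION is the documented replacement
ALTER AUTHORIZATION ON DATABASE::DataCapture TO sa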
GO
EXEC sys.sp_cdc_enable_db  
GO
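
To confirm that the database-level step took effect, one option is to check the is_cdc_enabled flag in sys.databases. The query below is only a small sketch and assumes the DataCapture database from the example above.

```sql
-- Check whether CDC is enabled for the example database and who owns it
SELECT name,
       is_cdc_enabled,                          -- 1 = CDC is enabled at the database level
       SUSER_SNAME(owner_sid) AS database_owner -- may be NULL if the owner cannot be resolved
FROM sys.databases
WHERE name = N'DataCapture';
GO
```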

Now we will enable CDC at the table level.

 -- Enable on the table level
EXEC sys.sp_cdc_enable_table   
   @source_schema = N'dbo',
   @source_name   = N'Customer',
   @role_name     = NULL,
   @filegroup_name = N'Primary',
   @supports_net_changes = 0
GO

Enabling CDC at the table level is not as simple as at the database level, because all the CDC objects get created as system objects. There is also a dependency on the msdb database and the SQL Server Agent service, which runs the capture and cleanup jobs. If the above command runs successfully, it returns the following messages:

Job 'cdc.DataCapture_capture' started successfully.
Job 'cdc.DataCapture_cleanup' started successfully.
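
Because the capture and cleanup jobs are registered with SQL Server Agent, it can be worth verifying the table-level configuration before making any changes. The following is a minimal sketch that assumes the DataCapture database and the dbo.Customer table from this tip and relies on the default capture instance.

```sql
USE DataCapture
GO
-- 1 = the table is tracked by CDC
SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = N'Customer';
GO
-- Show the capture instance for the table and the columns it captures
EXEC sys.sp_cdc_help_change_data_capture
   @source_schema = N'dbo',
   @source_name   = N'Customer';
GO
-- Show the capture and cleanup jobs registered for this database
EXEC sys.sp_cdc_help_jobs;
GO
```

If SQL Server Agent is not running, the jobs exist but no changes reach the change table until the capture job runs.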

A DDL trigger and a number of system procedures also get created. CDC objects are all over the place in a database. If you drop the table by mistake, your history is lost. Unlike a temporal table, there is no safety mechanism built in to restrict dropping a table when CDC is enabled on it.

SQL Server Change Data Capture System Tables.
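
To see these objects for yourself, you can list what lives in the cdc schema. This is a rough sketch, assuming the DataCapture database from this tip; the exact set of objects returned depends on the SQL Server version and on which tables are enabled.

```sql
USE DataCapture
GO
-- List the objects in the cdc schema (change tables, metadata tables, functions)
SELECT o.name, o.type_desc
FROM sys.objects AS o
WHERE o.schema_id = SCHEMA_ID(N'cdc')
ORDER BY o.type_desc, o.name;
GO
-- List database-level DDL triggers, which should include the one CDC adds
-- (assumption: the trigger is visible to the login running the query)
SELECT name, is_ms_shipped
FROM sys.triggers
WHERE parent_class_desc = N'DATABASE';
GO
```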

Let’s make some changes in the table Customer.

 -- insert a row
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (4, 'Ameena', 'Lalani', 50000)
GO
-- delete a row
DELETE FROM dbo.Customer 
WHERE CustomerId = 2
GO
 
-- update a row
UPDATE Customer
SET Lastname = 'Clarkson' WHERE CustomerId = 3
GO
 
-- Let us query to see what it reports
SELECT * FROM dbo.Customer
 
Declare @begin_lsn binary (10), @end_lsn binary (10)
-- get the first LSN for customer changes
select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer')
-- get the last LSN for customer changes
select @end_lsn = sys.fn_cdc_get_max_lsn()
-- get individual changes in the range
select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');

Changes captured by SQL Server Change Data Capture

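Notice that CustomerId = 2, which was deleted, now appears in the output of the all changes function. CDC records DML change information asynchronously: the change is first written to the transaction log, and the capture job then scans the log and stores the change in the change table together with its Log Sequence Number (LSN). Unlike CT, CDC stores all columns and not just the primary key columns. In the __$operation column, 1 indicates a delete, 2 an insert and 4 an update (the row as it looks after the update). With the 'all' row filter option used here, only that after image is returned for an update; the before image, operation 3, is returned when the 'all update old' option is used instead.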
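
As an illustration of the operation codes, the query below is a sketch that decodes __$operation into a readable label and uses the 'all update old' row filter option so that both images of each update are returned. It assumes the default capture instance name dbo_Customer created earlier in this tip.

```sql
USE DataCapture
GO
DECLARE @begin_lsn binary(10), @end_lsn binary(10);
-- first and last LSN available for the Customer capture instance
SELECT @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');
SELECT @end_lsn   = sys.fn_cdc_get_max_lsn();

-- 'all update old' returns both the before (3) and after (4) image of each update
SELECT CASE __$operation
          WHEN 1 THEN 'delete'
          WHEN 2 THEN 'insert'
          WHEN 3 THEN 'update (before image)'
          WHEN 4 THEN 'update (after image)'
       END AS operation_name,
       __$start_lsn,
       CustomerId, FirstName, LastName, Amount_purchased
FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@begin_lsn, @end_lsn, N'all update old')
ORDER BY __$start_lsn, __$seqval, __$operation;
GO
```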

Let’s see how the information is stored once the rows that were captured above are changed.

 -- Update the above row one more time
UPDATE Customer
SET Lastname = 'Blacksmith' WHERE CustomerId = 3
GO
 
-- Insert a few more rows
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (5, 'Sponge', 'Bob', 5000)
GO
 
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (6, 'Donald', 'Duck', 6000)
GO
-- Let us query to see what it reports now
SELECT * FROM dbo.Customer
 
Declare @begin_lsn binary (10), @end_lsn binary (10)
-- get the first LSN for customer changes
Select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer')
-- get the last LSN for customer changes
Select @end_lsn = sys.fn_cdc_get_max_lsn()
-- get individual changes in the range
Select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');

SQL Server Change Data Capture data collection

Temporal tables are more efficient at storing historical data because they ignore insert actions; we will look at this more closely in the next tip. Since the current table already holds the inserted data and the rows are time bound, there is no need to keep redundant data in the history object. CDC keeps that redundant data, as we see for customer ids 5 and 6 in the above example. CDC change rows also carry no time columns of their own. CDC keeps track of data changes by LSN (log sequence number), which has to be mapped to a time separately, and that is where the SQL Server 2016 Temporal Table shines. Another tip goes into more detail on how to use CDC functions to retrieve point in time data, but the process is complicated compared to Temporal Tables.
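
To give a feel for the extra step involved, here is a hedged sketch of querying CDC by time rather than by LSN, using the built-in mapping functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time. The one-hour window is a made-up example value, and the capture instance name dbo_Customer is the default one assumed throughout this tip.

```sql
USE DataCapture
GO
DECLARE @from_time datetime = DATEADD(HOUR, -1, GETDATE());  -- hypothetical window: the last hour
DECLARE @to_time   datetime = GETDATE();
DECLARE @begin_lsn binary(10), @end_lsn binary(10);
DECLARE @min_lsn   binary(10) = sys.fn_cdc_get_min_lsn('dbo_Customer');

-- Translate the time window into an LSN range
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @from_time);
SELECT @end_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @to_time);

-- The change function rejects a start LSN below the capture instance's minimum, so clamp it
IF @begin_lsn < @min_lsn SET @begin_lsn = @min_lsn;

-- Either bound is NULL when no mapped LSN falls inside the window
IF @begin_lsn IS NOT NULL AND @end_lsn IS NOT NULL AND @begin_lsn <= @end_lsn
    SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS commit_time,
           __$operation, CustomerId, FirstName, LastName, Amount_purchased
    FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@begin_lsn, @end_lsn, N'all')
    ORDER BY __$start_lsn, __$seqval;
GO
```

Compared with a temporal table, where the period columns sit on the rows themselves, the time has to be derived from the LSN on every query.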

Summary

CDC has its own place, and Temporal Tables in SQL Server 2016 are not replacing it. CDC is good for maintaining slowly changing dimensions. It can also be used to record changes to master data in Master Data Management (MDM), since it captures those changes asynchronously. With the examples in this tip, we observed the pros and cons of the CDC feature. The syntax and overall implementation of CDC are a lot more complex than those of Change Tracking and Temporal Tables. In the next tip we will cover the same example using a Temporal Table and compare all three SQL Server data tracking features.

Next Steps

Check out the other MSSQLTips tips on CDC.
Read Part 1 and stay tuned for Part 3.

Ameena Lalani

Ameena Lalani is a SQL Server veteran and started her journey with SQL Server 2000. She is a Microsoft Certified Solution Associate on SQL Server 2016. She is a Chapter leader of the Chicago SQL Server User Group. She has implemented numerous High Availability and Disaster Recovery solutions at various companies. Ameena loves to share her knowledge about SQL Server and learn from others who like to share their knowledge. She really believes that no one is smarter than all of us combined. She volunteers and speaks at SQL Saturday events throughout the United States.
