SQL Server Temporal Tables vs Change Data Capture vs Change Tracking - part 2

By:   |   Comments (6)   |   Related: > Change Data Capture


Problem

In Part 1, we talked about how the Change Tracking feature of SQL Server works. Change Tracking only stores the last change made to the row. It does not keep the history of previous changes. Although it has limited usage, some application may only need this simple refresh functionality and does not require temporal table or CDC for data tracking purposes.

In this tip we are going to see how another SQL Server feature Change Data Capture works. In part 3, we will talk about the new Temporal Table feature and compare these 3 features side by side.

Solution

Change Data Capture (CDC) was introduced in SQL Server 2008. Like Change Tracking (CT), CDC also records DML activity changes in a table. Like CT, this feature needs to be enabled at the database level first and then at the table level. But unlike CT, CDC has a lot complex features out of the box. It has lot of moving pieces. Nevertheless, CDC is a great feature and it has its own use cases and Temporal Tables are not a replacement for CDC.

Once a table in a database is enabled for change data capture all changes to that table are tracked by storing changes in a change table. The change table will contain one record for every INSERT that can be used to identify column values for the inserted records. Each time a DELETE is performed the change table will contain one record for each DELETE that will show the values in each column prior to the DELETE. When an UPDATE is preformed, against a change data capture enabled table, two records will be created in the change table, one with the updated column values and one with the original column values. By using change data capture, you can track changes that have occurred over time to your table. This kind of functionality is useful for applications, like a data warehouse load process that need to identify changes, so they can correctly apply updates to track historical changes over time.

How the SQL Server Change Data Capture feature works?

Let’s create a database called DataCapture and a table called Customer. Then insert few rows into the Customer table. After that enable the CDC feature at the database level.

USE master
GO

CREATE DATABASE DataCapture
GO

USE DataCapture
GO

CREATE TABLE Customer (
CustomerId INT  PRIMARY KEY
,FirstName VARCHAR(30)
,LastName VARCHAR(30)
,Amount_purchased DECIMAL
)
GO
 
INSERT INTO dbo.Customer( CustomerId, FirstName,    LastName,    Amount_Purchased)
VALUES
(1, 'Frank', 'Sinatra',20000.00),( 2,'Shawn', 'McGuire',30000.00),( 3,'Amy', 'Carlson',40000.00)
GO
 
SELECT * FROM dbo.Customer
 
-- Now enable CDC at the Database Level
EXEC sys.sp_cdc_enable_db  
GO 

I received a pretty detailed error message when I tried to enable the CDC feature on database DataCapture.

Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 198 [Batch Start Line 31] Could not update the metadata that indicates database DataCapture is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15404: 'Could not obtain information about Windows NT group/user 'NC\alalani', error code 0x54b.'. Use the action and error to determine the cause of the failure and resubmit the request.

CDC requires that database owner be a sysadmin. This is a quirk. By default the user creating the database will be the owner. So changing the owner to ‘sa’ resolves the above error. Or if you add the above user to the sysadmin role that will also work. Adding the user to sysadmin will also give that user a lot of permissions that you might not want to grant.

EXEC sp_changedbowner 'sa'
GO

EXEC sys.sp_cdc_enable_db  
GO

Now we will enable the CDC at the table level.

-- Enable on the table level
EXEC sys.sp_cdc_enable_table   
   @source_schema = N'dbo',
   @source_name   = N'Customer',
   @role_name     = NULL,
   @filegroup_name = N'Primary',
   @supports_net_changes = 0
GO

Enabling CDC at the table level is not as simple as at the database level. It is because all the CDC objects get created as system objects. There is also a dependency on the MSDB database and SQL Server Agent service. If we ran the above command successfully, it will return the following message:

Job 'cdc.DataCapture_capture' started successfully.
Job 'cdc.DataCapture_cleanup' started successfully.   

A DDL Trigger and a number of system procedures also gets created. CDC objects are all over the place in a database. If you drop the table by mistake, your history is lost. Unlike a temporal table, there is no safety mechanism built in to restrict dropping of a table if CDC is enabled.

SQL Server Change Data Capture System Tables.

Let’s make some changes in the table Customer.

-- insert a row
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (4, 'Ameena', 'Lalani', 50000)
GO

-- delete a row
DELETE FROM dbo.Customer 
WHERE CustomerId = 2
GO
 
-- update a row
UPDATE Customer
SET Lastname = 'Clarkson' WHERE CustomerId = 3
GO
 
-- Let us query to see what it reports
SELECT * FROM dbo.Customer
 
Declare @begin_lsn binary (10), @end_lsn binary (10)

-- get the first LSN for customer changes
select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer')

-- get the last LSN for customer changes
select @end_lsn = sys.fn_cdc_get_max_lsn()

-- get individual changes in the range
select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');
Changes captured by SQL Server Change Data Capture

Notice customerId=2 which was deleted, now appears in the all changes function. CDC writes DML change information asynchronously. It first writes to the transaction log and then searches the transaction logs and stores the information with the beginning and ending Log Sequence Number (LSN). Unlike CT, CDC stores all columns and not just the Primary key columns. Operation 2 indicates an Insert, 1 is for delete and 4 is for update.

Let’s see how the information is stored once the rows that were captured above are changed.

-- Update the above row one more time
UPDATE Customer
SET Lastname = 'Blacksmith' WHERE CustomerId = 3
GO
 
-- Let INSERT few more rows
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (5, 'Sponge', 'Bob', 5000)
GO
 
INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased)
VALUES (6, 'Donald', 'Duck', 6000)
GO

-- Let us query to see what it reports now
SELECT * FROM dbo.Customer
 
Declare @begin_lsn binary (10), @end_lsn binary (10)

-- get the first LSN for customer changes
Select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer')

-- get the last LSN for customer changes
Select @end_lsn = sys.fn_cdc_get_max_lsn()

-- get individual changes in the range
Select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');
SQL Server Change Data Capture data collection

Temporal tables are more efficient in storing historical data as it ignores insert actions, we will look at this closer in the next tip. Since the current table does have the data and they are time bound, there is no need to keep redundant data in the history object. CDC keeps that redundant data as we see for customer ids 5 and 6 in the above example. CDC does not have any time dimension, it keep tracks of data changes based on LSN (log sequence number) and that is where SQL Server 2016 Temporal Table shines. Another tip goes into more detail on how to use CDC functions to retrieve point in time data, but the process is complicated compared to Temporal Tables.

Summary

CDC has its own place and Temporal Tables in SQL Server 2016 are not replacing them. CDC is good for maintaining slowly changing dimensions. CDC can be used for recording changes in Master Data in Master Data Management (MDM) by asynchronously recording the changes. With examples in this tip, we observed the pros and cons of the CDC feature. Syntax and overall implementation of CDC is lot more complex than Change Tracking and Temporal Tables. In the next tip we will cover the same example using a Temporal Table and will do the comparison of all 3 SQL Server data tracking features.

Next Steps
  • Check out this MSSQLTips here for CDC.
  • Read Part 1 and stay tuned for Part 3.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ameena Lalani Ameena Lalani is a MCSA on SQL Server 2016. She is a SQL Server veteran and started her journey with SQL Server 2000. She has implemented numerous High Availability and Disaster Recovery solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, March 5, 2020 - 2:31:42 PM - Tot Back To Top (84958)

"CDC does not have any time dimension, it keep tracks of data changes based on LSN (log sequence number) " - that's not true. [cdc].[lsn_time_mapping] table provides time start and time end data (tran_begin_time,tran_end_time) for each LSN. Besides that CDC allows to pull historical data without joining production tables to query for inserted rows.


Thursday, December 27, 2018 - 11:05:09 AM - Ameena Lalani Back To Top (78567)

 Shimmy,

I think you already found part 3.

https://www.mssqltips.com/sqlservertip/5144/sql-server-temporal-tables-vs-change-data-capture-vs-change-tracking--part-3/

:)


Wednesday, December 26, 2018 - 5:21:46 PM - Shimmy Back To Top (78562)

Ever planning on going for part 3?


Thursday, August 2, 2018 - 3:49:07 PM - Ameena Lalani Back To Top (76958)

 Hello William M,

You are correct that statement of mine was misleading. However, the link of another MSSQLTIps I provided in the article "Check out this MSSQLTips here for CDC" explains further on how to use CDC functions. I will try to make that update to the article if the platform allows me to.


Thursday, August 2, 2018 - 12:52:15 PM - William M Back To Top (76950)

"Therefore, you cannot look at the historical picture of data at certain point of time in the past to see what the trend was."

This is incorrect.  While the process is far less straightfoward than with temporal tables, SQL Server does include a means to let you get Point In Time data, net changes during a period, and a list of changed columns, among other things.  The key are the CDC Wrapper functions, as well as [sys].[fn_cdc_map_time_to_lsn] and [sys].[fn_cdc_map_lsn_to_time].

For example, say you created the following table and enabled CDC on it.

CREATE TABLE [dbo].[cdcTestTable](
[id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[col1] [varchar](50) NULL,
[col2] [int] NULL,
[col3] [varchar](50) NULL
)

To get point in time data for this table you would create the following wrapper to get data as of an LSN.

CREATE FUNCTION [dbo].[tvf_cdc_point_in_lsn_dbo_cdcTestTable]
(
-- Add the parameters for the function here
@from_lsn binary(10)
)
RETURNS TABLE AS RETURN 
(
SELECT [id],[col1],[col2],[col3]
FROM [cdc].[fn_cdc_get_net_changes_dbo_cdcTestTable](@from_lsn, sys.fn_cdc_get_max_lsn(), 'all') AS PD
WHERE [__$operation] <> 2

UNION ALL
--Get Unchanged Items
SELECT [id],[col1],[col2],[col3]
FROM [dbo].[cdcTestTable] t WITH(NOLOCK)
WHERE NOT EXISTS (SELECT [id] FROM [cdc].[fn_cdc_get_all_changes_dbo_cdcTestTable] (@from_lsn,sys.fn_cdc_get_max_lsn(),'all') p WHERE t.[id] = p.[id])
)

You would then use [sys].[fn_cdc_map_time_to_lsn] to retrive the LSN associated with a Point In Time and pass that to your newly created function like so.

DECLARE @PointInTime datetime2(3) = '2016-08-08 13:32:40.843'
DECLARE @from_lsn binary(10) = [sys].[fn_cdc_map_time_to_lsn]('smallest greater than or equal', @PointInTime)
SELECT @PointInTime, * FROM [dbo].[tvf_cdc_point_in_lsn_dbo_cdcTestTable] (@from_lsn) ORDER BY id

Monday, November 20, 2017 - 1:56:34 AM - Raghavendra Dasara Back To Top (70003)

Good to know.Thanks for sharing the information. 

 















get free sql tips
agree to terms