By: Ameena Lalani | Last Updated: 2017-11-20 | Comments (5) | Change Data Capture
In Part 1, we talked about how the Change Tracking feature of SQL Server works. Change Tracking only stores the last change made to the row. It does not keep the history of previous changes. Although it has limited usage, some application may only need this simple refresh functionality and does not require temporal table or CDC for data tracking purposes.
In this tip we are going to see how another SQL Server feature Change Data Capture works. In part 3, we will talk about the new Temporal Table feature and compare these 3 features side by side.
Change Data Capture (CDC) was introduced in SQL Server 2008. Like Change Tracking (CT), CDC also records DML activity changes in a table. Like CT, this feature needs to be enabled at the database level first and then at the table level. But unlike CT, CDC has a lot complex features out of the box. It has lot of moving pieces. Nevertheless, CDC is a great feature and it has its own use cases and Temporal Tables are not a replacement for CDC.
Once a table in a database is enabled for change data capture all changes to that table are tracked by storing changes in a change table. The change table will contain one record for every INSERT that can be used to identify column values for the inserted records. Each time a DELETE is performed the change table will contain one record for each DELETE that will show the values in each column prior to the DELETE. When an UPDATE is preformed, against a change data capture enabled table, two records will be created in the change table, one with the updated column values and one with the original column values. By using change data capture, you can track changes that have occurred over time to your table. This kind of functionality is useful for applications, like a data warehouse load process that need to identify changes, so they can correctly apply updates to track historical changes over time.
How the SQL Server Change Data Capture feature works?
Let’s create a database called DataCapture and a table called Customer. Then insert few rows into the Customer table. After that enable the CDC feature at the database level.
USE master GO CREATE DATABASE DataCapture GO USE DataCapture GO CREATE TABLE Customer ( CustomerId INT PRIMARY KEY ,FirstName VARCHAR(30) ,LastName VARCHAR(30) ,Amount_purchased DECIMAL ) GO INSERT INTO dbo.Customer( CustomerId, FirstName, LastName, Amount_Purchased) VALUES (1, 'Frank', 'Sinatra',20000.00),( 2,'Shawn', 'McGuire',30000.00),( 3,'Amy', 'Carlson',40000.00) GO SELECT * FROM dbo.Customer -- Now enable CDC at the Database Level EXEC sys.sp_cdc_enable_db GO
I received a pretty detailed error message when I tried to enable the CDC feature on database DataCapture.
CDC requires that database owner be a sysadmin. This is a quirk. By default the user creating the database will be the owner. So changing the owner to ‘sa’ resolves the above error. Or if you add the above user to the sysadmin role that will also work. Adding the user to sysadmin will also give that user a lot of permissions that you might not want to grant.
EXEC sp_changedbowner 'sa' GO EXEC sys.sp_cdc_enable_db GO
Now we will enable the CDC at the table level.
-- Enable on the table level EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Customer', @role_name = NULL, @filegroup_name = N'Primary', @supports_net_changes = 0 GO
Enabling CDC at the table level is not as simple as at the database level. It is because all the CDC objects get created as system objects. There is also a dependency on the MSDB database and SQL Server Agent service. If we ran the above command successfully, it will return the following message:
Job 'cdc.DataCapture_capture' started successfully. Job 'cdc.DataCapture_cleanup' started successfully.
A DDL Trigger and a number of system procedures also gets created. CDC objects are all over the place in a database. If you drop the table by mistake, your history is lost. Unlike a temporal table, there is no safety mechanism built in to restrict dropping of a table if CDC is enabled.
Let’s make some changes in the table Customer.
-- insert a row INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased) VALUES (4, 'Ameena', 'Lalani', 50000) GO -- delete a row DELETE FROM dbo.Customer WHERE CustomerId = 2 GO -- update a row UPDATE Customer SET Lastname = 'Clarkson' WHERE CustomerId = 3 GO -- Let us query to see what it reports SELECT * FROM dbo.Customer Declare @begin_lsn binary (10), @end_lsn binary (10) -- get the first LSN for customer changes select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer') -- get the last LSN for customer changes select @end_lsn = sys.fn_cdc_get_max_lsn() -- get individual changes in the range select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');
Notice customerId=2 which was deleted, now appears in the all changes function. CDC writes DML change information asynchronously. It first writes to the transaction log and then searches the transaction logs and stores the information with the beginning and ending Log Sequence Number (LSN). Unlike CT, CDC stores all columns and not just the Primary key columns. Operation 2 indicates an Insert, 1 is for delete and 4 is for update.
Let’s see how the information is stored once the rows that were captured above are changed.
-- Update the above row one more time UPDATE Customer SET Lastname = 'Blacksmith' WHERE CustomerId = 3 GO -- Let INSERT few more rows INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased) VALUES (5, 'Sponge', 'Bob', 5000) GO INSERT INTO Customer (Customerid, FirstName, LastName, Amount_purchased) VALUES (6, 'Donald', 'Duck', 6000) GO -- Let us query to see what it reports now SELECT * FROM dbo.Customer Declare @begin_lsn binary (10), @end_lsn binary (10) -- get the first LSN for customer changes Select @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_customer') -- get the last LSN for customer changes Select @end_lsn = sys.fn_cdc_get_max_lsn() -- get individual changes in the range Select * from cdc.fn_cdc_get_all_changes_dbo_customer(@begin_lsn, @end_lsn, 'all');
Temporal tables are more efficient in storing historical data as it ignores insert actions, we will look at this closer in the next tip. Since the current table does have the data and they are time bound, there is no need to keep redundant data in the history object. CDC keeps that redundant data as we see for customer ids 5 and 6 in the above example. CDC does not have any time dimension, it keep tracks of data changes based on LSN (log sequence number) and that is where SQL Server 2016 Temporal Table shines. Another tip goes into more detail on how to use CDC functions to retrieve point in time data, but the process is complicated compared to Temporal Tables.
CDC has its own place and Temporal Tables in SQL Server 2016 are not replacing them. CDC is good for maintaining slowly changing dimensions. CDC can be used for recording changes in Master Data in Master Data Management (MDM) by asynchronously recording the changes. With examples in this tip, we observed the pros and cons of the CDC feature. Syntax and overall implementation of CDC is lot more complex than Change Tracking and Temporal Tables. In the next tip we will cover the same example using a Temporal Table and will do the comparison of all 3 SQL Server data tracking features.
Last Updated: 2017-11-20
About the author
View all my tips