Configure and Query Change Data Capture (CDC) data for Azure SQL Database
You might have used Change Data Capture (CDC) for an on-prem SQL Server database and noticed that CDC uses SQL Server Agent for recording insert, update, and delete activity that applies to a table. The Azure SQL Database does not support SQL Server Agent. Can we use change data capture for Azure SQL Database?
Let's find how the change data capture process works for Azure SQL Database.
This tutorial teaches about Change Data Capture (CDC), a feature that can track changes in the tables and rows in a database. CDC is generally available in SQL Server and Azure SQL Managed Instance, but is currently in the preview phase for Azure SQL Database.
What is Change Data Capture (CDC)?
Change Data Capture (CDC) records insert, delete and update activity on a database table using SQL Server Agent. It mirrors the source table with additional columns to record the changes performed on the database.
- It writes one record into the mirror table for each INSERT and DELETE statement
- It writes two records into the mirror table for each UPDATE statement. These two records contain data before and after the change.
As shown below, the change data capture uses the SQL Server transaction log as a change data source. The CDC process reads the transaction log asynchronously (in the background) to have minimal impact on the system during tracking and record the DML changes. It reads the transaction log for INSERTS, UPDATES, DELETES and adds required information in the tracked table's mirrored table.
The on-premises SQL instance or Azure Managed instance creates these SQL Server Agent jobs:
- The first job is for populating database change tables with the changed information.
- The second job performs the cleanup on the change table. It deletes the CDC table records older than the 3 days (retention period).
The CDC uses the transaction log reader from transactional replication. The benefit of the log reader is that it does not impact source database activity. Therefore, the log reader is also known as a capture process of CDC. If you have implemented transactional replication for your database, the CDC and transaction replication share the log reader agent. It helps SQL Server reduce log contention.
Change Data Capture with Azure SQL Database
The Azure SQL Databases uses a change data capture scheduler instead of the SQL Server Agent. The schedule invokes the stored procedure for periodic capture and cleanup of the CDC tables. The scheduler does not have a dependency and runs as a separate process. Azure allows users to run the procedures manually as well.
Note: The CDC process is in preview for Azure SQL Database.
To use the change data capture, we need to enable it at the Azure database. Firstly, it is preferred to check whether your database has CDC enabled or not.
SELECT name, CASE is_cdc_enabled WHEN 0 THEN 'CDC not enabled' WHEN 1 Then 'CDC enabled' ELSE 'Invalid value' END AS CDCstats FROM sys.databases WHERE name='MyTestDB'
As shown below, currently, CDC is not enabled for my Azure SQL Database named [MyTestDB].
After satisfying the following conditions, you run the stored procedure sys.sp_cdc_enable_db to enable CDC on Azure SQL Database.
- Use the db_owner account security context
- The Azure DB free, basic, or standard (S0, S1, S2) service tiers do not support CDC.
- Elastic database pools should have max eDTU>100 or max vCore>1.
Otherwise you get the following error message:
To satisfy the requirements, I configured my demo Azure database in the Standard S3: 100 DTUs.
Execute the stored procedure sp_cdc_enable_db and check the status using sys.databases. I have enabled change data capture for Azure SQL Database as shown below.
Note: The Change Data Capture process uses CDC schema and CDC user in a database to enable CDC. Therefore, if you have an existing schema or user named CDC, you cannot enable CDC until the schema or user is renamed or dropped.
Once we have enabled CDC at the database level, we can use the system stored procedure sys.sp_cdc_enable_table to configure the change tables. Azure does not allow secondary filegroups; therefore, we cannot specify a different filegroup other than primary.
The following script enables CDC for the [SalesLT].[Customer] table.
EXEC sys.sp_cdc_enable_table @source_schema = N'SalesLT', @source_name = N'Customer', @role_name = NULL, @filegroup_name = NULL, @supports_net_changes = 0 GO
To view the CDC system tables, expand the Azure database and System Tables. As shown below, it creates a few system tables and the change tracking table cdc.SalesLT_Customer_CT for our source table.
Let's review the purpose of these tables.
- CDC.captured_columns: It contains a list of captured columns.
- CDC.cdc_jobs: As stated earlier, it creates a capture and cleanup job with the default parameters once we enable CDC. The table CDC.cdc_jobs records these job details.
- CDC.change_tables: It is a repository that records all change tracking enables tables. For example, it lists the table [SalesLT].[Customer] for my demo database.
- CDC.ddl_history: It records the history of all DDL changes since we enable the CDC process.
- CDC.index_columns: The index_column table list the indexes associated with the table.
- CDC.lsn_time_mapping: The lsn_time_mapping table maps the LSN number with the CDC-enabled table that captures the DML changes on the source table.
If we expand the change tracking table, it has these additional columns.
As we can see in the image below.
View captured data using Change Data Capture in Azure SQL Database
I created an Employee table for the demonstration purposes, enabled it for CDC, performed INSERT, UPDATE and DELETE transactions, and viewed the change tracking data.
CREATE TABLE Employee ( ID int Identity(1,1) PRIMARY KEY, [FirstName] varchar(20), [LastName] varchar(20) ) EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Employee', @role_name = NULL, @filegroup_name = NULL, @supports_net_changes = 0 GO
Let's insert a new record into the [Employee] table and view the change tracking table data.
INSERT INTO Employee values('Raj','Gupta') SELECT * FROM cdc.dbo_employee_ct
The inserts operation has value for __$operation column 2. You can see the inserted value in the table.
If you need to look at just inserted data in the table, you can use a WHERE clause for __$operation as 2.
You can view the deleted record in the change tracking table like the insert statement. The delete operation has value for __$operation as 1.
DELETE FROM Employee WHERE id=1 SELECT * FROM cdc.dbo_employee_ct WHERE __$operation=1
If we update an record in the source table, it creates two entries in the change tracking table. The delete operation is a combination of delete plus an insert transaction.
- __$operation value 3: delete activity (old value)
- __$operation value 4: New record (updated value)
UPDATE Employee SET [FirstName]='Manoj' WHERE ID=2 SELECT * FROM cdc.dbo_employee_ct WHERE __$operation in (3,4)
Use Table-valued function for viewing change data
You can use the table-valued function (TVFs) for accessing the change data. These functions are below:
- sys.fn_cdc_get_min_lsn() : It returns the smallest LSN that is associated with a capture instance validity interval.
- sys.fn_cdc_get_max_lsn(): It returns the largest LSN that is associated with a capture instance validity interval.
- sys.fn_cdc_map_time_to_lsn: It returns the log sequence number (LSN) value from the start_lsn column in the cdc.lsn_time_mapping system table for the specified time.
- sys.fn_cdc_map_lsn_to_time: It returns the date and time value from the tran_end_time column in the lsn_time_mapping system table for the specified log sequence number (LSN).
- sys.fn_cdc_increment_lsn: It returns the next log sequence number (LSN) in the sequence based upon the specified LSN.
- sys.fn_cdc_decrement_lsn: It returns the previous log sequence number (LSN) in the sequence-based upon the specified LSN.
The following query returns the inserted, deleted, and updated value in the [dbo].[Employee] table using the TVFs.
DECLARE @from_lsn binary (10), @to_lsn binary (10) SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_employee') SET @to_lsn = sys.fn_cdc_get_max_lsn() SELECT * FROM [cdc].[fn_cdc_get_all_changes_dbo_Employee](@from_lsn, @to_lsn, 'all') ORDER BY __$seqval
Disable Change data capture for a table
To disable change data capture on a table, we use the stored procedure sys.sp_cdc_disable_table.
It requires the following parameters.
- Source table schema
- Source table name
- Capture instance: It is the change tracking table name. It follows the format of the source schema_table.
We can query sys.tables to check whether the CDC tracks the table. The following query has three parts.
- First, it returns the list of tables tracked by the CDC.
- Disable CDC for the Customer table
- Verifies that CDC is not tracking table.
SELECT * from sys.tables WHERE is_tracked_by_cdc=1 GO EXEC sys.sp_cdc_disable_table @source_schema = N'SalesLT', @source_name = N'Customer', @capture_instance = N'SalesLT_Customer' GO SELECT * FROM[cdc].[change_tables]
Disable CDC for Azure SQL database
We can execute the stored procedure sys.sp_cdc_disable_db on the Azure SQL Database to disable the change data capture.
As shown below, after executing the stored procedure, we do not get any records from the sys.databases for CDC enabled database (is_cdc_enabled=1).
SELECT name FROM sys.databases WHERE is_cdc_enabled=1 EXEC sys.sp_cdc_disable_db GO SELECT name FROM sys.databases WHERE is_cdc_enabled=1
Change Data Capture functionality is a convenient tool as it allows us to track updates, inserts, and deletes to our tables efficiently. CDC ensures that we can keep track of the latest changes made to our tables and let's us query them to get the required data.
- Refer to existing SQL Server Azure Tips.
- Follow the Microsoft official documentation on Azure SQL Database.
About the author
View all my tips
Article Last Updated: 2022-05-02