source: http://www.MSSQLTips.com/tip.asp?id=1474 -- printed: 2/14/2016 3:04:15 AM
Using Change Data Capture (CDC) in SQL Server 2008Written By: Ray Barley -- 4/10/2008
In this tip we are going to gain an understanding of CDC by walking through a simple code sample to demonstrate how to:
Before we start reviewing the sample T-SQL code, let's discuss how CDC works at a high level. After performing some setup and configuration steps (which we will cover below), CDC will begin scanning the database transaction log for changes to certain tables that you specify, and will insert these changes into change tables. These change tables are created during the setup and configuration process. The setup and configuration process will also create table-valued functions which can be used to query for the changes. You use the table-valued functions in lieu of querying the underlying change tables directly. Based on this high level description, let's proceed to the demo.
The demo code below was only tested on the February, 2008 Community Technology Preview (CTP) of SQL Server 2008. Some of the function names and stored procedure names have changed from the earlier CTPs.
Setup and Configuration
CDC is a feature that must be enabled at the database level; it is disabled by default. To enable CDC you must be a member of the sysadmin fixed server role. You can enable CDC on any user database; you cannot enable it on system databases. Execute the following T-SQL script in the database of your choice to enable CDC:
The sys.sp_cdc_enable_db stored procedure will return 0 if successful and 1 if it fails. You can query whether CDC is enabled for any database by checking the new column is_cdc_enabled in the sys.databases table. You will see a value of 1 if CDC is enabled, a 0 otherwise.
The next step is to specify a table that you want to enable for CDC. Let's create a simple table called customer:
Execute the following system stored procedure to enable CDC for the customer table:
You must be a member of the db_owner fixed database role in order to execute the above system stored procedure and SQL Agent must be running. The sys.sp_cdc_enable_table system stored procedure has quite a few parameters; let's describe each one (only the first three parameters are required; the rest are optional and only the ones used are shown above):
There is a new column named is_tracked_by_cdc in sys.tables; you can query it to determine whether CDC is enabled for a table.
Enabling CDC at the database and table levels will create certain tables, jobs, stored procedures and functions in the CDC-enabled database. These objects will be created in a schema named cdc and a cdc user is also created. You will see a message that two SQL Agent jobs were created; e.g. cdc.sql2008demo_capture which scans the database transaction log to pickup changes to the tables that have CDC enabled, and cdc.sql2008demo_cleanup which purges the change tables periodically. The naming convention is cdc.[databasename]_task. We will discuss some of the schema objects created as we use them in the demo that follows. You can examine the schema objects created by running the following T-SQL script:
You can disable CDC on a particular table by executing the following T-SQL script:
You can disable CDC at the database level by executing the following T-SQL script:
Disabling CDC at the table and/or database level will drop the respective tables, jobs, stored procedures and functions that were created in the database when CDC was enabled.
Now that we have enabled CDC for a database and a table, let's proceed to a demo.
We will perform the following steps to demonstrate the CDC functionality:
Execute the following T-SQL script to perform some inserts, an update, and a delete on the customer table:
Now let's take a look at a query that will show us a record of the above changes:
To extract the changes for a table that has CDC enabled, you have to supply the relevant LSNs. An LSN is a log sequence number that uniquely identifies entries in the database transaction log. If this is the first time you are querying to extract changes, you can get the minimum LSN and the maximum LSN using the functions sys.fn_cdc_get_min_lsn() and sys.fn_cdc_get_max_lsn(). If you set @supports_net_changes = 1 when enabling CDC on the table, you can query for the net changes using cdc.fn_cdc_get_net_changes_dbo_customer(). This will group multiple changes to a row based on the primary key or unique index you specified when enabling CDC. You can always invoke cdc.fn_cdc_get_all_changes_dbo_customer() to retrieve every change to the table within the LSN range. The dbo_customer portion of the function name is the capture instance; this is the default - schema_tablename. Here's the output from the above query:
The first result set shows the net changes; the second result set shows the individual changes. Since there was an insert and a delete, the first result set doesn't show that row since it was added and deleted in the LSN range; i.e. it no longer exists when you group the changes. The __$operation column values are: 1 = delete, 2 = insert, 3 = update (values before update), 4 = update (values after update). To see the values before update you must pass 'all update old' to the cdc.fn_cdc_get_all_changes_dbo_customer() function. The __$update_mask column is a bit mask column that identifies the columns that changed. For __$operation = 1 or 2, all columns are indicated as changed. For __$operation = 3 or 4, the actual columns that changed are indicated. The columns are mapped to bits based on the column_ordinal; execute the stored procedure sys.sp_cdc_get_captured_columns passing the capture instance as a parameter to see the column_ordinal values; for example:
Let's extend this example to handle periodically extracting changed rows. We will add a new table to log the ending LSN and a new function to retrieve the ending LSN from the table. This will allow us to pick up just what changed since the last time we ran our ETL process.
We'll modify the code above adding a call to the get_last_customer_lsn() function and an insert or update to save the ending LSN:
After running the above query once, insert a row, then run the query again. Your output will look like this, showing only a single row that has changed: