I am going to describe quite forgotten feature in SQL Server 2008. It is Change Data Capture (CDC) – powerful feature to track changes in database (tables) without touching table’s schema. There are two features in SQL Server 2008 which support similar thing and look same but they are not – Change Data Capture (CDC) and Change Tracking. I will be focused mainly to CDC in this post.
But what is the difference between CDC and Change Tracking?
It seems to me that CDC is similar to same feature on Oracle – it is asynchronous tracking of changes to user tables by harvesting transaction log changes and translating them user readable relational format. Change tracking does similar thing but is synchronous and provides you less information about changes to tables. It provides only information that table was changed but doesn’t provide row values for changes which were made. CDC is treated in SQL Server documentation as more robust feature than Change Tracking. Maybe this is the reason why CDC is supported only in Enteprise and Datacenter edition while Change Tracking is supported in all versions. CDC is also mentioned together with ETL process where it has probably most useful usage.
This was very basic description of difference between CDC and Change Tracking. Let’s look closer to CDC now.
I borrow following schema from MSDN, please read this paragraph to end before looking to schema. As I mentioned above, CDC provides asynchronous tracking of changes by looking to transaction Log and translating recorded changes for Source tables. Changes are gathered by special SQL Server Agent Job (Capture process on following picture). These changes are inserted to Change tables which can be accessed by special user-defined functions created for that purpose – they are called Change data capture query functions on following picture. Schema depicts situation when ETL process consumes these functions but this is only sample usage.
How to setup Change Data Tracking (CDC)
Setup is quite straightforward (more than querying changes). Here are steps to bring it alive:
- Enable CDC for database by calling sys.sp_cdc_enable_db procedure. It adds few CDC related system tables to current database and set IS_CDC_ENABLED column in sys.databases to 1.
- Enable CDC for particular table by calling sys.sp_cdc_enable_table. It creates SQL Server Agent Job for harvesting transaction log, Change tables and Change data capture query functions which enable you to query Change tables. It does few more things like creating “cdc” security schema in your database.
- Now any changes for enabled table are gathered by job and written to change tables.
How to query changes
Query changes is not as straightforward as is for Change Tracking feature. It basically let you deal with LSN (Log Sequence Number) directly. LSN is unique identifier for each record in transaction log.
It is recommended to use only Change data capture functions for querying changes – not Change tables directly. These functions are created dynamically for each CDC-enabled table (created in setup – step 2.). Let’s Assume that we have table called “Customer” in “dbo” schema. There are than two basic function to retrieve changes for that table:
- cdc.fn_cdc_get_all_changes_dbo_Customer. This table-valued function returns ALL changes to the Customer table, including previous and new values (depends on parameters). It takes range of LSN you want to search changes for and @row_filter parameter. You can pull LSN range from built-in functions sys .fn_cdc_get_min_lsn and sys .fn_cdc_get_max_lsn (see T-SQL below). Row filter parameter can have two values – “all” which returns all changes but for changes caused by UPDATEs it doesn’t include previous values, only new ones. “all update old” parameter value returns the same as “all” but for UPDATEs returns previous values as well.
- cdc .fn_cdc_get_net_changes_dbo_Customer. This table-valued function returns only takes three parameters as well but returns only final changes to each row. This function is created only if @supports_net_changes parameter is set to 1 while enabling CDC for table. I will not concentrate to this function, please refere to MSDN if you should need it.
See this short sample how to setup up CDC and query changed data:
-- create Customer table in current DB CREATE TABLE Customer (ID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(50)) -- Enable CDC for current DB EXEC sys.sp_cdc_enable_db -- Enable CDC for Customer table EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Customer', @role_name='test', @supports_net_changes = 1 -- insert some records INSERT INTO Customer (Name) VALUES ('Microsoft') INSERT INTO Customer (Name) VALUES ('Oracle') INSERT INTO Customer (Name) VALUES ('Apple') !!! -- now SQL Server agent job named cdc.<database_name>_capture must be executed -- get min LSN for table customer DECLARE @min_lsn BINARY(10) = sys.fn_cdc_get_min_lsn ('dbo_Customer') -- get max LSN for current DB DECLARE @max_lsn BINARY(10) = sys.fn_cdc_get_max_lsn () -- select all changes made to Customer table SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@min_lsn, @max_lsn, 'all update old');
Result is following:
Not nice result, isn’t it? Result contains all columns from Customer table (ID and Name) and some more. _$operation column contains indication of INSERT (2), DELETE (1) or UPDATE with previous values (3), UPDATE for new values (4). _$update_mask is bit mask containing information which column was updated (if applicable). Please see full description here.
As you see, you must handle LSN directly. To track changed data in increments (probably most common scenario) you would have to store last max LSN value you’ve processed changes for and start another run with this value.
There are couple of helper fuctions which may help you with LSN:
- sys.fn_cdc_map_time_to_lsn – it map time milestones to LSN
- sys.fn_cdc_map_lsn_to_time – … do opposite
- sys .fn_cdc_increment_lsn – returns next LSN based upon your “stored” LSN. Great for tracking incremental changes.
That was basics of Change Data Capture (CDC). There’s really much much much more about it – if you’re seriously considering using this feature, read full description of CDC in MSDN.
Tags: built-in functions, cdc, Change Data Capture, query functions, SQL, SQL Server, sql server agent, t-sql, transaction log