Two T-SQL Alternatives For Tracking Changes
Two very popular tools we've used for tracking data, logging and configuration changes are triggers and change data tracking. However, we've discovered they don't work as consistently as we'd like and when we're troubleshooting both, they create additional overhead. We'd like to investigate alternatives for helping us track data history for tables involved in configuration and logging.
In many normal use cases, a combination of T-SQL triggers or CDC will be appropriate for tracking data changes. In some environments, it may be most cost effective to engineer a custom solution so that both performance and troubleshooting overhead do not carry too many costs. In this tip, we look at two other designs for tracking data changes that carry benefits and costs and where they may be useful.
Option 1 - Simulate Trigger Activity
One way to handle data changes is reverse engineering the trigger design. When a trigger fires to track values on a table that were entered, the insert, update and (or) delete operation are generally tracked along with values that are needed for tracking purposes. For an example, if a developer creates a trigger on a configuration table to track updates for a column, any updated record will be saved or recorded to another table. In this case, changes can occur and those changes are always tracked. In a reverse engineering design, the design would queue the needed change, such as a delete, insert, or update along with the values that need to be changed and then the change would be applied.
Let's look at a contrived example of this:
CREATE TABLE queueExample( queueID INT IDENTITY(1,1), ---- contrived for example purposes: queueAction VARCHAR(8), queueComplete BIT DEFAULT 0, mainID INT, mainKey VARCHAR(10), mainValue VARCHAR(10), queueDate DATETIME DEFAULT GETDATE() ) CREATE TABLE mainExample( mainID INT, mainKey VARCHAR(10), mainValue VARCHAR(10), mainDate DATETIME DEFAULT GETDATE() ) ---- Simple demo of this: INSERT INTO queueExample (queueAction,mainID,mainKey,mainValue) VALUES ('INSERT',1,'file','two.config') , ('UPDATE',1,'file','2.config') , ('INSERT',2,'file','4.config') , ('INSERT',3,'file','test.txt') , ('DELETE',3,'file','test.txt') ---- Example: running this five times each, viewing changes each time DECLARE @queueID INT, @queueAction VARCHAR(8), @mainID INT, @mainKey VARCHAR(10), @mainValue VARCHAR(10) SELECT @queueID = queueID , @queueAction = queueAction , @mainID = mainID , @mainKey = mainKey , @mainValue = mainValue FROM queueExample ---- Testing each at a time WHERE queueID = 1 IF @queueAction = 'INSERT' BEGIN INSERT INTO mainExample (mainID,mainKey,mainValue) VALUES (@mainID,@mainKey,@mainValue) UPDATE queueExample SET queueComplete = 1 WHERE queueID = @queueID END ELSE IF @queueAction = 'UPDATE' BEGIN UPDATE mainExample SET mainKey = @mainKey , mainValue = @mainValue WHERE mainID = @mainID UPDATE queueExample SET queueComplete = 1 WHERE queueID = @queueID END ELSE IF @queueAction = 'DELETE' BEGIN DELETE FROM mainExample WHERE mainID = @mainID UPDATE queueExample SET queueComplete = 1 WHERE queueID = @queueID END ---- View example: SELECT * FROM mainExample ---- View the example history of changes: SELECT queueAction , mainID , mainKey , mainValue FROM queueExample WHERE queueComplete = 1
While this may be appropriate and more controlled for some designs (and is very popular among ETL designs), it does carry some drawbacks:
- The queue table simply receives inserts, which carry instructions to insert, update or delete, but the insert must succeed and must be logged if it fails.
- Any change that the queue table should pass to the object table receiving the change must also succeed and notify the queue table it passed. Either of these transactions could fail - for an example, an insert to the object table may pass, but the queue table may not receive the notification that the insert passed. Provided that queue tables and their object tables are horizontally scaled appropriately, this problem can generally be minimized.
- For situations where we only want to track a few pieces information (like one or two columns) that was changed, added, or removed, this design may require a lot of duplicate data.
- Finally, if we never experience downtime, we may see conflicts if we try to archive queue records to reduce the size of queue tables.
Option 2 - Track Changes in Table with Bit Flag
In some cases, we may re-think using the delete operation on records. One way in which we can make changes to a table is to add a column to the table that activates the record (like a bit field where true equals "use" while false equals "ignore"). If the bit column is 1, then the record is used (or valid); if the column is 0, we keep the record in the table temporarily until it can be archived elsewhere (or we keep it for historic reasons in cases where it makes sense to do so). In this design, when our environment is live, we have two operations we have to design for: inserts to the table and updates to the bit field. When we have downtime, we can archive the bit records of false (0) to an archive table, reducing the main table size. In a few select cases I've seen triggers used for tracking configuration changes, this approach is a much faster design with fewer moving pieces.
Let's used the contrived example above this, but use this design instead while accomplishing the same outcome:
CREATE TABLE mainExample( IncrementingID INT IDENTITY(1,1), mainID INT, mainKey VARCHAR(10), mainValue VARCHAR(10), mainDate DATETIME DEFAULT GETDATE(), InUse BIT DEFAULT 1 ) INSERT INTO mainExample (mainID,mainKey,mainValue) VALUES (1,'file','two.config') , (2,'file','4.config') , (3,'file','test.txt') ---- We add a new value for an update to mainID 1 and set the old value to InUse = 0 INSERT INTO mainExample (mainID,mainKey,mainValue) VALUES (1,'file','2.config') UPDATE mainExample SET InUse = 0 WHERE IncrementingID = 1 ---- We no longer want to use mainID 3, and instead of remove, we set InUse = 0 UPDATE mainExample SET InUse = 0 WHERE IncrementingID = 3
In the above example, we see that records 1 and 3 are no longer used, though we have a record of their history. In addition, if we needed to revert a change, we have the full history of that change and would only need to update the two values involved - setting the old record to active while the new record would become inactive (the reverse case). In this example, both the update and delete operation on a record evolve into setting a record as inactive. This method is not without drawbacks however:
- Depending on what the table is used for, this could create interactions that could interfere with each other on the table level. It's one thing if this method is used for historically tracking configuration table changes that are applied on occasion, it's another thing if we're trying to keep track of thousands of table changes a second.
- If a record is updated, both the insert (new record) and update (inactive old record) must pass. In other words, a commit should not be issued until both of these are complete, so one failure means everything rolls back.
- Archiving the data is optional, and if we don't do it, the table may become bloated. If we have downtime for maintenance, archiving the inactive records would be a best practice.
- The first question we should ask is what are we using triggers and change data tracking for? These are great tools in generic situations; the more specific the use case, the more inappropriate these may be.
- If we use either of the above two methods, we should do so in a way that makes troubleshooting issues easy; triggers and change data tracking can add overhead on the troubleshooting side. Make the design clear about what broke, failed, etc.; this is where custom solutions can save companies and developers hours.
About the author
View all my tips