
Designing Tables for Audit Data in SQL Server

By: Jeremy Kadlec   |   Read Comments (5)   |   Related Tips: More > Auditing and Compliance

Problem
My company is just starting to look at adding functionality to retain historical data for key tables and columns for auditing purposes in many of our SQL Server databases.  I have seen some of your recent tips related to triggers (Forcing Trigger Firing Order in SQL Server and Trigger Alternatives in SQL Server - OUTPUT Clause).  Based on using triggers or a similar technology, what is the best way to store the historical data?  What options are available to store the data?  Are there really a variety of options?  Should the historical data be in the same database as the original data?  Does it really make a difference?

Solution
Being in a position to start capturing historical data is a good one.  This gives you the opportunity to build a solution that can be used across various applications in the enterprise.  As such, this is a big undertaking, so an appropriate amount of time should be spent planning, building, testing and implementing the solution.  If you agree with that premise, then this need should be considered a project, and a formal process should be followed if the decisions you make have a wide scope across your applications and business processes.

Requirements

If you agree that capturing auditing data is a project, then it is appropriate to start with defining the requirements for the project.  Consider gathering the following information as a baseline set of requirements:

  • Find out what data needs to be audited
  • Find out why the data is going to be audited
  • Find out how the data is going to be used
    • Reporting and/or Business Intelligence
    • Historical record of changes for customer service purposes
    • Legal or regulatory requirements
  • Find out which databases and applications need to be audited
  • Find out what type of metadata needs to be stored with the audited data
    • Start date
    • End date
    • Start date and end date
    • Date the record was captured
    • User making the change
    • Corresponding application, database, SQL Server instance, etc.
  • Reporting needs
    • Line of business reporting
    • Monthly reporting
    • Business process reporting
    • Reporting frequency
    • Reporting users
    • Line of business application access to the auditing data
  • Data collection
    • Determine if triggers, the OUTPUT clause, stored procedure logic, application logic, Service Broker or some combination of these approaches will be responsible for writing the data to the audit tables
  • Data consistency requirements
    • Based on how the data is going to be used, determine if all of the audit data should/could be stored in the same format
  • Audit data location
    • Stored with the base data, i.e. the same database or a separate filegroup on a different disk subsystem
    • Stored in a specific database on the instance, i.e. a separate database
    • Stored in a centralized SQL Server database, i.e. a remote instance

Design Options

As indicated above, a few key technical design decisions relate to how the data is written to the auditing tables, where the auditing data is stored and whether a consistent approach can be used to store the data for reporting purposes.  Since different answers to these questions lead to different solutions, let's outline a few sample designs to see if any of them could resolve your issue:

Base Table Copy

This approach could be considered the traditional auditing technique, where an auditing table is created for each base table that needs to be audited.  The designs of the base table and the audit table are very similar, but the audit table has some additional columns to support managing the historical data (i.e. AuditID to serve as the identity, ModifiedBy to serve as a means to capture who made the change, etc.).  Let's use a sample AdventureWorks table ([HumanResources].[EmployeePayHistory]) in the SQL Server 2005 sample database as an example:
 
USE [AdventureWorks]
GO

CREATE TABLE [HumanResources].[EmployeePayHistory_Audit](
    [AuditID] [int] IDENTITY (1,1) NOT NULL,
    [EmployeeID] [int] NOT NULL,
    [RateChangeDate] [datetime] NOT NULL,
    [Rate] [money] NOT NULL,
    [PayFrequency] [tinyint] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_EmployeePayHistory_Audit_ModifiedDate] DEFAULT (getdate()),
    [ModifiedBy] [varchar](100) NOT NULL CONSTRAINT [DF_EmployeePayHistory_Audit_ModifiedBy] DEFAULT (SUSER_SNAME()),
    CONSTRAINT [PK_EmployeePayHistory_Audit_AuditID] PRIMARY KEY CLUSTERED
    (
        [AuditID] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
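The tips referenced in the problem statement cover triggers and the OUTPUT clause as population mechanisms.  As a minimal sketch (the trigger name and the UPDATE-only scope are illustrative assumptions, not part of the original tip), an AFTER UPDATE trigger could populate this audit table from the deleted pseudo-table, letting the ModifiedDate and ModifiedBy defaults fill themselves in:

```sql
-- Sketch only: capture the pre-update values for each modified row.
-- The trigger name and the UPDATE-only scope are illustrative assumptions.
CREATE TRIGGER [HumanResources].[trgEmployeePayHistory_Audit]
ON [HumanResources].[EmployeePayHistory]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- deleted holds the before image of each updated row; ModifiedDate and
    -- ModifiedBy are populated by the audit table's DEFAULT constraints
    INSERT INTO [HumanResources].[EmployeePayHistory_Audit]
        ([EmployeeID], [RateChangeDate], [Rate], [PayFrequency])
    SELECT [EmployeeID], [RateChangeDate], [Rate], [PayFrequency]
    FROM deleted;
END
GO
```

An AFTER DELETE trigger (or a combined AFTER UPDATE, DELETE trigger) would follow the same pattern if deleted rows also need to be retained.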
 

Generic Design with Lookup Table

Another approach to auditing is a generic table to store the audited data, used in conjunction with a lookup table to identify the original column for the audited data.  The value in this design is that the same table structure can be used for each base table, the lookup table can be extended to support new columns, and reporting is consistent.  In addition, this design can be further extended to support a single database, an instance, an application or an entire environment with a few extra columns and the needed supporting infrastructure.

Below is one example for a single table to support auditing for a few columns in the SQL Server 2005 AdventureWorks database:
 

USE [AdventureWorks]
GO

CREATE TABLE [HumanResources].[EmployeePayHistory_Audit_Generic](
    [AuditID] [int] IDENTITY (1,1) NOT NULL,
    [AuditColumnID] [int] NOT NULL,
    [BaseTableUniqueKey_Value] [int] NOT NULL,
    [AuditColumnID_Value] [varchar](500),
    [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_EmployeePayHistory_Audit_Generic_ModifiedDate] DEFAULT (getdate()),
    [ModifiedBy] [varchar](100) NOT NULL CONSTRAINT [DF_EmployeePayHistory_Audit_Generic_ModifiedBy] DEFAULT (SUSER_SNAME()),
    CONSTRAINT [PK_EmployeePayHistory_Audit_Generic_AuditID] PRIMARY KEY CLUSTERED
    (
        [AuditID] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [HumanResources].[EmployeePayHistory_AuditLookup_Generic](
    [AuditColumnID] [int] IDENTITY (1,1) NOT NULL,
    [AuditColumnName] [varchar](150) NOT NULL,
    [ActiveFlag] [bit] NOT NULL,
    CONSTRAINT [PK_EmployeePayHistory_AuditLookup_Generic_AuditID] PRIMARY KEY CLUSTERED
    (
        [AuditColumnID] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [HumanResources].[EmployeePayHistory_Audit_Generic] ADD CONSTRAINT
[FK_EmployeePayHistory_Audit_Generic_EmployeePayHistory_AuditLookup_Generic] FOREIGN KEY
(
AuditColumnID
) REFERENCES [HumanResources].[EmployeePayHistory_AuditLookup_Generic]
(
AuditColumnID
)
GO

INSERT INTO [HumanResources].[EmployeePayHistory_AuditLookup_Generic] ([AuditColumnName],[ActiveFlag])
VALUES('RateChangeDate',1)
GO

INSERT INTO [HumanResources].[EmployeePayHistory_AuditLookup_Generic] ([AuditColumnName],[ActiveFlag])
VALUES('Rate',1)
GO

SELECT *
FROM [HumanResources].[EmployeePayHistory_AuditLookup_Generic]
GO
 

Let's explain each of the columns to help illustrate the point:
  • [HumanResources].[EmployeePayHistory_Audit_Generic]
    • AuditID - Primary key for the table that uniquely identifies each row in the table
    • AuditColumnID - Foreign key to the [HumanResources].[EmployeePayHistory_AuditLookup_Generic] table to identify if the row is for the RateChangeDate or Rate column
    • BaseTableUniqueKey_Value - In this example this would be the value for the EmployeeID
    • AuditColumnID_Value - In this example, this would be the data value for the  RateChangeDate or Rate column
    • ModifiedDate - Date the record is inserted
    • ModifiedBy - User name of the user who made the data change
  • [HumanResources].[EmployeePayHistory_AuditLookup_Generic]
    • AuditColumnID - Primary key for the table, referenced as a foreign key by the [HumanResources].[EmployeePayHistory_Audit_Generic] table
    • AuditColumnName - Name of the column being audited i.e. RateChangeDate or Rate column
    • ActiveFlag - Flag to indicate if the auditing on the column is active or not
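To make the lookup-driven design concrete, below is one hedged sketch of a trigger that writes a generic audit row per audited column, plus a reporting query that resolves the lookup table.  The trigger name, the CONVERT styles and the decision to audit only updates are assumptions; the original tip does not prescribe the population mechanism:

```sql
-- Sketch only: write one generic audit row per audited column that changed.
-- Trigger name and CONVERT styles are illustrative assumptions.
CREATE TRIGGER [HumanResources].[trgEmployeePayHistory_Audit_Generic]
ON [HumanResources].[EmployeePayHistory]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Rate: resolve AuditColumnID through the lookup table, so auditing a
    -- new column only requires a lookup row, not a schema change
    IF UPDATE([Rate])
        INSERT INTO [HumanResources].[EmployeePayHistory_Audit_Generic]
            ([AuditColumnID], [BaseTableUniqueKey_Value], [AuditColumnID_Value])
        SELECT l.[AuditColumnID], i.[EmployeeID], CONVERT(varchar(500), i.[Rate])
        FROM inserted i
        INNER JOIN [HumanResources].[EmployeePayHistory_AuditLookup_Generic] l
            ON l.[AuditColumnName] = 'Rate' AND l.[ActiveFlag] = 1;

    -- RateChangeDate: same pattern, ODBC canonical format for the value
    IF UPDATE([RateChangeDate])
        INSERT INTO [HumanResources].[EmployeePayHistory_Audit_Generic]
            ([AuditColumnID], [BaseTableUniqueKey_Value], [AuditColumnID_Value])
        SELECT l.[AuditColumnID], i.[EmployeeID], CONVERT(varchar(500), i.[RateChangeDate], 121)
        FROM inserted i
        INNER JOIN [HumanResources].[EmployeePayHistory_AuditLookup_Generic] l
            ON l.[AuditColumnName] = 'RateChangeDate' AND l.[ActiveFlag] = 1;
END
GO

-- Reporting: join back to the lookup table to label each audited value
SELECT a.[BaseTableUniqueKey_Value] AS [EmployeeID],
       l.[AuditColumnName],
       a.[AuditColumnID_Value],
       a.[ModifiedDate],
       a.[ModifiedBy]
FROM [HumanResources].[EmployeePayHistory_Audit_Generic] a
INNER JOIN [HumanResources].[EmployeePayHistory_AuditLookup_Generic] l
    ON l.[AuditColumnID] = a.[AuditColumnID]
ORDER BY a.[ModifiedDate];
GO
```

Because all values land in one varchar column, the reporting query stays the same no matter how many columns are audited, which is the consistency benefit described above.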

XML Design - Per Database Design

A third option is a single audit table per database that stores the audited values in an XML column.  This design can support any base table without schema changes, since the shape of the audited data is defined by the XML rather than by fixed columns:

USE [AdventureWorks]
GO

CREATE TABLE [dbo].[Database_Audit_XML](
    [AuditID] [int] IDENTITY (1,1) NOT NULL,
    [TableName] [varchar](100) NOT NULL,
    [ColumnName] [varchar](100) NOT NULL,
    [AuditData] [xml] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_Database_Audit_XML_ModifiedDate] DEFAULT (getdate()),
    [ModifiedBy] [varchar](100) NOT NULL CONSTRAINT [DF_Database_Audit_XML_ModifiedBy] DEFAULT (SUSER_SNAME()),
    CONSTRAINT [PK_Database_Audit_XML_AuditID] PRIMARY KEY CLUSTERED
    (
        [AuditID] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
 

Let's explain each of the columns to help illustrate the point:
  • [dbo].[Database_Audit_XML]
    • AuditID - Primary key for the table that uniquely identifies each row in the table
    • TableName - Base table name
      • Depending on how the data is queried this could be an object ID from the system tables, a user defined lookup table ID or the actual table name as is the case in our example
      • This data can also be stored in the AuditData (xml) column
    • ColumnName - Base column name
      • Depending on how the data is queried this could be the column ID from the system tables, a user defined lookup column ID or the actual column name as is the case in our example
      • This data can also be stored in the AuditData (xml) column
    • AuditData - XML storage for the audited data
    • ModifiedDate -  Date the record is inserted
      • This data can also be stored in the AuditData (xml) column
    • ModifiedBy - User name of the user who made the data change
      • This data can also be stored in the AuditData (xml) column
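As an illustration of how this table might be populated (the trigger below is a sketch, not part of the original tip; the trigger name, the '(all columns)' marker in ColumnName and capturing one XML document per statement are all assumptions):

```sql
-- Sketch only: capture the before image of an update as one XML document.
-- Trigger name and the '(all columns)' marker are illustrative assumptions.
CREATE TRIGGER [HumanResources].[trgEmployeePayHistory_Audit_XML]
ON [HumanResources].[EmployeePayHistory]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF NOT EXISTS (SELECT * FROM deleted)
        RETURN;  -- nothing changed; avoids inserting a NULL XML document

    INSERT INTO [dbo].[Database_Audit_XML] ([TableName], [ColumnName], [AuditData])
    SELECT 'HumanResources.EmployeePayHistory',
           '(all columns)',
           (SELECT [EmployeeID], [RateChangeDate], [Rate], [PayFrequency]
            FROM deleted
            FOR XML PATH('Row'), ROOT('EmployeePayHistory'), TYPE);
END
GO
```

Storing the whole before image this way trades query convenience for flexibility; reporting would use the xml data type's value() and nodes() methods to shred the document back into columns.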

Depending on where the data is stored (i.e. on the instance, centrally, etc.), it may be necessary to store additional data for reporting purposes.  These columns may include:

  • Per Instance - Centralized auditing on a per SQL Server instance perspective
    • Source database
  • Per Application - Centralized auditing per line of business application
    • Source database
    • Source application
  • Per Environment - Centralized auditing across all production SQL Server instances
    • Source database
    • Source application
    • Source SQL Server instance

In addition to adding columns to the tables as the suggestions above outline, it is also necessary to set up the infrastructure from a data perspective to replicate the data or be able to write the data directly to the tables.  This could mean taking one or more of the following approaches:

  • Adding permissions to accounts
  • Coding the logic in a trigger to write centrally
  • Including a separate auditing connection string in the application
  • Setting up a Linked Server to enable writing the data
  • Writing the data locally then replicating centrally
  • Using SQL Server Integration Services to capture the data locally and write it centrally on a specific interval
  • Using Service Broker to write data to a queue for centralized processing
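As a minimal sketch of the Linked Server option (the [AUDITSRV] linked server, the central [AuditCentral] database and the extra [SourceDatabase] column are all assumptions layered on top of the XML design above):

```sql
-- Sketch only: a four-part name writes directly to a central audit table.
-- Assumes a linked server named [AUDITSRV] and a central copy of the XML
-- audit table extended with the [SourceDatabase] column suggested above.
INSERT INTO [AUDITSRV].[AuditCentral].[dbo].[Database_Audit_XML]
    ([SourceDatabase], [TableName], [ColumnName], [AuditData])
VALUES (DB_NAME(),
        'HumanResources.EmployeePayHistory',
        'Rate',
        '<Row><EmployeeID>1</EmployeeID><Rate>23.72</Rate></Row>');
```

Be aware that issuing this from a trigger promotes the operation to a distributed transaction (MSDTC), which is one reason the write-locally-then-replicate and Service Broker options are often preferred.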

Next Steps

  • As you are faced with auditing needs, consider the information in this tip as a jumping-off point to determine what information is needed when planning, designing, developing, testing and implementing a solution.
  • The code examples in this tip are just that, examples.  As you begin to define your auditing infrastructure, consider how these tables took advantage of a generic means to store audited data, either via table structures or an XML data type.  If these solutions only meet 80% of the auditing needs in your environment, research how these general ideas can be modified to meet your needs, whether that means including multiple [BaseTableUniqueKey_Value] columns, using different data types for the columns or whatever the case may be.
  • If you choose to have some sort of centralized solution, be sure to set up the needed infrastructure to support it and verify it does not cause any sort of performance or security issue.  In addition, consider getting creative with some of the new SQL Server 2005 features such as Service Broker or SQL Server Integration Services.
  • Depending on the particular auditing need, consider using columns with the XML data type to store the data.  This may be a simple and elegant way to support a number of different audited data sets from different base tables, if needed.

Last Update: 4/2/2008


About the author
Jeremy Kadlec is the CTO @ Edgewood Solutions, co-founder of MSSQLTips.com, Baltimore SSUG co-leader and SQL Server MVP since 2009.


Friday, April 11, 2008 - 10:08:41 AM - admin

Mark,

Thank you for the lengthy reply and feedback.

Generic tables: 

  • You are correct, only specific columns are being audited with the 2 table generic solution.  The requirement is not to audit the entire row.
  • For the population, some of the data is populated via a trigger and other data is actually populated by calling a stored procedure when a specific set of logic is called in the front end application.
  • We are storing new values and use the start date to determine when the new value went into effect.
  • We do not have a need to distinguish between INSERT, UPDATE and DELETE statements.  I could see how that could be very important in some environments.
  • We are not trying to capture failed attempts to INSERT, UPDATE and DELETE data.  Once again, that is not a requirement, but with some data a failed attempt is sometimes more important than a normal change.
  • Reporting is primarily from the front end application or custom reports.

XML tables

  • This option has much more flexibility and can include many of these items based on the XML schema.
  • Once again, many of the items you mentioned were not requirements, but I think this solution offers a much easier means to change the schema with less development time, though that could change based on implementation details.

Thank you,
The MSSQLTips.com Team


Tuesday, April 08, 2008 - 9:30:14 AM - MarkJo_FP

If I am understanding your generic solution, I think it is very similar to our current approach of using 2 tables.  Your implementation has the flexibility to audit selected columns in a table whereas we audit the entire table.  I assume you use triggers to populate these tables when a change is made.

You are storing the table and column information in one table whereas we have a header table where we store the table name, rowid, operation, modifiedby, modifieddate and computername and then a child table which stores the column name, old value and new value.

Since you are storing only one value for the column you are apparently storing either the new value for the column or the old value for the column.  How do you distinguish between inserts, updates and deletes when reporting?  How does it handle failed update attempts? We have an auditlog viewer as part of the product that allows searching the log based on a variety of criteria.  Without storing the old and new values I assume you would need to join on the previous change for that column.  Storing the Old and New values is where we ended up with the temp tables and INSERTED and DELETED tables as I described in my post.

Have you tried doing a bulk load (or change) of data with your auditing in place?  It is not very common but I do run into it once in a while.

If I am missing something in my assumptions above let me know.  I am always looking for better ways to deal with the auditing issues. 

Thanks.

Mark

Friday, April 04, 2008 - 7:21:33 AM - admin

MarkJo_FP,

Thank you for sharing some of the challenges you are facing today with auditing.

Although your new solution does have many tables it sounds like the performance is acceptable and is something you are willing to manage.

How do you think a generic solution, either with an XML column or a generic table, would fare in that scenario?

We have implemented both solutions under different circumstances and they have appeared to perform well and meet the overall reporting needs.

Just interested in your thoughts on the matter as opposed to recommending a different approach.

Thank you,
The MSSQLTips.com Team

Wednesday, April 02, 2008 - 11:56:06 AM - MarkJo_FP

I also have to do data change auditing on our database.  Our current implementation has been in place for several years but I am looking at changing it.  The new approach is based on an article by Itzik Ben-Gan a couple years ago.  Below is a summary of the current approach and the proposed approach.  Any comments would be greatly appreciated.

 

Current Approach

Currently auditing is done with 3 "INSTEAD OF" triggers (trg<tableName>Insert, trg<tableName>Update, trg<tableName>Delete) on each table and 2 auditing tables (AuditLog and AuditDetail).

The insert trigger copies the data from Inserted to a temp table, calls a procedure that populates the AuditLog table with the table name, table PK, operation, who made the change, the time of the change and what computer the change was made on.  The procedure then inserts a row for each column of the table into the AuditDetail table.  This contains the column name, old value (always NULL in this case) and new value.  Once the procedure is completed the temp table is dropped and the data from INSERTED is inserted into the base table.

The Update trigger copies data from Inserted and Deleted into temp tables and calls a procedure that populates the AuditLog table just like the insert trigger and then compares the inserted and deleted temp tables column by column and inserts a row in the AuditDetails table for each of the columns that have changed. The temp tables are then dropped and the base table is updated.

The Delete trigger copies data from Deleted into a temp table and calls a procedure that populates the AuditLog table and inserts a row for each column of the table into the AuditDetail table and then deletes the record in the base table.

Pros: Only 2 additional tables required.

Cons: Performance is terrible on a table with millions of records. Bulk changes to a table take hours.

New Approach

What I am considering for the new approach is this: one trigger on each table and 3 additional tables per audited table.  The trigger identifies the type of statement that fired the trigger and the number of rows affected.  It uses table variables instead of temp tables, which allows rolling back a change while still logging the attempted change.  It has logic to log attempted changes to the data and block the change (e.g. changing the PK column or integrity violations).

Each audited table will have a <TableName>AuditHeader table that contains the dmltype, date of change, who made the change, application name, what computer the change was made from, a failed flag and comment if the attempted change failed or was blocked.

Each audited table will also have a <TableName>InsAuditDelDetail table that contains a FK to the header table and a column for each of the columns in the base table. This table will store the full row of data for inserts and deletes.

The final table is <TableName>AuditUpdDetail. This table contains a FK to the header table, the name of the column, the old value and the new value for each of the columns that changed in the update.

Pros: Performance is very good. I ran a change of several hundred thousand records and the time went from about 4 minutes to 6 minutes. A commercial package using two audit tables ran for 20 minutes and crashed. Our current version ran for several hours and I killed it.

Avoids hot spots created by all user changes resulting in hitting just 2 tables (the detail in particular).

Cons: A lot of extra tables. It may be more difficult to generate a view of all changes, depending on requirements.

Admittedly bulk changes do not happen often but when they do it is a killer and we are seeing performance problems under normal use with the current approach when the log grows too large. Some of this can be mitigated by table and index changes.

The <TableName>AuditUpdDetail table could be eliminated and the <TableName>InsAuditDelDetail table used instead. The before and after values would need to be determined by comparison of the 2 records. I have seen this approach also. Another downside to this is that you are storing an entire record even if only one column changed (a big hit for tables with image data, file attachments, etc.)