Determining if two rows or expressions are equal can be a difficult and resource intensive process. This can be the case with UPDATE statements where the update was conditional based on all of the columns being equal or not for a specific row. To address this need in the SQL Server environment the CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions are available in SQL Server 2005 to natively create a unique expression, row or table for comparison or other application needs. In this tip we will focus on the common questions related to the CHECKSUM code and provide an example to begin to leverage the CHECKSUM commands in your T-SQL code.
What is the purpose of the using the CHECKSUM functionality?
The CHECKSUM is intended to build a hash index based on an expression or column list.
When would I use the CHECKSUM function?
One example of using a CHECKSUM is to store the unique value for the entire row in a column for later comparison. This would be helpful in a situation where all of the rows in a table need to be compared in order to perform an UPDATE. Without a CHECKSUM you would need to do the following:
Compare the UPDATE code from the first example to this one using the CHECKSUM function.
In order for this query to be successful, it is necessary to build the CHECKSUM value ahead of time when inserting the data in order to perform the comparison in subsequent code. So if your performing very few entire row (or just about every column in the row) comparisons then ad-hoc comparisons may be optimal. However, if significant number of comparisons are made with a large number of columns, then this option should be researched further and tested for performance improvements over individual comparisons outlined in the first set of code.
What are some of the caveats with using any of the CHECKSUM functions?
- Need to ensure the column or expression order is the same between the two CHECKSUMs that are being compared.
- Would not recommend a CHECKSUM(*) because the generated CHECKSUM value is based on the column order for the table definition at run time which may change over time, so I would recommend explicitly defining the column listing with a static order in the CHECKSUM code. For example, use CHECKSUM(Col1, Col2, Col3) where these are all of the columns in a table as opposed to CHECKSUM(*).
- Need to ensure that if a date\time column or value is included in the CHECKSUM that is is equal between the 2 expressions\columns because if the date\time is off by even a second the CHECKSUM values will be different.
- The next time you need to compare the unique characteristics of an expression, columns or a table consider the native features available with CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions.
- If you have UPDATE code where many columns are compared to determine if the data is unique, consider changing the code from using WHERE Col1 = Col2, etc. to build the CHECKSUM and then compare the CHECKSUM values.
- Stay tuned for tips on the CHECKSUM_AGG and BINARY_CHECKSUM functions with more way to implement them into your code...
Last Update: 2006-07-19
About the author
View all my tips