CHECKSUM Functions in SQL Server 2005


By: Jeremy Kadlec

Problem
Determining whether two rows or expressions are equal can be a difficult and resource-intensive process.  This is often the case with UPDATE statements where the update is conditional on whether all of the columns for a specific row are equal.  To address this need, SQL Server 2005 provides the CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions, which natively compute a hash value over an expression, a row or a table for comparison or other application needs.  In this tip we will focus on the common questions related to CHECKSUM and provide an example so you can begin to leverage it in your T-SQL code.

Solution
What is the purpose of the CHECKSUM functionality?

CHECKSUM computes an integer hash value over an expression or column list.  The value is intended for building hash indexes.
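As a rough sketch of the hash index idea, a computed column can hold the CHECKSUM of a wide column and be indexed in its place. The table dbo.Product and all column and index names below are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table: NameChecksum is a computed column over Name.
CREATE TABLE dbo.Product
(
    ProductID    INT          NOT NULL PRIMARY KEY,
    Name         NVARCHAR(50) NOT NULL,
    NameChecksum AS CHECKSUM(Name)
);
GO

-- Index the narrow integer checksum instead of the wider character column.
CREATE INDEX IX_Product_NameChecksum ON dbo.Product (NameChecksum);
GO

INSERT INTO dbo.Product (ProductID, Name) VALUES (1, N'Bearing Ball');
GO

-- Search on the checksum first, then re-check the real value,
-- because different names can produce the same checksum.
SELECT ProductID, Name
FROM   dbo.Product
WHERE  NameChecksum = CHECKSUM(N'Bearing Ball')
  AND  Name = N'Bearing Ball';
```

The second predicate on Name is what keeps the lookup correct when two different values happen to share a checksum.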

When would I use the CHECKSUM function?

One example of using a CHECKSUM is to store a hash value for the entire row in a column for later comparison.  This is helpful when all of the rows in a table need to be compared in order to perform an UPDATE.  Without a CHECKSUM, the UPDATE has to compare each column individually.
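The column-by-column approach can be sketched roughly as follows. This is not the original sample code from the tip; the tables dbo.CustomerTarget and dbo.CustomerSource and their columns are hypothetical and assumed to already exist with matching data types.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Every column that may have changed needs its own comparison.
UPDATE tgt
SET    tgt.FirstName = src.FirstName,
       tgt.LastName  = src.LastName,
       tgt.City      = src.City
FROM   dbo.CustomerTarget AS tgt
       INNER JOIN dbo.CustomerSource AS src
               ON src.CustomerID = tgt.CustomerID
WHERE  tgt.FirstName <> src.FirstName
   OR  tgt.LastName  <> src.LastName
   OR  tgt.City      <> src.City;
-- Note: <> does not evaluate to true when either side is NULL,
-- so nullable columns would need extra IS NULL handling.
```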


With the CHECKSUM function, the same UPDATE compares one stored checksum value per row instead of comparing every column.
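A minimal sketch of the CHECKSUM version, again using hypothetical names rather than the tip's original sample code. It assumes both tables carry an ordinary INT column named RowChecksum that was populated with CHECKSUM(FirstName, LastName, City) when the rows were written.

```sql
-- Hypothetical tables; RowChecksum is assumed to be an INT column
-- holding CHECKSUM(FirstName, LastName, City) for each row.
UPDATE tgt
SET    tgt.FirstName   = src.FirstName,
       tgt.LastName    = src.LastName,
       tgt.City        = src.City,
       tgt.RowChecksum = src.RowChecksum
FROM   dbo.CustomerTarget AS tgt
       INNER JOIN dbo.CustomerSource AS src
               ON src.CustomerID = tgt.CustomerID
WHERE  tgt.RowChecksum <> src.RowChecksum;
-- Caveat: two different rows can share a checksum, so a changed row
-- whose checksum happens to be unchanged would be skipped.
```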


For this query to work, the CHECKSUM value must be built ahead of time, when the data is inserted, so that it is available for comparison in subsequent code.  So if you perform very few comparisons of an entire row (or of nearly every column in the row), ad-hoc comparisons may be optimal.  However, if a significant number of comparisons are made across a large number of columns, this option should be researched further and tested for performance improvements over the individual comparisons outlined in the first set of code.
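One way to build the value at insert time is to compute it in the INSERT ... SELECT that loads the table. The sketch below assumes two hypothetical tables, dbo.CustomerStaging and dbo.CustomerTarget, with identical column types; none of these names come from the original tip.

```sql
-- Hypothetical staging and target tables with matching column types.
CREATE TABLE dbo.CustomerStaging
(
    CustomerID INT         NOT NULL PRIMARY KEY,
    FirstName  VARCHAR(50) NOT NULL,
    LastName   VARCHAR(50) NOT NULL,
    City       VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.CustomerTarget
(
    CustomerID  INT         NOT NULL PRIMARY KEY,
    FirstName   VARCHAR(50) NOT NULL,
    LastName    VARCHAR(50) NOT NULL,
    City        VARCHAR(50) NOT NULL,
    RowChecksum INT         NOT NULL
);
GO

INSERT INTO dbo.CustomerStaging (CustomerID, FirstName, LastName, City)
VALUES (1, 'Ann', 'Lee', 'Baltimore');

-- Compute the checksum from the columns as the row is inserted,
-- using an explicit, fixed column order.
INSERT INTO dbo.CustomerTarget (CustomerID, FirstName, LastName, City, RowChecksum)
SELECT CustomerID, FirstName, LastName, City,
       CHECKSUM(FirstName, LastName, City)
FROM   dbo.CustomerStaging;
```

Any later UPDATE of FirstName, LastName or City would also need to refresh RowChecksum, otherwise the stored value goes stale.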

What are some of the caveats with using any of the CHECKSUM functions?

  • Ensure the column or expression order is the same between the two CHECKSUMs that are being compared.
  • I would not recommend CHECKSUM(*) because the generated value is based on the column order in the table definition at run time, which may change over time.  Instead, explicitly define the column list in a static order.  For example, use CHECKSUM(Col1, Col2, Col3), where these are all of the columns in the table, as opposed to CHECKSUM(*).
  • If a date/time column or value is included in the CHECKSUM, ensure that it is equal between the two expressions or columns, because if the date/time is off by even a second the CHECKSUM values will generally be different.
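The order and date/time caveats can be checked with a quick experiment such as the one below. The literal values are arbitrary examples; in each pair the two results would normally be expected to differ, although CHECKSUM gives no guarantee of that.

```sql
-- Same values, different argument order.
SELECT CHECKSUM('Ann', 'Lee') AS Order1,
       CHECKSUM('Lee', 'Ann') AS Order2;

-- Same date, times one second apart.
SELECT CHECKSUM(CAST('20080311 11:02:52' AS DATETIME)) AS Time1,
       CHECKSUM(CAST('20080311 11:02:53' AS DATETIME)) AS Time2;
```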

Next Steps

  • The next time you need to compare an expression, a set of columns or a table, consider the native CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions.
  • If you have UPDATE code where many columns are compared to determine whether the data is different, consider replacing the column-by-column comparisons (WHERE Col1 = Col2, etc.) with a comparison of CHECKSUM values.
  • Stay tuned for tips on the CHECKSUM_AGG and BINARY_CHECKSUM functions with more ways to implement them in your code.


About the author
Since 2002, Jeremy Kadlec has delivered value to the global SQL Server community as an Edgewood Solutions SQL Server Consultant, MSSQLTips.com co-founder and Baltimore SSUG co-leader.

Friday, May 09, 2014 - 1:52:01 PM - Dan K

Microsoft recommends using HASHBYTES with the MD5 algorithm to avoid the problem of identical hashcodes being created.  "However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM."
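As a rough illustration of the HashBytes suggestion, the query below hashes two of the strings reported elsewhere in these comments as producing identical CHECKSUM values, and then hashes a delimited concatenation of columns. The column names and the '|' delimiter are assumptions made for this sketch only.

```sql
-- Compare MD5 hashes of two strings; HASHBYTES returns VARBINARY.
SELECT HASHBYTES('MD5', '51;52;56;2204;') AS Hash1,
       HASHBYTES('MD5', '51;53;56;2205;') AS Hash2;

-- Hash several column values at once by concatenating them with a delimiter.
-- A NULL in any column would make the whole concatenation NULL,
-- so nullable columns would need ISNULL or similar handling.
SELECT HASHBYTES('MD5', v.FirstName + '|' + v.LastName + '|' + v.City) AS RowHash
FROM   (SELECT 'Ann' AS FirstName, 'Lee' AS LastName, 'Baltimore' AS City) AS v;
```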


Friday, April 04, 2008 - 7:10:12 AM - admin

glauco.basilio,

Agreed that the hashes may be the same on two different rows. 

Can you include or exclude particular columns to see if the hash will still meet your business rules and be unique?

Thank you,
The MSSQLTips.com Team


Thursday, April 03, 2008 - 10:48:57 AM - glauco.basilio

I tried unsuccessfully to use checksum and binary_checksum to identify duplicated rows in my database. If you have a table with a large number of rows, you will see that both functions generate the same "hash" for rows with different data.


Tuesday, March 11, 2008 - 6:57:26 PM - aprato

In the Remarks section of the SQL 2005 BOL it says this:


CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For this definition, null values of a specified type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change.

Based on the last 2 sentences, I'm not sure of its reliability.  I don't generally use it (in fact, I've never used it).  Are you using this for checking data changes?  If so, maybe a table flag would be a safer option?


Tuesday, March 11, 2008 - 11:02:52 AM - papachec

I have been using the checksum function successfully for quite some time.  Just recently I encountered 2 instances where a different list of values produced identical checksums.

I'd like to understand the calculation that is done by checksum 'under the hood' to know how this is possible and how safe it is to continue to use 'checksum'.

EXAMPLE:
  select checksum('51;52;56;2204;') produces 1726190947
  select checksum('51;53;56;2205;') produces 1726190947

  select checksum('51;52;56;2205;') produces 1726190963
  select checksum('51;53;56;2204;') produces 1726190963

I expected that I would get 4 different results because each of the 4 examples is a different list of values.  But I get only 2 different results.

Your comments and suggestions are welcomed.

