CHECKSUM Functions in SQL Server 2005

By: Jeremy Kadlec


Problem

Determining whether two rows or expressions are equal can be a difficult and resource-intensive process.  This is often the case with UPDATE statements where the update is conditional on whether all of the columns for a specific row are equal.  To address this need, SQL Server 2005 provides the CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions, which natively compute a hash value for an expression, a row or a table that can be used for comparison or other application needs.  In this tip we focus on the common questions related to CHECKSUM and provide an example to help you begin using it in your T-SQL code.

Solution

What is the purpose of using the CHECKSUM functionality?

CHECKSUM computes an integer hash value over an expression or column list, and it is intended for building hash indexes.
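As a minimal sketch of the hash index idea, the following assumes a hypothetical dbo.Product table; the table, column and index names are illustrative only. A computed column holds the CHECKSUM of a wide character column, and an index on that narrow integer column supports equality lookups.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.Product
(
    ProductID   INT IDENTITY(1,1) PRIMARY KEY,
    ProductName NVARCHAR(200) NOT NULL
);
GO  -- batch separator understood by SSMS and sqlcmd

-- Computed column holding the hash of the name.
ALTER TABLE dbo.Product
    ADD ProductNameChecksum AS CHECKSUM(ProductName);
GO

-- Index the 4-byte hash instead of the wide NVARCHAR column.
CREATE INDEX IX_Product_ProductNameChecksum
    ON dbo.Product (ProductNameChecksum);
GO

-- Filter on the hash first, then recheck the real value,
-- because different names can share the same checksum.
SELECT ProductID, ProductName
FROM   dbo.Product
WHERE  ProductNameChecksum = CHECKSUM(N'Bearing Ball')
  AND  ProductName = N'Bearing Ball';
```

The second predicate in the WHERE clause matters: the hash only narrows the search, and the comparison on the original column confirms the match.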

When would I use the CHECKSUM function?

One example of using CHECKSUM is to store the hash value for the entire row in a column for later comparison.  This is helpful when all of the columns in each row need to be compared in order to perform an UPDATE.  Without CHECKSUM you would need to do the following:

[Image: CHECKSUM SQL2000 sample code]
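A column-by-column comparison of this kind might look like the following sketch.  The tables dbo.Customer and dbo.CustomerStaging and their columns are hypothetical, not taken from the original sample, and the compared columns are assumed to be NOT NULL.

```sql
-- Hypothetical tables: dbo.Customer (target) and dbo.CustomerStaging (source).
-- Assumption: the compared columns are NOT NULL, so <> behaves as expected.
UPDATE c
SET    c.FirstName = s.FirstName,
       c.LastName  = s.LastName,
       c.City      = s.City
FROM   dbo.Customer AS c
JOIN   dbo.CustomerStaging AS s
       ON s.CustomerID = c.CustomerID
WHERE  c.FirstName <> s.FirstName   -- one predicate per column
   OR  c.LastName  <> s.LastName
   OR  c.City      <> s.City;
```

Every additional column adds another predicate to the WHERE clause, which is the overhead the CHECKSUM approach tries to avoid.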

Compare the UPDATE code from the first example to this one using the CHECKSUM function.

[Image: CHECKSUM SQL2005 sample code]
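The CHECKSUM-based version could be sketched as follows.  The table and column names are hypothetical, and the sketch assumes that both tables already carry a RowChecksum column computed over the same column list in the same order.

```sql
-- Hypothetical tables: dbo.Customer (target) and dbo.CustomerStaging (source).
-- Assumption: each table has a RowChecksum column defined as
--   CHECKSUM(FirstName, LastName, City)
-- using the same columns in the same order.
UPDATE c
SET    c.FirstName = s.FirstName,
       c.LastName  = s.LastName,
       c.City      = s.City
FROM   dbo.Customer AS c
JOIN   dbo.CustomerStaging AS s
       ON s.CustomerID = c.CustomerID
WHERE  c.RowChecksum <> s.RowChecksum;  -- one comparison instead of one per column
```

Because different rows can occasionally share a checksum, this form can skip a row that did change; whether that is acceptable depends on the application.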

For this query to work, the CHECKSUM value must be built ahead of time, when the data is inserted, so that it is available for comparison in subsequent code.  If you perform very few comparisons of an entire row (or of nearly every column in the row), then ad hoc comparisons may be optimal.  However, if a significant number of comparisons are made across a large number of columns, this option should be researched further and tested for performance improvements over the individual comparisons outlined in the first set of code.  Keep in mind that CHECKSUM is a hash, so two different rows can occasionally produce the same value and a changed row can go undetected.
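One possible way to build the value ahead of time is a persisted computed column, so the checksum is stored when the row is inserted and kept in sync on every update.  This is a sketch under that assumption, with a hypothetical dbo.Customer table; populating a regular INT column in the INSERT statement would be an alternative.

```sql
-- Hypothetical table; RowChecksum is stored with the row and
-- recalculated automatically whenever one of its inputs changes.
CREATE TABLE dbo.Customer
(
    CustomerID  INT         NOT NULL PRIMARY KEY,
    FirstName   VARCHAR(50) NOT NULL,
    LastName    VARCHAR(50) NOT NULL,
    City        VARCHAR(50) NOT NULL,
    RowChecksum AS CHECKSUM(FirstName, LastName, City) PERSISTED
);
GO  -- batch separator understood by SSMS and sqlcmd

-- The computed column is not listed in the INSERT.
INSERT INTO dbo.Customer (CustomerID, FirstName, LastName, City)
VALUES (1, 'Ann', 'Lee', 'Baltimore');

-- The stored checksum is available for later comparisons.
SELECT CustomerID, RowChecksum
FROM   dbo.Customer;
```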

What are some of the caveats with using any of the CHECKSUM functions?

  • Ensure the column or expression order is the same in the two CHECKSUMs being compared.
  • Avoid CHECKSUM(*), because the generated value is based on the column order in the table definition at run time, which may change over time.  Instead, explicitly list the columns in a fixed order.  For example, use CHECKSUM(Col1, Col2, Col3), where these are all of the columns in the table, rather than CHECKSUM(*).
  • If a date/time column or value is included in the CHECKSUM, ensure it is equal in the two expressions or columns being compared; if the date/time is off by even a second, the CHECKSUM values will almost always differ.
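
The ordering and date/time caveats can be explored with a short script.  This is only a sketch with made-up variable names and values; no specific checksum values are claimed, only that the paired results will generally differ.

```sql
-- Column order: the same values in a different order
-- generally produce different checksums.
DECLARE @a VARCHAR(10), @b VARCHAR(10);
SET @a = 'Smith';
SET @b = 'Jones';

SELECT CHECKSUM(@a, @b) AS OrderAB,
       CHECKSUM(@b, @a) AS OrderBA;

-- Date/time: a one-second difference generally changes the checksum.
DECLARE @t1 DATETIME, @t2 DATETIME;
SET @t1 = '2008-03-11T11:02:52';
SET @t2 = DATEADD(second, 1, @t1);

SELECT CHECKSUM(@a, @t1) AS WithT1,
       CHECKSUM(@a, @t2) AS WithT2;

-- If only the calendar day matters, strip the time portion
-- from both values before hashing so they compare as equal.
SELECT CHECKSUM(@a, DATEADD(day, DATEDIFF(day, 0, @t1), 0)) AS DayOnlyT1,
       CHECKSUM(@a, DATEADD(day, DATEDIFF(day, 0, @t2), 0)) AS DayOnlyT2;
```
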
Next Steps
  • The next time you need to compare an expression, a set of columns or a table, consider the native CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions.
  • If you have UPDATE code where many columns are compared to determine whether the data has changed, consider replacing the individual comparisons (WHERE Col1 = Col2, etc.) with a CHECKSUM that is built ahead of time and then compared.
  • Stay tuned for tips on the CHECKSUM_AGG and BINARY_CHECKSUM functions with more ways to implement them in your code.





About the author
Jeremy Kadlec is a Co-Founder, Editor and Author at MSSQLTips.com with more than 300 contributions. He is also the CTO @ Edgewood Solutions and a six-time SQL Server MVP. Jeremy brings 20+ years of SQL Server DBA and Developer experience to the community after earning a bachelor's degree from SSU and master's from UMBC.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Friday, May 9, 2014 - 1:52:01 PM - Dan K

Microsoft recommends using HASHBYTES with the MD5 algorithm to avoid the problem of identical hashcodes being created.  "However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM."
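
As a minimal sketch of the HASHBYTES suggestion, the following hashes three hypothetical column values.  The variable names, sample values and the '|' delimiter are assumptions for illustration.  HASHBYTES accepts a single input, so the values are concatenated first, and on versions before SQL Server 2016 that input is limited to 8000 bytes.

```sql
-- Hypothetical values standing in for the columns of one row.
DECLARE @FirstName VARCHAR(50), @LastName VARCHAR(50), @City VARCHAR(50);
SET @FirstName = 'Ann';
SET @LastName  = 'Lee';
SET @City      = 'Baltimore';

-- The delimiter keeps ('ab', 'c') distinct from ('a', 'bc').
-- ISNULL prevents a single NULL from turning the whole input into NULL.
SELECT HASHBYTES('MD5',
                 ISNULL(@FirstName, '') + '|' +
                 ISNULL(@LastName,  '') + '|' +
                 ISNULL(@City,      '')) AS RowHash;  -- 16-byte varbinary
```

On SQL Server 2012 and later, 'SHA2_256' can be passed in place of 'MD5'.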


Friday, April 4, 2008 - 7:10:12 AM - admin

glauco.basilio,

Agreed that the hashes may be the same on two different rows. 

Can you include or exclude particular columns to see if the hash will still meet your business rules and be unique?

Thank you,
The MSSQLTips.com Team


Thursday, April 3, 2008 - 10:48:57 AM - glauco.basilio

I tried unsuccessfully to use checksum and binary_checksum to identify duplicated rows in my database. If you have a table with a large number of rows, you will see that both functions generate the same "hash" for rows with different data.


Tuesday, March 11, 2008 - 6:57:26 PM - aprato

In the Remarks section of the SQL 2005 BOL it says this:

CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For this definition, null values of a specified type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change.

Based on the last 2 sentences, I'm not sure of its reliability.  I don't generally use it (in fact, I've never used it).  Are you using this for checking data changes?  If so, maybe a table flag would be a safer option?


Tuesday, March 11, 2008 - 11:02:52 AM - papachec

I have been using the checksum function successfully for quite some time.  Just recently I encountered 2 instances where a different list of values produced identical checksums.

I'd like to understand the calculation that is done by checksum 'under the hood' to know how this is possible and how safe it is to continue to use 'checksum'.

EXAMPLE:
  select checksum('51;52;56;2204;') produces 1726190947
  select checksum('51;53;56;2205;') produces 1726190947

  select checksum('51;52;56;2205;') produces 1726190963
  select checksum('51;53;56;2204;') produces 1726190963

I expected that I would get 4 different results because each of the 4 examples is a different list of values.  But I get only 2 different results.

Your comments and suggestions are welcomed.














