solving sql server problems for millions of dbas and developers since 2006



SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Tip Categories SQL Server Tutorials SQL Server Webcasts SQL Server Whitepapers SQL Server Tools SQL Server Questions and Answers MSSQLTips Authors About MSSQLTips SQL Server User Groups MSSLQTips Giveaways MSSQLTips Advertising Options

MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Twitter Page MSSQLTips Google+ Page

SQL Product Highlight

Idera - SQL safe backup

Idera's SQL safe provides a high-performance backup and recovery solution for Microsoft SQL Server. SQL safe saves money by reducing database backup time by up to 50% over native backups and reducing backup disk space requirements by up to 95%. SQL safe also enables complete "hands-free" automated backup of your entire SQL Server infrastructure and ensures compliance with your organization's backup and recovery policies. From implementations with tens of SQL servers to enterprises with hundreds of servers spread around the globe, SQL safe is the only SQL Server backup and recovery solution that scales to meet the challenge.

Learn more!








CHECKSUM Functions in SQL Server 2005

By: | Read Comments (4) | Print

Jeremy is the CTO @ Edgewood Solutions, co-founder of MSSQLTips.com and SQL Server MVP since 2009.



Related Tips: More

Problem
Determining if two rows or expressions are equal can be a difficult and resource intensive process.  This can be the case with UPDATE statements where the update was conditional based on all of the columns being equal or not for a specific row.  To address this need in the SQL Server environment the CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions are available in SQL Server 2005 to natively create a unique expression, row or table for comparison or other application needs.  In this tip we will focus on the common questions related to the CHECKSUM code and provide an example to begin to leverage the CHECKSUM commands in your T-SQL code.

Solution
What is the purpose of the using the CHECKSUM functionality?

The CHECKSUM is intended to build a hash index based on an expression or column list. 

When would I use the CHECKSUM function?

One example of using a CHECKSUM is to store the unique value for the entire row in a column for later comparison.  This would be helpful in a situation where all of the rows in a table need to be compared in order to perform an UPDATE.  Without a CHECKSUM you would need to do the following:

Download the sample code here from the image above.

Compare the UPDATE code from the first example to this one using the CHECKSUM function.

Download the sample code here from the image above.

In order for this query to be successful, it is necessary to build the CHECKSUM value ahead of time when inserting the data in order to perform the comparison in subsequent code.  So if your performing very few entire row (or just about every column in the row) comparisons then ad-hoc comparisons may be optimal.  However, if significant number of comparisons are made with a large number of columns, then this option should be researched further and tested for performance improvements over individual comparisons outlined in the first set of code.

What are some of the caveats with using any of the CHECKSUM functions?

  • Need to ensure the column or expression order is the same between the two CHECKSUMs that are being compared.
  • Would not recommend a CHECKSUM(*) because the generated CHECKSUM value is based on the column order for the table definition at run time which may change over time, so I would recommend explicitly defining the column listing with a static order in the CHECKSUM code.  For example, use CHECKSUM(Col1, Col2, Col3) where these are all of the columns in a table as opposed to CHECKSUM(*).
  • Need to ensure that if a date\time column or value is included in the CHECKSUM that is is equal between the 2 expressions\columns because if the date\time is off by even a second the CHECKSUM values will be different.

Next Steps

  • The next time you need to compare the unique characteristics of an expression, columns or a table consider the native features available with CHECKSUM, CHECKSUM_AGG and BINARY_CHECKSUM functions.
  • If you have UPDATE code where many columns are compared to determine if the data is unique, consider changing the code from using WHERE Col1 = Col2, etc. to build the CHECKSUM and then compare the CHECKSUM values.
  • Stay tuned for tips on the CHECKSUM_AGG and BINARY_CHECKSUM functions with more way to implement them into your code...


Related Tips: More | Become a paid author


Last Update: 7/19/2006

Share: Share 






Comments and Feedback:

Tuesday, March 11, 2008 - 11:02:52 AM - papachec Read The Tip

I have been using the checksum function successfully for quite some time.  Just recently I encountered 2 instances where a different list of values produced identical checksums.

I'd like to understand the calculation that is done by checksum 'under the hood' to know how this is possible and how safe it is to continue to use 'checksum'.

EXAMPLE:
  select checksum('51;52;56;2204;') produces 1726190947
  select checksum('51;53;56;2205;') produces 1726190947

  select checksum('51;52;56;2205;') produces 1726190963
  select checksum('51;53;56;2204;') produces 1726190963

I expected that I would get 4 different results because each of the 4 examples is a different list of values.  But I get only 2 different results.

Your comments and suggestions are welcomed.


Tuesday, March 11, 2008 - 6:57:26 PM - aprato Read The Tip

 In the Remarks section of the SQL 2005 BOL it says this

 

CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For this definition, null values of a specified type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change.

Based on the last 2 sentences, I'm not sure of it's reliability.  I don't generally use it (In fact, I've never used it).   Are you using this for checking data changes?  If so, maybe a table flag would be a safer option?


Thursday, April 03, 2008 - 10:48:57 AM - glauco.basilio Read The Tip

I try unsucefull use checksum and binary_checksum to identify duplicated rows in my database. If you have a table with a large amount of rows you will see that both functions generate the same "hash" for rows with diferent data.


Friday, April 04, 2008 - 7:10:12 AM - admin Read The Tip

glauco.basilio,

Agreed that the hashes may be the same on two different rows. 

Can you include or exclude particular columns to see if the hash will still meet your business rules and be unique?

Thank you,
The MSSQLTips.com Team



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
"SQL diagnostic manager delivers response in minutes, not hours!"

SQL Monitor – For database professionals who need results on Day One. Try it online.

What grade do you think your SQL Servers get? Find out with Edgewood's SQL Server Health Check consulting services.

Free Trial: Get Proactive Insight with Spotlight® for SQL Server Enterprise.

Join the over million SQL Server Professionals who get their issues resolved daily.

Are you waiting on SQL Server? Learn about these DMV's.


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com