solving sql server problems for millions of dbas and developers since 2006


Identify and resolve SQL Server problems BEFORE they happen with SQL diagnostic manager

SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Whitepapers SQL Server Tools SQL Server Webcasts SQL Server Questions and Answers SQL Server Questions and Answers


SQL Product Highlight

Idera - SQL compliance manager

SQL compliance manager is a comprehensive auditing solution that tells you who did what, when and how on your SQL Servers. SQL compliance manager helps you ensure compliance with regulatory and data security requirements.

Learn more!








Sort and compare SQL Server unicode and binary data results

By: | Read Comments | Print

Chad is an Architect, Administrator and Developer with technologies such as SQL Server, .NET, and Windows Server.

Related Tips: More

A customer recently questioned me regarding a sorting issue they were having in the following scenario:

   My database is running under the "SQL_Latin1_General_CP1_CI_AS" Sql collation, but when I run
   the script below I seem to get incorrect sort results.  Why does the "abc-test" item come 
   immediately after the "abctest" item and not after "abctest1" as I would expect it to?

      create table #temp1 (
       ServerItem nvarchar(260) collate SQL_Latin1_General_CP1_CI_AS primary key clustered
      )

      insert #temp1
      select 'abctest'
      insert #temp1
      select 'abc-test'
      insert #temp1
      select 'abctest1'
      insert #temp1
      select 'abc-test1'

      select * from #temp1
      order by ServerItem

      drop table #temp1

      RESULTS:
     
      ServerItem
      ------------
      abctest
      abc-test
      abctest1
      abc-test1


   Is this right? Is this a bug?

No, this is not a bug - and yes, this is correct and by design.  The reasoning lies in how certain SQL collations particularly handle sorting rules for unicode data...this collation in particular uses a word sort algorithm to sort the unicode data, which ignores punctuation characters (and hyphen is one of those, amoung many other such as apostrophe's, dialect characters, etc.). The simple rule to follow when comparing or sorting multiple unicode strings is to not assume that it is done character-by-character, as it isn't.

A similar distinction occurs when using a Binary code page, like Latin1_General_BIN for example (popular amoung many cross-platform COTS applications). Did you know that if you are using a binary sort order that UPPERCASE letters will always sort before lowercase letters?  This is because binary sorting sorts based on the actual bit representation of the characters, and UPPERCASE letters have a lower ascii-code than that of lowercase letters.  Using a similar example as above, here is what you would see using a binary sort order:

   create table #temp1 (ServerItem nvarchar(260) collate Latin1_General_BIN primary key clustered)

   insert #temp1
   select 'abctest'
   insert #temp1
   select 'ABCTEST'
   insert #temp1
   select 'BBCTEST'
   insert #temp1
   select 'bbctest'
  
   select * from #temp1
   order by ServerItem

   drop table #temp1

   RESULTS:
  
   ServerItem
   -------------
   ABCTEST
   BBCTEST
   abctest
   bbctest

See how BBCTEST comes before abctest?  You'd see the same type of result if you changed "BBCTEST" to just "Bbctest".

Multiple things are taken into consideration when sorting/comparing data in Sql Server (and many other platforms) including your collation setting, code page, binary/non-binary, unicode vs. ascii data, etc. Be sure to understand differences between all characteristics of your system before deploying...


Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

 



Related Tips: More | Become a paid author


Last Update: 10/15/2007

Share: Share 






Comments and Feedback:


Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
Try the award winning SQL diagnostic manager as a free 14-day trial!

Write, edit, and explore SQL effortlessly with SQL Prompt.

Real solutions for real problems. Get in-depth knowledge of all SQL Server features.

Join the over million SQL Server Professionals who get their issues resolved daily.

Optimizing SQL Server performance can be a daunting task. Or is it?


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com