Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Problem using DISTINCT in case insensitive SQL Server databases


By:   |   Updated: 2008-05-27   |   Comments (3)   |   Related: More > T-SQL

Problem

SQL Server gives you the ability to store mixed case data in your databases, but depending on how you create your databases SQL Server will ignore the case when you issue T-SQL commands.  One of the problems you may be faced with is that you want to get a distinct list of values from a table to show the differences in your table, but if your database is setup as case insensitive the DISTINCT clause does not show the differences it all gets grouped together.  So based on this what options are there?

Solution

To illustrate this behavior we are going to look at a couple ways this works using a case sensitive database and a case insensitive database.

The first set of queries uses the AdventureWorks database which is configured as case sensitive.  To determine the collation for your databases you can run this query:

SELECT name, collation_name 
FROM master.sys.databases

We are querying the data from Preson.Contact in the AdventureWorks database.  All data is setup as mixed case, so we have no duplicates when we run this query.

SELECT DISTINCT TOP 10 FirstName 
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 
first name

If we update one of the record and change the FirstName from "Adam" to "ADAM" we should get two different values when we run the query.

UPDATE Person.Contact
SET FirstName = 'ADAM'
WHERE ContactID = 62
GO
  
SELECT DISTINCT TOP 10 FirstName 
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 

As you can see we now show both "Adam" and "ADAM" as two different values.

first name

The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.

CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact
SELECT FirstName FROM Person.Contact
GO

SELECT DISTINCT TOP 10 FirstName 
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 
GO

When we run the SELECT query you can see that the output combines both "Adam" and "ADAM" since case is ingored.

select query

To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.

SELECT DISTINCT TOP 10 FirstName COLLATE sql_latin1_general_cp1_cs_as
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 

When this is run we now have the values of "Adam" and "ADAM".

column name

So depending on how your database is setup you may or may not see the differences.

To show you another example here is just a quick way of selecting the case sensitive or case insensitive option.

The first query we run is using case sensitive, so all four rows should show up.

select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items 
no column name

All that is different in the next query is the name of the collation. When this query is run using case insensitive, we only get two rows.

select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
no column name
Next Steps
  • You can see how the behavior of the database can impact the output, so next time you are looking for distinct values make sure you understand your database settings or use the COLLATE option
  • Here is another tip that shows you how you can use COLLATE in your WHERE clause Case Sensitive Search on a Case Insensitive SQL Server
  • Special thanks to Andy Novick at Novick Software for this tip idea


Last Updated: 2008-05-27


get scripts

next tip button



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Wednesday, August 07, 2013 - 2:43:44 AM - Jagadesh Back To Top

Thanks Greg, it helps me lot..


Wednesday, June 11, 2008 - 8:50:37 AM - grobido Back To Top

Here is an example running this against the AdventureWorks database

SELECT DISTINCT City FROM Person.Address

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Address'. Scan count 1, logical reads 216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SELECT DISTINCT TOP 10 City FROM Person.Address

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Address'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The TOP 10 requires a lot less logical reads getting data for the entire table.

 


Wednesday, June 11, 2008 - 2:19:55 AM - Praveen Back To Top

Difference between " Select DISTINCT TOP 10 column_name from tbl " and " Select DISTINCT column_name from tbl " . And how many roqws it will execute and how much each query will take.

 SQL 2005


Learn more about SQL Server tools