Problem using DISTINCT in case insensitive SQL Server databases

By:   |   Comments (3)   |   Related: > TSQL


Problem

SQL Server gives you the ability to store mixed case data in your databases, but depending on how you create your databases SQL Server will ignore the case when you issue T-SQL commands.  One of the problems you may be faced with is that you want to get a distinct list of values from a table to show the differences in your table, but if your database is setup as case insensitive the DISTINCT clause does not show the differences it all gets grouped together.  So based on this what options are there?

Solution

To illustrate this behavior we are going to look at a couple ways this works using a case sensitive database and a case insensitive database.

The first set of queries uses the AdventureWorks database which is configured as case sensitive.  To determine the collation for your databases you can run this query:

SELECT name, collation_name 
FROM master.sys.databases

We are querying the data from Preson.Contact in the AdventureWorks database.  All data is setup as mixed case, so we have no duplicates when we run this query.

SELECT DISTINCT TOP 10 FirstName 
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 
first name

If we update one of the record and change the FirstName from "Adam" to "ADAM" we should get two different values when we run the query.

UPDATE Person.Contact
SET FirstName = 'ADAM'
WHERE ContactID = 62
GO
  
SELECT DISTINCT TOP 10 FirstName 
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 

As you can see we now show both "Adam" and "ADAM" as two different values.

first name

The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.

CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact
SELECT FirstName FROM Person.Contact
GO

SELECT DISTINCT TOP 10 FirstName 
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 
GO

When we run the SELECT query you can see that the output combines both "Adam" and "ADAM" since case is ingored.

select query

To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.

SELECT DISTINCT TOP 10 FirstName COLLATE sql_latin1_general_cp1_cs_as
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1 

When this is run we now have the values of "Adam" and "ADAM".

column name

So depending on how your database is setup you may or may not see the differences.

To show you another example here is just a quick way of selecting the case sensitive or case insensitive option.

The first query we run is using case sensitive, so all four rows should show up.

select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items 
no column name

All that is different in the next query is the name of the collation. When this query is run using case insensitive, we only get two rows.

select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
no column name
Next Steps
  • You can see how the behavior of the database can impact the output, so next time you are looking for distinct values make sure you understand your database settings or use the COLLATE option
  • Here is another tip that shows you how you can use COLLATE in your WHERE clause Case Sensitive Search on a Case Insensitive SQL Server
  • Special thanks to Andy Novick at Novick Software for this tip idea


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, August 7, 2013 - 2:43:44 AM - Jagadesh Back To Top (26124)

Thanks Greg, it helps me lot..


Wednesday, June 11, 2008 - 8:50:37 AM - grobido Back To Top (1139)

Here is an example running this against the AdventureWorks database

SELECT DISTINCT City FROM Person.Address

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Address'. Scan count 1, logical reads 216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SELECT DISTINCT TOP 10 City FROM Person.Address

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Address'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The TOP 10 requires a lot less logical reads getting data for the entire table.

 


Wednesday, June 11, 2008 - 2:19:55 AM - Praveen Back To Top (1133)

Difference between " Select DISTINCT TOP 10 column_name from tbl " and " Select DISTINCT column_name from tbl " . And how many roqws it will execute and how much each query will take.

 SQL 2005















get free sql tips
agree to terms