Problem using DISTINCT in case insensitive SQL Server databases
By: Greg Robidoux
SQL Server lets you store mixed-case data in your databases, but depending on the collation chosen when a database is created, SQL Server may ignore case when you issue T-SQL commands. One problem you may run into is that you want a distinct list of values from a table to show the differences in your data, but if your database is set up as case insensitive, the DISTINCT clause does not show those differences; values that differ only by case are grouped together. So what options are there?
To illustrate this behavior we are going to look at how this works in a case sensitive database and in a case insensitive database.
The first set of queries uses the AdventureWorks database, which is configured as case sensitive. To determine the collation of your databases you can run this query:
SELECT name, collation_name FROM master.sys.databases
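Strictly speaking, DISTINCT compares values using the collation of the column itself, which defaults to the database collation when the column is created but can be set differently. As a minimal sketch, assuming you are connected to the AdventureWorks database and that Person.Contact exists there, the following checks both levels:

```sql
-- Collation of the database you are currently connected to
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Collation of each character column in Person.Contact
-- (collation_name is NULL for non-character columns, so those are filtered out)
SELECT c.name AS ColumnName, c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('Person.Contact')
  AND c.collation_name IS NOT NULL;
```

If the column collation differs from the database collation, the column collation is the one that decides how DISTINCT groups the values.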
We are querying the data from Person.Contact in the AdventureWorks database. All data is stored as mixed case, so we have no duplicates when we run this query.
SELECT DISTINCT TOP 10 FirstName FROM Person.Contact WHERE FirstName LIKE 'A%' ORDER BY 1
If we update one of the records and change the FirstName from "Adam" to "ADAM", we should get two different values when we run the query.
UPDATE Person.Contact SET FirstName = 'ADAM' WHERE ContactID = 62
GO
SELECT DISTINCT TOP 10 FirstName FROM Person.Contact WHERE FirstName LIKE 'A%' ORDER BY 1
As you can see we now show both "Adam" and "ADAM" as two different values.
The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.
CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact SELECT FirstName FROM Person.Contact
GO
SELECT DISTINCT TOP 10 FirstName FROM Test.dbo.contact WHERE FirstName LIKE 'A%' ORDER BY 1
GO
When we run the SELECT query, the output combines "Adam" and "ADAM" into a single value since case is ignored.
To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.
SELECT DISTINCT TOP 10 FirstName COLLATE sql_latin1_general_cp1_cs_as FROM Test.dbo.contact WHERE FirstName LIKE 'A%' ORDER BY 1
When this is run we now have the values of "Adam" and "ADAM".
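The same COLLATE clause can also be used to find which names actually have more than one case variant, instead of scanning the distinct list by eye. This is only a sketch, assuming the Test.dbo.contact table created above; the column alias CaseVariants is made up for illustration:

```sql
-- Group case-insensitively (the column's own collation), then count
-- how many case-sensitive variants exist inside each group.
SELECT FirstName,
       COUNT(DISTINCT FirstName COLLATE SQL_Latin1_General_CP1_CS_AS) AS CaseVariants
FROM Test.dbo.contact
GROUP BY FirstName
HAVING COUNT(DISTINCT FirstName COLLATE SQL_Latin1_General_CP1_CS_AS) > 1;
```

Because the GROUP BY is case insensitive here, the FirstName shown for each group is just one representative of the variants, not all of them.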
So depending on how your database is set up, you may or may not see the differences.
As another example, here is a quick way to compare the case sensitive and case insensitive options without creating any tables.
The first query uses a case sensitive collation, so all four rows should show up.
select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
All that is different in the next query is the name of the collation. When this query is run with a case insensitive collation, we get only two rows.
select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
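The two results can also be compared side by side in a single statement. The sketch below assumes SQL Server 2008 or later, since it uses the VALUES table constructor in place of the UNION ALL chain; the column aliases are illustrative only:

```sql
-- Count distinct values under each collation for the same four rows
SELECT
    COUNT(DISTINCT item COLLATE SQL_Latin1_General_CP1_CS_AS) AS CaseSensitiveCount,   -- 4
    COUNT(DISTINCT item COLLATE SQL_Latin1_General_CP1_CI_AI) AS CaseInsensitiveCount  -- 2
FROM (VALUES ('abcd'), ('ABCD'), ('defg'), ('deFg')) AS items(item);
```

A difference between the two counts is a quick signal that a column holds values that differ only by case.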
- You can see how the collation of the database can affect the output, so the next time you are looking for distinct values, make sure you understand your database settings or use the COLLATE option.
- Here is another tip that shows how you can use COLLATE in your WHERE clause: Case Sensitive Search on a Case Insensitive SQL Server
- Special thanks to Andy Novick at Novick Software for this tip idea