By: Tal Olier | Comments (6) | Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | > Temp Tables
Problem
When we write code, we sometimes use temporary tables. Using a temporary table is a convenient way to store intermediate results, and then use them at a later phase in our application logic. When using temporary tables without specifying a collation (for the column used) SQL Server will inherit the collation for our newly created temporary table from the SQL Server instance default. In case our SQL Server database has a different collation setting than the instance's default (there might be various reasons for that), we might fall into a trap of having two tables containing similar information, but with different collation settings. How can I create the temporary tables with the correct collation?
Solution
In order to illustrate this issue, let's consider the following example as a problem case: our application needs to import customer's data from a different source database, unfortunately the customers that exists in both databases has different IDs for their customer records and only their names can be used in order to find matching customers.
It has been decided that the merging process would perform the next steps:
- Copy both customer's information tables into temporary tables called TMP_CUSTOMERS.
- Join (left join) our main CUSTOMERS table with TMP_CUSTOMERS_A and copy the relevant data into our CUSTOMERS table via an update statement.
Script for setting up a demo (for step 1 above):
use master go if exists (select null from sys.databases where name like 'source') drop database source; go if exists (select null from sys.databases where name like 'target') drop database target; go create database source collate Latin1_General_CI_AS; go create database target collate Latin1_General_CI_AI; go use source go if OBJECT_ID('CUSTOMERS') is not null drop table CUSTOMERS; go create table CUSTOMERS(customer_id int not null primary key, customer_name varchar(20) not null unique, sales_2010 int); go insert into CUSTOMERS (customer_id, customer_name, sales_2010) values (1,'Microsoft', 100000); insert into CUSTOMERS (customer_id, customer_name, sales_2010) values (2,'Google', 150000); insert into CUSTOMERS (customer_id, customer_name, sales_2010) values (3,'Apple', 200000); go use target go if OBJECT_ID('CUSTOMERS') is not null drop table CUSTOMERS; go create table CUSTOMERS ( customer_id int not null primary key, customer_name varchar(20) not null unique, phone_number int, sales_total int ); go insert into CUSTOMERS (customer_id, customer_name, phone_number) values (100,'Microsoft', 1111111); insert into CUSTOMERS (customer_id, customer_name, phone_number) values (200,'Google', 2222222); insert into CUSTOMERS (customer_id, customer_name, phone_number) values (300,'Apple', 3333333); insert into CUSTOMERS (customer_id, customer_name, phone_number) values (400,'Oracle', 4444444); go if OBJECT_ID('tempdb..#TMP_CUSTOMERS') is not null drop table #TMP_CUSTOMERS go select * into #TMP_CUSTOMERS from source..CUSTOMERS; go
Assuming that the source database is called "source" and that the target database is called "target" the query from step 2 above would look like:
UPDATE target_table SET target_table.sales_total = source_table.sales_2010 FROM CUSTOMERS target_table cross join #TMP_CUSTOMERS source_table WHERE target_table.customer_name = source_table.customer_name; go
Of course we would expect that the result would be the number of records affected, but instead we get the following error:
Msg 468, Level 16, State 9, Line 3 Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "Latin1_General_CI_AI" in the equal to operation.
What has happened?
The error is raised because we asked SQL Server to compare two columns (customer_name in table CUSTOMERS and customer_name in table #TMP_CUSTOMERS) which have different collation settings, so SQL Server tells us it cannot compare two strings with different collations.
In the example above, I have created the source and target tables with different collations (Latin1_General_CI_AS vs. Latin1_General_CI_AI), so when we copy the data from the source database the #TMP_CUSTOMERS temporary table inherits the collation setting for the customer_name column from the source database which is Latin1_General_CI_AS.
Later on, when we run the update statement the column customer_name in the temporary table's collation setting collide with the target database CUSTOMERS table's customer_name column that has Latin1_General_CI_AI set as the collation for the customer_name column.
If we go one step further, and create the temporary table ourselves, this may yield another problem, take a look at the script below:
Dropping and re-creating the #TMP_CUSTOMERS prior to inserting records
if OBJECT_ID('tempdb..#TMP_CUSTOMERS') is not null drop table #TMP_CUSTOMERS go create table #TMP_CUSTOMERS ( customer_id int not null primary key, customer_name varchar(20) not null unique, sales_2010 int ); go insert into #TMP_CUSTOMERS select * from source..CUSTOMERS; go
Now, when I run the same update statement on my server:
UPDATE target_table SET target_table.sales_total = source_table.sales_2010 FROM CUSTOMERS target_table cross join #TMP_CUSTOMERS source_table WHERE target_table.customer_name = source_table.customer_name; go
I get the following error:
Msg 468, Level 16, State 9, Line 3 Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AI" in the equal to operation.
This has happened on my specific server because the default collation of my SQL Server instance is SQL_Latin1_General_CP1_CI_AS.
In order to check your SQL Server instance default collation you can use the following query:
SELECT SERVERPROPERTY ('Collation')
The common solution for this case is to tell SQL Server according to which collation we want the comparison to happen for example:
UPDATE target_table SET target_table.sales_total = source_table.sales_2010 FROM CUSTOMERS target_table cross join #TMP_CUSTOMERS source_table WHERE target_table.customer_name = source_table.customer_name collate Latin1_General_CI_AI; go
The result would of course be:
(3 row(s) affected)
This is nice to know there is a way to tell SQL Server the collation to use for the comparison, but this requires us to specify a relevant collation for the comparison and that the collation specified would exist for the SQL Server.
Many software providers would not like to be coupled to a specific collation nor be required to find out the current collation in order to be able to produce update statements or joins as demonstrated above.
So what is the resolution?
The solution would be to use the target table column when creating the temporary table instead of the source table column; later on copy the data from the source to the temporary table created. With this technique, we do not need to specify a collation and still get the collation to match every time we use it.
This can be done in the following way:
First we'll drop the table #TMP_CUSTOMERS that was created incorrectly:
if OBJECT_ID('tempdb..#TMP_CUSTOMERS') is not null drop table #TMP_CUSTOMERS go
Then we'll create the #TMP_CUSTOMERS in the following way:
select customer_id, ( select customer_name from CUSTOMERS where 1=2 ) customer_name, sales_2010 into #TMP_CUSTOMERS from source..CUSTOMERS where 1=2; go
Here are a few things to note about the statement above:
- The customer_name is fetched from the target table and in this way is also created with the same collation as the customer_name column in the CUSTOMERS table in the target database.
- The inner select is followed by "customer_name" making sure that the table created will have this name for the column created.
- I have added the logic "where 1=2" in the inner select statement to make sure that we do not get any records back in the inner select. I actually do not mind if we get 1 result back, but in case the inner select would return more than 1 row it will fail the whole statement.
- If we would have more varchar columns in the table, we'll have to repeat this trick for every column.
- We also the "1=2" in the where clause in order to make sure that only the table structure is copied without the data.
Now it is time to copy the data from the source table with the following statement:
insert into #TMP_CUSTOMERS select customer_id, customer_name, sales_2010 from source..CUSTOMERS
All that is left now is to run the update statement:
UPDATE target_table SET target_table.sales_total = source_table.sales_2010 FROM CUSTOMERS target_table cross join #TMP_CUSTOMERS source_table WHERE target_table.customer_name = source_table.customer_name; go
The result would be:
(3 row(s) affected)
If we check the results with this query:
select * from CUSTOMERS
The result will be as expected:
Next Steps
- Research for temporary table's performance optimization on the mssqltips.com website.
- Consider alternatives for temporary tables e.g. table variable.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips