Fixing Cannot Convert Between Unicode and Non-Unicode String Data Types in SSIS
By: Hadi Fadlallah | Updated: 2023-01-20 | Comments | Related: More > Integration Services Development
Most of the time string values are stored as Unicode within data sources such as Excel worksheets and flat files, while the destination data column in the SQL database only supports encoded strings. Building the package throws the following exception: "Cannot convert between Unicode and non-Unicode string data types". How do I resolve this?
In this article, we will reproduce the "Cannot convert between Unicode and non-Unicode string data types" exception and explain several solutions to this issue.
We will cover 5 solutions to handle this issue:
- Using a Data Conversion Transformation
- Using a Derived Column Transformation
- Converting the Data Type Implicitly in the Data Source Component
- Using a Script Component
- Converting the Data Type Explicitly in the Data Source Component
What is a Unicode String? Why is it Used?
In general, Unicode strings are used to store data in multiple languages within one database. As the Internet grows, it becomes even more critical to support multiple client computers in different geographical locations. Selecting a code page that supports all the characters required by a worldwide audience would be challenging. This is why several data sources use stored strings as Unicode. In contrast, local databases usually store strings with a specific code page since it saves more disk space and could guarantee higher performance.
Reproducing the Error
To reproduce the error, we created a text file containing one column that stores username information. Then, we saved the text file as Unicode, as shown in the screenshot below.
Now, we created a new Integration Services project using Visual Studio. Then we added a flat file connection manager to read data from the text file and an OLE DB connection manager to establish a connection with the destination SQL Server database. In the SQL Server database, we create a table with one column of type VARCHAR(50).
Within the SSIS package control flow, we added a new data flow task that contains a Flat File source component and an OLE DB Source component. Once we link both components and map the source and destination columns in the OLE DB destination, an error icon appears on the OLE DB destination component with the following error message:
1 - Using a Data Conversion Transformation
The first solution that may solve this issue is by using data conversion transformation. As described in the SSIS toolbox, this component "converts data from one data type to another. Converted data is written to a new output column. For example, convert data from column of decimal type to an output column of integer type, or convert ANSI string to Unicode."
We should add a data conversion transformation to convert data from Unicode to non-Unicode. Within the data conversion transformation editor, we should select the Username column, change the data type from DT_WSTR to DT_STR, and specify the desired code page, as shown in the screenshot below.
Next, open the OLE DB destination editor and map the newly generated column with destination columns, as shown below.
After closing the OLE DB destination editor, we should note that the error message disappeared.
2 - Using a Derived Column Transformation
The second solution that may solve this error is using a derived column transformation. The SSIS toolbox describes that a derived column transformation "creates new column values by applying an expression to transformation input column."
We should add a derived column transformation to the data flow task and write an expression that converts the Username column to a non-Unicode column. The image below shows that the derived column transformation supports several type cast functions. To solve the "cannot convert between Unicode and non-Unicode" error, we should use the (DT_STR) type cast function.
As shown in the image below, we added a new derived column with the following expression:
(DT_STR, 50, 1252) [Username]
Now, after mapping the source and destination column within the OLE DB destination component, we can check that the error message has disappeared.
3 - Converting the Data Type Implicitly in the Data Source Component
The third solution that may solve this issue is changing the source column data types within SSIS. This can be done by opening the source component advanced editor, as shown below.
Then, in the advanced editor, go to the "Input and Output Properties" tab. Within the Source output > external columns, select the column with a Unicode string data type and change it to an encoded string, as shown below.
After changing the column data type, the following warning will appear on the source component:
"The external columns for Flat File Source are out of synchronization with the data source columns. The external column "Username" needs to be updated."
This warning appears because we need to refresh the source component metadata by only opening the source component editor and clicking the "OK" button. We can note that the output column data type also changed within the columns tab in the Source editor.
4 - Using a Script Component
In several cases, the three solutions mentioned above do not solve the error, and the exception is still thrown on the package execution. This usually occurs once the values within the source file contain Unicode characters that cannot be converted.
In this case, we can use a script component that removes or replaces all Unicode characters. This is done mainly using a regular expression:
row.UsernameNonUni = Regex.Replace(row.Username, @"[^\u0000-\u007F]+", "");
Another option is to stage the data within a SQL table with Unicode columns, then use a SQL command to convert the values to an encoded string:
SELECT CAST([Username] as VARCHAR(50)) from staging_table
5 - Converting the Data Type Explicitly in the Data Source Component
The last solution we will provide in this tip is to convert or cast the source column in the source component if this option is available. For example, if we are reading the data using an OLE DB source component, we can use the "SQL Command" data access mode instead of the "Table or View".
Then we can use the data conversion functions in the SQL Command to convert the data from Unicode to non-Unicode characters.
SELECT CAST([Username] as VARCHAR(50)), CONVERT(VARCHAR(50),[Username]) from source_table
Read more about the SSIS package performance optimization methods:
Before putting your SSIS package into production, it is highly recommended to read more about SSIS performance optimization techniques and the bad habits to avoid:
- SSIS Bad Habits: Decreasing SSIS Package Performance
- SSIS Bad Habits: Inefficient Data Staging Strategies
- SSIS Bad Habits: Wrong OLE DB components configuration
- SSIS Bad Habits: Inefficient data staging strategies - Part 2
- SSIS Bad Habits: Slow lookup transformations
- SSIS Bad Habits: Using Sort and Merge Join Components
About the author
View all my tips
Article Last Updated: 2023-01-20