By: Nat Sundar | Comments (2) | Related: More > Import and Export
Problem
I need to start importing and exporting data from a table that has an VARCHAR(MAX) column. What is the best way to handle a column in a SQL Server table with the VARCHAR(MAX) datatype in a SQL Server Integration Services (SSIS) package? Can you provide an example?
Solution
VARCHAR(MAX) Datatype
In SQL Server, the VARCHAR(MAX) data type is referred as a Large Object (LOBs). The column will exceed 8 KB of storage. This data type is typically used to store a large amount of text data in a database table. For example, this data type can be used to store documents in a table. Though SSIS can read the data from a VARCHAR(MAX) data type, however there are some limitations to process the actual data. All the SSIS string functions only support the VARCHAR or NVARCHAR data types. Hence it is not possible to use the string manipulation functions with VARCHAR(MAX) datatype.
Working with DT_NEXT and DT_NTEXT Data Types in SSIS
SSIS includes the DT_NEXT and DT_NTEXT data types to support the SQL Server VARCHAR(MAX) and NVARCHAR(MAX) data types. These data types can store up to 2 GB of data. Unfortunately none of the built-in SSIS functions support the VARCHAR(MAX) data type. Hence you may need to convert the VARCHAR(MAX) to a VARCHAR column for processing. However in this approach you may have to truncate the data beyond 8000 characters, as the varchar datatype can accommodate a maximum of 8000 characters. Unfortunately, the actual data will be truncated with this approach.
The post will help you to understand and evaluate a method to process the VARCHAR(MAX) data type in an SSIS package. This method can also be used for high volume transactions.
Setup a Sample Database
Based on the Adventureworks database, a sample table "SQLServerLog" has been created with two columns. The first column, SQLLogID is an identity column and the second column is a VARCHAR(MAX) column containing the details of the SQL Server error log which is separated by the “~” character. The below mentioned script will help you to create the table and load sample data into the table.
Create table dbo.SQLServerLog ( SQLLogID Int Identity(1,1), SQLLog varchar(max) ) Go Insert into dbo.SQLServerLog SELECT '''' + DatabaseUser + '''' + '~' + '''' + [Event] + '''' + '~' + '''' + [Schema] + '''' + '~' + '''' + [Object] + '''' + '~' + '''' + [TSQL] + '''' + '~' + '''' + Convert(varchar(max),XmlEvent) + '''' + '*' FROM [AdventureWorks2012].[dbo].[DatabaseLog] Where DataLength(XMLEvent) > 8000
It has been observed that the data column "SQLLog" has details about DatabaseUser, [Event], [Schema], [Object], [TSQL] and XmlEvents for a database log. All the column values have been delimited by the character '~'. Now our objective is to read each row and parse the columns. After the successful completion of parsing it is expected to load data into the target table.
The target table and the expected outcome has been shown in the picture below.
The target table has six columns (DatabaseUser, Event, Schema, Object, TSQL and XmlEvent) and it is expected that the SSIS package reads through the columns and loads the data into these columns for each record.
Proposed SSIS Solution Using Script Component to Parse Column Values
In this approach, I have a data flow task which will extract data (SQLLogID and SQLLog) from the SQLServerlog table.
The below picture highlights that the table “SQLServerLog” will be used as a source in the Data Flow Task.
A transformation task has been developed based on the Script component. This Script component transformation will use two columns (SQLLogID and SQLLog) from the SQLServerLog table as shown in the picture below.
Using Script Component to Parse Column Values in SSIS
This script component will read every row and parse the data values based on the “~” delimiter and provides multiple columns as output. It is observed from the below picture that the Script component has only two columns (SQLLogID and SQLLog) as input, whereas there are 8 columns (Database User, Event, Schema, Object, TSQL, XMLEvent, DatabaseLogID and SQLLogID) as output.
In the below mentioned paragraph, I have simplified the process to help you to understand the technical details of the script task.
Since the column is a VARCHAR(MAX) data type in SQL Server, SSIS treats this as a DT_TEXT datatype. The DT_TEXT and DT_NTEXT are text data types and there are several limitations related to handling these datatypes in SSIS. For example, none of the string functions would work with these data types. Hence I am using the GetBlobData function and this will convert the DT_TEXT data into a byte array. Since the data in the ASCII format, I am using the ASCII.GetString function to convert the byte array into a string. Once the DT_Text data has been converted into a string, all the string functions can be applied. So I am using the Split string function to parse multiple columns. The Split string function will return all the values as string array. Multiple column values can be extracted from the array using the index.
The actual contents of the script task component is below.
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding(); byte[] bytes = Row.SQLLog.GetBlobData(0, (int)Row.SQLLog.Length); string strSQLLog = Encoding.ASCII.GetString(bytes); strSQLLog = strSQLLog.Replace("'", ""); String[] splitString = new string[] { "~" }; String[] strSQLElements; strSQLElements = strSQLLog.Split(splitString, StringSplitOptions.None); if (strSQLElements.Length > 0) { OutputBuffer.AddRow(); OutputBuffer.DatabaseUser = strSQLElements[0]; OutputBuffer.Event = strSQLElements[1]; OutputBuffer.Schema = strSQLElements[2]; OutputBuffer.Object = strSQLElements[3]; OutputBuffer.TSQL = strSQLElements[4]; OutputBuffer.XmlEvent.AddBlobData(encoding.GetBytes(strSQLElements[5])); }
Finally the data will be loaded into the target table as shown below.
After successful execution, we can see the data in the target table as shown below.
The above approach is very simple and easy to understand if you have previous experience with C# / VB.Net. I have provided this sample SSIS Package as a starting point for community members new to scripting technologies.
Summary
This method has been tested to parse 100,000 records within a minute. This method is far better than handling VARCHAR(MAX) columns using native T-SQL. I recommend you try this method in a Development environment with similar data volumes to see your results.
Next Steps
- Read more about the VARCHAR(MAX) data type.
- Read other SSIS Tips here.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips