Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Import and Export VARCHAR(MAX) data with SQL Server Integration Services (SSIS)


By:   |   Read Comments   |   Related Tips: More > Import and Export

New Whitepaper > 10 Ways We Can Steal Your Data >> download now


Problem

I need to start importing and exporting data from a table that has an VARCHAR(MAX) column.  What is the best way to handle a column in a SQL Server table with the VARCHAR(MAX) datatype in a SQL Server Integration Services (SSIS) package?  Can you provide an example?

Solution

VARCHAR(MAX) Datatype

In SQL Server, the VARCHAR(MAX) data type is referred as a Large Object (LOBs). The column will exceed 8 KB of storage. This data type is typically used to store a large amount of text data in a database table. For example, this data type can be used to store documents in a table. Though SSIS can read the data from a VARCHAR(MAX) data type, however there are some limitations to process the actual data. All the SSIS string functions only support the VARCHAR or NVARCHAR data types. Hence it is not possible to use the string manipulation functions with VARCHAR(MAX) datatype.

Working with DT_NEXT and DT_NTEXT Data Types in SSIS

SSIS includes the DT_NEXT and DT_NTEXT data types to support the SQL Server VARCHAR(MAX) and NVARCHAR(MAX) data types. These data types can store up to 2 GB of data. Unfortunately none of the built-in SSIS functions support the VARCHAR(MAX) data type. Hence you may need to convert the VARCHAR(MAX) to a VARCHAR column for processing. However in this approach you may have to truncate the data beyond 8000 characters, as the varchar datatype can accommodate a maximum of 8000 characters. Unfortunately, the actual data will be truncated with this approach.

The post will help you to understand and evaluate a method to process the VARCHAR(MAX) data type in an SSIS package. This method can also be used for high volume transactions.

Setup a Sample Database

Based on the Adventureworks database, a sample table "SQLServerLog" has been created with two columns. The first column, SQLLogID is an identity column and the second column is a VARCHAR(MAX) column containing the details of the SQL Server error log which is separated by the “~” character. The below mentioned script will help you to create the table and load sample data into the table.

Create table dbo.SQLServerLog ( SQLLogID Int Identity(1,1), SQLLog varchar(max) )
Go
 
Insert into dbo.SQLServerLog 
SELECT 
   '''' + DatabaseUser + '''' 
   + '~' + '''' + [Event] + '''' 
   + '~' + '''' + [Schema] + '''' 
   + '~' + '''' + [Object] + '''' 
   + '~' + '''' + [TSQL] + '''' 
   + '~' + '''' + Convert(varchar(max),XmlEvent) + '''' + '*' 
FROM 
   [AdventureWorks2012].[dbo].[DatabaseLog] 
Where 
   DataLength(XMLEvent) > 8000
   

It has been observed that the data column "SQLLog" has details about DatabaseUser, [Event], [Schema], [Object], [TSQL] and XmlEvents for a database log. All the column values have been delimited by the character '~'. Now our objective is to read each row and parse the columns. After the successful completion of parsing it is expected to load data into the target table.

The target table and the expected outcome has been shown in the picture below.

Target Table Output - Description: Target Table Output

The target table has six columns (DatabaseUser, Event, Schema, Object, TSQL and XmlEvent) and it is expected that the SSIS package reads through the columns and loads the data into these columns for each record.

Proposed SSIS Solution Using Script Component to Parse Column Values

In this approach, I have a data flow task which will extract data (SQLLogID and SQLLog) from the SQLServerlog table.

SSIS Package Design - Description: SSIS Package Design
SSIS Data Flow Task - Description: Data Flow Task

The below picture highlights that the table “SQLServerLog” will be used as a source in the Data Flow Task.

SSIS OLEDB Source View - Description: OLEDB Source View

A transformation task has been developed based on the Script component. This Script component transformation will use two columns (SQLLogID and SQLLog) from the SQLServerLog table as shown in the picture below.

Script Task Input Columns - Description: Script Task Input Columns

Using Script Component to Parse Column Values in SSIS

This script component will read every row and parse the data values based on the “~” delimiter and provides multiple columns as output. It is observed from the below picture that the Script component has only two columns (SQLLogID and SQLLog) as input, whereas there are 8 columns (Database User, Event, Schema, Object, TSQL, XMLEvent, DatabaseLogID and SQLLogID) as output.

SSIS Script Output Columns - Description: Script Output Columns

In the below mentioned paragraph, I have simplified the process to help you to understand the technical details of the script task.

Since the column is a VARCHAR(MAX) data type in SQL Server, SSIS treats this as a DT_TEXT datatype. The DT_TEXT and DT_NTEXT are text data types and there are several limitations related to handling these datatypes in SSIS. For example, none of the string functions would work with these data types. Hence I am using the GetBlobData function and this will convert the DT_TEXT data into a byte array. Since the data in the ASCII format, I am using the ASCII.GetString function to convert the byte array into a string. Once the DT_Text data has been converted into a string, all the string functions can be applied. So I am using the Split string function to parse multiple columns. The Split string function will return all the values as string array. Multiple column values can be extracted from the array using the index.

The actual contents of the script task component is below.

System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] bytes = Row.SQLLog.GetBlobData(0, (int)Row.SQLLog.Length); 
string strSQLLog = Encoding.ASCII.GetString(bytes); 
strSQLLog = strSQLLog.Replace("'", ""); 
String[] splitString = new string[] { "~" }; 
String[] strSQLElements; 
strSQLElements = strSQLLog.Split(splitString, StringSplitOptions.None); 
if (strSQLElements.Length > 0) 
{ 
   OutputBuffer.AddRow(); 
   OutputBuffer.DatabaseUser = strSQLElements[0];
   OutputBuffer.Event = strSQLElements[1];
   OutputBuffer.Schema = strSQLElements[2];
   OutputBuffer.Object = strSQLElements[3];
   OutputBuffer.TSQL = strSQLElements[4];
   OutputBuffer.XmlEvent.AddBlobData(encoding.GetBytes(strSQLElements[5]));
}
   

Finally the data will be loaded into the target table as shown below.

Target Table Mapping - Description: Target Table Mapping

After successful execution, we can see the data in the target table as shown below.

SSIS Successful Execution - Description: SSIS Successful Execution
Target Table After data load - Description: Target Table After data load

The above approach is very simple and easy to understand if you have previous experience with C# / VB.Net.  I have provided this sample SSIS Package as a starting point for community members new to scripting technologies.

Summary

This method has been tested to parse 100,000 records within a minute.  This method is far better than handling VARCHAR(MAX) columns using native T-SQL. I recommend you try this method in a Development environment with similar data volumes to see your results.

Next Steps


Last Update:


signup button

next tip button



About the author
MSSQLTips author Nat Sundar Nat Sundar is working as an independent SQL BI consultant in the UK with Bachelor Degree in Engineering.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Learn more about SQL Server tools