In my last tip, "Importing Data From Excel Using SSIS - Part 1" we found out that the SSIS Excel Connection Manager determines the data type of each column of the worksheet on the basis of data for that particular column from the first 8 rows. This is the default behavior and connection manager uses a value in the registry key to consider the number of rows for data type determination. We also saw that the data import fails if the data in others rows (other than first 8 rows) is not compatible or has a longer length than the data in the first 8 rows. I talked about different ways to handle this to make the data import succeed. I also talked about the easiest solution which is to change the registry key value, but this has its own implications and hence the question; what other options are there without creating too much additional overhead?
In my last tip, the easiest solution was to make a change in the registry but in many scenarios you might not have control to make this change for varous reasons. Even if you have control in changing this setting, this change might cause your other SSIS packages to perform poorly (based on the amount of data you have in an Excel worksheet) and it may impact other systems as well wherever this registry key is being referenced. So I have another solution for this issue.
The basic idea of this solution is to convert the Excel worksheet to CSV file and use the Flat File Connection Manager/Source adapter to import data from the CSV file. With the Flat File Connection Manager/Source Adapter we have more control to define the data type and length of each column.
These are questions that come to mind when thinking about converting an Excel worksheet to a CSV file:
When saving an Excel worksheet using a csv extension does this make it a CSV file?
A CSV file uses a comma as column separator, but what if there are commas in the data itself?
What is the impact of converting an Excel worksheet to CSV file?
Well, simply saving the Excel worksheet using a CSV extension will not make it CSV file as the storage format is different for both of these file types. Rather we can use the Excel Object Library to save the Excel worksheet as a CSV file using the Script Task in SSIS and then we can import the data directly from the CSV file as shown below.
In the Script Task, when writing code for conversion, first of all you need to add a reference to Microsoft.Office.Interop.Excel under .NET components as shown below:
After adding the required reference, the References node in the Solution Explorer will look like this:
Once you have added the required reference, you can add these lines of code. The complete list of code for converting Excel worksheet to CSV file is provided below. You need to provide the location and name of the Excel worksheet along with the name of the worksheet itself and then the location and name for the CSV file which will be created:
Microsoft SQL Server Integration Services Script Task
Write scripts using Microsoft Visual C# 2008.
The ScriptMain is the entry point class of the script.
using Excel = Microsoft.Office.Interop.Excel;
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
#region VSTA generated code
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
The execution engine calls this method when the task executes.
To access the object model, use the Dts property. Connections, variables, events,
and logging features are available as members of the Dts property as shown in the following examples.
To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
To post a log entry, call Dts.Log("This is my log text", 999, null);
To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);
To use the connections collection use something like the following:
ConnectionManager cm = Dts.Connections.Add("OLEDB");
cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";
Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
To open Help, press F1.
private static Workbook mWorkBook;
private static Sheets mWorkSheets;
private static Worksheet mWSheet1;
private static Excel.Application oXL;
private static string ErrorMessage = string.Empty;
public void Main()
string sourceExcelPathAndName = @"D:\Excel Import\Excel Import.xls";
string targetCSVPathAndName = @"D:\Excel Import\Excel Import.csv";
string excelSheetName = @"Sheet1";
string columnDelimeter = @"|#|";
int headerRowsToSkip = 0;
if (ConvertExcelToCSV(sourceExcelPathAndName, targetCSVPathAndName, excelSheetName, columnDelimeter, headerRowsToSkip) == true)
Dts.TaskResult = (int)ScriptResults.Success;
Dts.TaskResult = (int)ScriptResults.Failure;
catch (Exception ex)
Dts.TaskResult = (int)ScriptResults.Failure;
public static bool ConvertExcelToCSV(string sourceExcelPathAndName, string targetCSVPathAndName, string excelSheetName, string columnDelimeter, int headerRowsToSkip)
oXL = new Excel.Application();
oXL.Visible = false;
oXL.DisplayAlerts = false;
Excel.Workbooks workbooks = oXL.Workbooks;
mWorkBook = workbooks.Open(sourceExcelPathAndName, 0, false, 5, "", "", false, XlPlatform.xlWindows, "", true, false, 0, true, false, false);
//Get all the sheets in the workbook
mWorkSheets = mWorkBook.Worksheets;
//Get the specified sheet
mWSheet1 = (Worksheet)mWorkSheets.get_Item(excelSheetName);
Excel.Range range = mWSheet1.UsedRange;
//deleting the specified number of rows from the top
for (int i = 0; i < headerRowsToSkip; i++)
rngCurrentRow = range.get_Range("A1", Type.Missing).EntireRow;
//replacing ENTER with a space
range.Replace("\n", " ", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
//replacing COMMA with the column delimeter
range.Replace(",", columnDelimeter, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive,
Type.Missing, Type.Missing, Type.Missing,
catch (Exception ex)
ErrorMessage = ex.ToString();
if (mWSheet1 != null) mWSheet1 = null;
if (mWorkBook != null) mWorkBook.Close(Type.Missing, Type.Missing, Type.Missing);
if (mWorkBook != null) mWorkBook = null;
if (oXL != null) oXL.Quit();
if (oXL != null) oXL = null;
Below you can see the Excel worksheet which I am using as the source file for conversion. You will notice row number 15 has a long string and also the description of this row contains commas in it as well.
After the conversion, the CSV file will look like this. You will notice where there were commas we now have some special characters "|#|":
The idea behind this is, before conversion replace all the commas with some special characters and after importing the data from the CSV file update the special characters back to a comma. This can be done using code such as this.
SELECT * FROM [dbo].[ProductInformation]
SET [Description] = REPLACE([Description], '|#|', ',')
SELECT * FROM [dbo].[ProductInformation]
In order to use this approach, we need to have extra storage space for having both a CSV file along with an Excel file. Apart from that, the Excel object library will take a few seconds to save the file as a CSV file. I haven't tried it on very large file, though I think this should not take much longer. Once you have the data loaded you can do anything else you need to at that point.
Hey good man thank you a lot for this information. As i have seen you quite dominate rendering excel files in SSIS id like to ask you if you know how to password protect spreadsheets once created. The flow would be like this: grab the xls from an specific folder, open it, password protect it, close it. I have seen some ssis extensions suites but as far as i know there should be a way to put some code in the mpackage so that the processed file can be automized. i would endless appreciate this, Cheers!
Monday, October 08, 2012 - 2:17:40 PM - Megan Brooks
I have been having to deal with these issues for a number of years. As I blogged back in 2009, my solution has been to hand-convert each Excel file to tab-delimited text and then read the data using a flat-file data source with suitable metadata settings. I avoided CSV because of the potential for incorrectly handling embedded commas. CSV can work if the data is properly enclosed in quotation marks and the double-quote is specified as the text delimiter in the flat-file source, but as I recall Excel doesn't always write or read the data correctly. I don't remember the details now, but it seems like Import/Export operations were broken in this regard. Or maybe it was Open/Save As. Always test.
I hadn't thought about doing the Excel-to-text conversion right in the SSIS package. I don't have enough data volume for it to be a problem right now (most of my Excel data comes in once or twice a year, and there aren't that many different files), but if there is more in the future then I will look into it. Thank you!
Does anyone know, in a SSIS loop to read multiple Excel files, does the excel connection manager scan the first N rows of EACH file it opens to set the data types for each pass OR does it use only on the first file it finds ?
Hi, Arshad. Thanks for posting the solution described above. I have a question. Does using the Microsoft.Office.Interop.Excel dll require Office to be installed on the server? I thought that I saw a post about this. Thanks.
Chris, You do have to install Exel on the server, nothing else works. I tried to add Microsoft.Office.Interop.Excel.dll to the windows\asembly 10 diffrent ways with no luck. Installing Excel will put the dll in the asembly so you can use it. The package will pickup and build the script fine, but will not execute correctly if the dll is not in the assembly.
My problme now is that my package wont run the script from a job. it does run correctly in debug mode though. I find it difficult to debug because Arshad's code dont have any descriptive error output with it. I'm a SQL guy, not a c# programmer. Can anybody help out with some code for error debugging in this script?
Arshad - Great Job with this solution. This helped me out big time.
I am trying to load a pipe '|' delimited file using SSIS, I am getting the following error related to target code page. I appreciate if you can provide some help.
Error: 0xC02020A1 at Data Flow Task 1, Source - Account_txt : Data conversion failed. The data conversion for column "FORMULA0,Account" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.". Error: 0xC020902A at Data Flow Task 1, Source - Account_txt : The "output column "FORMULA0,Account" (26)" failed because truncation occurred, and the truncation row disposition on "output column "FORMULA0,Account" (26)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
I was trying it. Have excel installed as well. The problem is that I can't add the reference Microsoft.Office.Interop.Excel under .NET components in the Script Task, because it is not available in the list. Any help will be appreciated?
Monday, June 03, 2013 - 10:32:55 AM - Stuart Steedman
It would seem that adding "IMEX=1" would solve everything, but in reality it doesn't. Even with IMEX set to 1, if a column has data that Excel's ISAM driver interprets as numeric in the first eight rows, it will still set the column datatype as "numeric" whether IMEX=1 or not, and will still return a NULL if a subsequent row contains a string. My example is a customer purchase order number field where the first fifteen entries are numeric, like 65088976, and the sixteenth entry is C65098787. The Excel Source returns this cell, and any others like it, as NULL. Increasing the number of rows to use for the guess is out of the question, as I cannot meddle with registry settings on a production server. (I tried it on my local workstation to see if it would work, and it didn't, anyway, so it's still not a solution.) I tried inserting data conversion tasks, derived column tasks, etc, but the dropping of the actual cell value happens at the moment the Excel source reads the row.
Wednesday, December 18, 2013 - 2:23:45 PM - Scott Coleman
If you're going to use Excel interop to load the spreadsheet, why don't you just move the code from a Script Task to a Script Source Component? Just add a loop to copy the cell values row by row into the row buffer. Then you have complete control over how every column is interpreted.