Learn more about SQL Server tools

   
   















































Importing Data From Excel Using SSIS - Part 2

MSSQLTips author Arshad Ali By:   |   Read Comments (19)   |   Related Tips: More > Integration Services Excel
Problem

In my last tip, "Importing Data From Excel Using SSIS - Part 1" we found out that the SSIS Excel Connection Manager determines the data type of each column of the worksheet on the basis of data for that particular column from the first 8 rows. This is the default behavior and connection manager uses a value in the registry key to consider the number of rows for data type determination. We also saw that the data import fails if the data in others rows (other than first 8 rows) is not compatible or has a longer length than the data in the first 8 rows. I talked about different ways to handle this to make the data import succeed. I also talked about the easiest solution which is to change the registry key value, but this has its own implications and hence the question; what other options are there without creating too much additional overhead?

Solution

In my last tip, the easiest solution was to make a change in the registry but in many scenarios you might not have control to make this change for varous reasons.  Even if you have control in changing this setting, this change might cause your other SSIS packages to perform poorly (based on the amount of data you have in an Excel worksheet) and it may impact other systems as well wherever this registry key is being referenced. So I have another solution for this issue.

The basic idea of this solution is to convert the Excel worksheet to CSV file and use the Flat File Connection Manager/Source adapter to import data from the CSV file. With the Flat File Connection Manager/Source Adapter we have more control to define the data type and length of each column.

These are questions that come to mind when thinking about converting an Excel worksheet to a CSV file:

  • When saving an Excel worksheet using a csv extension does this make it a CSV file?
  • A CSV file uses a comma as column separator, but what if there are commas in the data itself?
  • What is the impact of converting an Excel worksheet to CSV file?

Well, simply saving the Excel worksheet using a CSV extension will not make it CSV file as the storage format is different for both of these file types. Rather we can use the Excel Object Library to save the Excel worksheet as a CSV file using the Script Task in SSIS and then we can import the data directly from the CSV file as shown below. 

Importing Data From Excel Using SSIS

In the Script Task, when writing code for conversion, first of all you need to add a reference to Microsoft.Office.Interop.Excel under .NET components as shown below: 

you need to add a reference to Microsoft.Office.Interop.Excel under .NET component

After adding the required reference, the References node in the Solution Explorer will look like this:

the References node in the Solution Explorer will look like this

Once you have added the required reference, you can add these lines of code. The complete list of code for converting Excel worksheet to CSV file is provided below. You need to provide the location and name of the Excel worksheet along with the name of the worksheet itself and then the location and name for the CSV file which will be created:

/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using Excel = Microsoft.Office.Interop.Excel;
using Microsoft.Office.Interop.Excel; 
namespace ST_6dc747ff29cf41c6ac11b7c0bca33d19.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
        /*
  The execution engine calls this method when the task executes.
  To access the object model, use the Dts property. Connections, variables, events,
  and logging features are available as members of the Dts property as shown in the following examples.
  To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
  To post a log entry, call Dts.Log("This is my log text", 999, null);
  To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);
  To use the connections collection use something like the following:
  ConnectionManager cm = Dts.Connections.Add("OLEDB");
  cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";
  Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
  
  To open Help, press F1.
 */
        private static Workbook mWorkBook;
        private static Sheets mWorkSheets;
        private static Worksheet mWSheet1;
        private static Excel.Application oXL;
        private static string ErrorMessage = string.Empty;
        public void Main()
        {
            try
            {
                string sourceExcelPathAndName = @"D:\Excel Import\Excel Import.xls";
                string targetCSVPathAndName = @"D:\Excel Import\Excel Import.csv";
                string excelSheetName = @"Sheet1";
                string columnDelimeter = @"|#|";
                int headerRowsToSkip = 0;
                if (ConvertExcelToCSV(sourceExcelPathAndName, targetCSVPathAndName, excelSheetName, columnDelimeter, headerRowsToSkip) == true)
                {
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
                else
                {
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
            }
            catch (Exception ex)
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
        public static bool ConvertExcelToCSV(string sourceExcelPathAndName, string targetCSVPathAndName, string excelSheetName, string columnDelimeter, int headerRowsToSkip)
        {
            try
            {
                oXL = new Excel.Application();
                oXL.Visible = false;
                oXL.DisplayAlerts = false;
                Excel.Workbooks workbooks = oXL.Workbooks;
                mWorkBook = workbooks.Open(sourceExcelPathAndName, 0, false, 5, "", "", false, XlPlatform.xlWindows, "", true, false, 0, true, false, false);
                //Get all the sheets in the workbook
                mWorkSheets = mWorkBook.Worksheets;
                //Get the specified sheet
                mWSheet1 = (Worksheet)mWorkSheets.get_Item(excelSheetName);
                Excel.Range range = mWSheet1.UsedRange;
                //deleting the specified number of rows from the top
                Excel.Range rngCurrentRow;
                for (int i = 0; i < headerRowsToSkip; i++)
                {
                    rngCurrentRow = range.get_Range("A1", Type.Missing).EntireRow;
                    rngCurrentRow.Delete(XlDeleteShiftDirection.xlShiftUp);
                }
                //replacing ENTER with a space
                range.Replace("\n", " ", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                //replacing COMMA with the column delimeter
                range.Replace(",", columnDelimeter, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                mWorkBook.SaveAs(targetCSVPathAndName, XlFileFormat.xlCSVMSDOS,
                Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive,
                Type.Missing, Type.Missing, Type.Missing,
                Type.Missing, false);
                return true;
            }
            catch (Exception ex)
            {
                ErrorMessage = ex.ToString();
                return false;
            }
            finally
            {
                if (mWSheet1 != null) mWSheet1 = null;
                if (mWorkBook != null) mWorkBook.Close(Type.Missing, Type.Missing, Type.Missing);
                if (mWorkBook != null) mWorkBook = null;
                if (oXL != null) oXL.Quit();
                System.Runtime.InteropServices.Marshal.ReleaseComObject(oXL);
                if (oXL != null) oXL = null;
                GC.WaitForPendingFinalizers();
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
            }
        }
    }
}

Below you can see the Excel worksheet which I am using as the source file for conversion.  You will notice row number 15 has a long string and also the description of this row contains commas in it as well.

you can see the Excel worksheet which I have used as source file for conversion

After the conversion, the CSV file will look like this. You will notice where there were commas we now have some special characters "|#|":

After the conversion, the CSV file will look like this

The idea behind this is, before conversion replace all the commas with some special characters and after importing the data from the CSV file update the special characters back to a comma.  This can be done using code such as this.

SELECT * FROM [dbo].[ProductInformation]
UPDATE [dbo].[ProductInformation]
SET [Description] = REPLACE([Description], '|#|', ',')
SELECT * FROM [dbo].[ProductInformation] 

In order to use this approach, we need to have extra storage space for having both a CSV file along with an Excel file. Apart from that, the Excel object library will take a few seconds to save the file as a CSV file. I haven't tried it on very large file, though I think this should not take much longer.  Once you have the data loaded you can do anything else you need to at that point.

Next Steps


Last Update: 10/2/2012


About the author
MSSQLTips author Arshad Ali
Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools
Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Wednesday, October 29, 2014 - 12:30:11 PM - Hussaini Read The Tip

Hi Arshad,

Thanks Arshad for very clear and detail explanation. I know my question is not related this topic but more similar I want to Import data from MS Access and I tried few times but was not successful, and do you have any topics or notes that could help us.

 

Thanks,

Hussaini. 


Monday, March 31, 2014 - 7:21:24 PM - rick Read The Tip

How do we run this from a SQL Server Agent Job like Curly said? It doesn't want to run...


Wednesday, December 18, 2013 - 2:23:45 PM - Scott Coleman Read The Tip

If you're going to use Excel interop to load the spreadsheet, why don't you just move the code from a Script Task to a Script Source Component?  Just add a loop to copy the cell values row by row into the row buffer.  Then you have complete control over how every column is interpreted.


Monday, October 14, 2013 - 2:45:50 PM - Robert Mitchell Read The Tip

Stuart,

 

It would seem that adding "IMEX=1" would solve everything, but in reality it doesn't. Even with IMEX set to 1, if a column has data that Excel's ISAM driver interprets as numeric in the first eight rows, it will still set the column datatype as "numeric" whether IMEX=1 or not, and will still return a NULL if a subsequent row contains a string. My example is a customer purchase order number field where the first fifteen entries are numeric, like 65088976, and the sixteenth entry is C65098787. The Excel Source returns this cell, and any others like it, as NULL. Increasing the number of rows to use for the guess is out of the question, as I cannot meddle with registry settings on a production server. (I tried it on my local workstation to see if it would work, and it didn't, anyway, so it's still not a solution.) I tried inserting data conversion tasks, derived column tasks, etc, but the dropping of the actual cell value happens at the moment the Excel source reads the row. 


Monday, July 29, 2013 - 4:39:19 AM - Abhi Desai Read The Tip

Hi Arshad,

Thanks for such great article and your simple language.

Your Part 1 and Part 2 both are really helpful.

Ragards,
Abhi Desai


Tuesday, June 04, 2013 - 11:25:42 AM - Arshad Read The Tip

Hi Stuart and Izhar,

Please have a look at this.

http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/78b87712-8ffe-4c72-914b-f1c031ba6c75


Monday, June 03, 2013 - 10:32:55 AM - Stuart Steedman Read The Tip

This came up as a featured article today.  Yet, the reasons for it and the solution to the problem are both poorly conceived.

 

The solution to the problem is to set the connection string with the extra flag: IMEX=1

(As mentioned by Izhar Azati in the comments to Part 1)

This renders the whole purpose of this article obsolete.

 

Secondly, converting a multi-worksheet workbook into a CSV is not going to work and even if it were just a single worksheet, this is a massive overhead for no good reason.


Sunday, June 02, 2013 - 11:40:26 AM - Faran Read The Tip

Hi Guys 

I was trying it. Have excel installed as well. The problem is that I can't add the reference Microsoft.Office.Interop.Excel under .NET components  in the Script Task,  because it is not available in the list. Any help will be appreciated?

 

regards

 

Faran


Friday, May 24, 2013 - 2:29:44 AM - ashish Read The Tip

 

hi

i have created excel file from sql table but for varchar columns

i'm getting single quotes before text like for example

('Myname)

why is it so ?

i have used ado.net source and excel destination..


Monday, April 15, 2013 - 11:47:27 AM - KevH Read The Tip

Does this handle speach marks, commas and blank columns?


Thursday, March 21, 2013 - 10:36:03 AM - Kashif Read The Tip

Hi Arshad,

I am trying to load a pipe '|' delimited file using SSIS, I am getting the following error related to target code page. I appreciate if you can provide some help.

Error: 0xC02020A1 at Data Flow Task 1, Source - Account_txt [1]: Data conversion failed. The data conversion for column "FORMULA0,Account" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
Error: 0xC020902A at Data Flow Task 1, Source - Account_txt [1]: The "output column "FORMULA0,Account" (26)" failed because truncation occurred, and the truncation row disposition on "output column "FORMULA0,Account" (26)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.


Tuesday, January 08, 2013 - 9:17:56 AM - Imane Read The Tip

Hi,

I'm trying to find a way to change from Excel to text file using SQL code.

It is somehow like a converter from Excel to .txt file with the rules I can add to that txt file in order to have the structure that I want.

Can anyone help please?

 

Thanks!

 


Wednesday, December 19, 2012 - 7:20:37 AM - curly Read The Tip

Chris, You do have to install Exel on the server, nothing else works. I tried to add Microsoft.Office.Interop.Excel.dll to the windows\asembly 10 diffrent ways with no luck. Installing Excel will put the dll in the asembly so you can use it. The package will pickup and build the script fine, but will not execute correctly if the dll is not in the assembly.

My problme now is that my package wont run the script from a job. it does run correctly in debug mode though. I find it difficult to debug because Arshad's code dont have any descriptive error output with it. I'm a SQL guy, not a c# programmer. Can anybody help out with some code for error debugging in this script?

Arshad - Great Job with this solution. This helped me out big time.


Tuesday, November 27, 2012 - 12:20:22 PM - Chris Read The Tip

Hi, Arshad.  Thanks for posting the solution described above.  I have a question.  Does using the Microsoft.Office.Interop.Excel dll require Office to be installed on the server?  I thought that I saw a post about this.  Thanks.

Thanks...Chris


Monday, October 15, 2012 - 8:39:39 AM - jcidge Read The Tip

Does anyone know, in a SSIS loop to read multiple Excel files, does the excel connection manager scan the first N rows of EACH file it opens to set the data types for each pass OR does it  use only on the first file it finds ?


Monday, October 08, 2012 - 2:17:40 PM - Megan Brooks Read The Tip

I have been having to deal with these issues for a number of years. As I blogged back in 2009, my solution has been to hand-convert each Excel file to tab-delimited text and then read the data using a flat-file data source with suitable metadata settings. I avoided CSV because of the potential for incorrectly handling embedded commas.  CSV can work if the data is properly enclosed in quotation marks and the double-quote is specified as the text delimiter in the flat-file source, but as I recall Excel doesn't always write or read the data correctly. I don't remember the details now, but it seems like Import/Export operations were broken in this regard. Or maybe it was Open/Save As. Always test.

I hadn't thought about doing the Excel-to-text conversion right in the SSIS package. I don't have enough data volume for it to be a problem right now (most of my Excel data comes in once or twice a year, and there aren't that many different files), but if there is more in the future then I will look into it. Thank you!


Monday, October 08, 2012 - 6:12:23 AM - Marce Read The Tip

Hey good man thank you a lot for this information. As i have seen you quite dominate rendering excel files in SSIS id like to ask you if you know how to password protect spreadsheets once created. The flow would be like this: grab the xls from an specific folder, open it, password protect it, close it. I have seen some ssis extensions suites but as far as i know there should be a way to put some code in the mpackage so that the processed file can be automized. i would endless appreciate this, Cheers!


Tuesday, October 02, 2012 - 12:33:49 PM - Russ Read The Tip

I've been working with importing Excel files using SSIS and the interop .dll

Have you had any issues with hanging EXCEL.exe processes and if so, how have you handled them?

 

Thanks!


Tuesday, October 02, 2012 - 10:19:22 AM - Bill Read The Tip

Good stuff, a reasonable work around that is better than changing a registry setting.




 
Sponsor Information