Retrieve Excel Schema Using SQL Integration Services SSIS

By:   |   Comments (32)   |   Related: 1 | 2 | 3 | More > Integration Services Excel


Problem

We use SSIS to periodically load data into our data warehouse.  While much of the data we process is in relational data stores, we do have some Excel spreadsheets that we need to process.  In one particular case we load an Excel spreadsheet that is produced by an external application and every month the number of sheets varies and the sheet names are also different.  How can we determine the sheet names in our SSIS package and process this variable number of sheets?

Solution

The first step is to retrieve the schema information from the Excel spreadsheet.  The sheet name in the Excel spreadsheet becomes the table name in a SQL statement; the sheet columns are the columns in the SQL statement.  Let's start out by looking at a sample Excel spreadsheet that we might need to process with an SSIS package:

sample excel

The main points about the above Excel spreadsheet are:

  • We have an external system that generates an Excel spreadsheet with a list of invoices by week.
  • The spreadsheet can be generated for any number of weeks; each week is in its own sheet.
  • The sheet names represent the year and the week number in the year.

Based on our knowledge of the Excel Source that we use in a Data Flow task, we need to know the sheet name in order to import the data.  In our example, however, the sheet name varies and the number of sheets also varies.  What we need then is a way to query the Excel spreadsheet and get the list of sheets.

You can get the schema information from an Excel spreadsheet by using the OleDbConnection .NET Framework class.  The OleDbConnection class has a method called GetOleDbSchemaTable that will return the list of sheets (i.e. tables) in a spreadsheet and the list of columns in a particular sheet.  Let's create a simple SSIS package to demonstrate how to query this information and process the sheets.  For additional details on the GetOleDbSchemaTable method, refer to this article on the Microsoft web site.

Sample SSIS Package

We'll create an SSIS package that will process all of the sheets in a single Excel file.  The Excel file to process will be specified on the command line and stored in the package variable ExcelFilePath.  Our SSIS sample package will have the following control flow:

ssis control flow
  • Truncate Staging Tables is an Execute SQL task that truncates two staging tables used during processing.
  • Get Schema from Excel File is a Data Flow task that retrieves the schema information from each sheet in the Excel spreadsheet and stores it the stg_ExcelMetadata staging table.
  • Get List of Excel Tables to Process is an Execute SQL task that gets the distinct list of tables from the stg_ExcelMetadata  table and stores them in the package variable ExcelTableList.
  • Process Each Excel Table is a Foreach Loop Container task that iterates through the list of Excel tables in the ExcelTableList package variable.  Each time through the loop the Excel table to be processed is stored in the ExcelTable package variable.
  • Process Excel Table is a Data Flow task that reads the data from the single Excel sheet per the ExcelTable package variable and inserts the data into the staging table stg_Invoice.

The Get Schema from Excel File task is the most interesting part of our sample SSIS package and it looks like this:

ssis data flow

Get Excel Metadata is a Script Source.  It contains the VB.NET code that retrieves the schema information from the Excel file.  Write to Staging Table is an OLE DB Destination that inserts the schema information into the stg_ExcelMetadata table which is defined as follows:

CREATE TABLE [dbo].[stg_ExcelMetadata](
 [EXCEL_FILE_PATH] [nvarchar](256) NULL,
 [DATA_TYPE] [nvarchar](50) NULL,
 [COLUMN_NAME] [nvarchar](50) NULL,
 [TABLE_NAME] [nvarchar](50) NULL
)

A Script Source allows you to write VB.NET code to retrieve data from just about any source and insert it into the data flow.  The key point is that you have the entire .NET Framework at your disposal.  There are two steps in the configuration of the Script Source component:

Step 1: Define the output columns; these are the columns that you want to insert into the data flow.  In our case they are all strings:

script source outputs

Step 2: Write the VB.NET code:

Public Overrides Sub CreateNewOutputRows()
  Dim excelFilePath As String = Me.Variables.ExcelFilePath.ToString()
  Dim strCn As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
     "Data Source=" + excelFilePath + ";Extended Properties=Excel 8.0"
  Dim dtTables As DataTable
  Dim dtColumns As DataTable
  Dim tableName As String
  Dim cn As OleDbConnection = New OleDbConnection(strCn)
  cn.Open()
  dtTables = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
  For Each row As DataRow In dtTables.Rows
    tableName = row("TABLE_NAME").ToString()
    dtColumns = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
                   New Object() {Nothing, Nothing, tableName, Nothing})
    For Each columnRow As DataRow In dtColumns.Rows
      OutputBuffer.AddRow()
      OutputBuffer.EXCELFILEPATH = excelFilePath
      OutputBuffer.TABLENAME = tableName
      OutputBuffer.COLUMNNAME = columnRow("COLUMN_NAME").ToString()
      OutputBuffer.DATATYPE = columnRow("DATA_TYPE").ToString()
    Next
  Next
  cn.Close()
  OutputBuffer.SetEndOfRowset()
End Sub

The main points about the above code snippet are:

  • In a Script Source component you add VB.NET code to the CreateNewOutputRows subroutine to retrieve data and insert it into the data flow.
  • The ExcelFilePath package variable is passed in to the Script Source component and used in the connection string.
  • Create an OleDbConnection object, open the connection and call the GetOleDbSchemaTable method to retrieve the list of sheets and the columns in each sheet.
  • The GetOleDbSchemaTable method returns a DataTable; this is a standard ADO.NET class that has rows and columns.
  • The OutputBuffer class is used to add rows to the data flow and also to assign values to the output columns.  The OutputBuffer class gets its name based on the name you specify on the Inputs and Outputs page in Step 1 above.
  • You should call the SetEndOfRowset method on the OutputBuffer to indicate that you are done adding rows.

The Get List of Excel Tables to Process task executes a query to get the list of sheets in the Excel spreadsheet (from the staging table) then stores the list in the package variable ExcelTableList.  The Process Each Excel Table task iterates over the list of tables in the Excel spreadsheet and executes the Process Each Excel Table task on each table.  This technique was also used in our earlier tip How To Implement Batch Processing in SQL Server Integration Services (SSIS).

There are two steps required to configure the Get List of Excel Tables to Process task: 

exec sql general

The General page (shown above) is where you specify the query; it's just selecting the list of tables from the staging table that we populated in the Get Schema from Excel File task.  Setting the ResultSet to Full result set allows us to capture the query results into a package variable which we need to specify on the Result Set page:

exec sql result set

Note that the data type of the ExcelTableList variable must be Object (i.e. the .NET Framework System.Object class) in order for the variable to hold the list of tables from our query. 

There are two steps required to configure the the Process Each Excel Table task:

foreach collection

The Collection page (shown above) is where you specify the type of enumerator; in our case it must be Foreach ADO Enumerator.  The ADO object source variable is ExcelTableList which is the variable we specified for the Result Set in the Get List of Excel Tables task.  For Enumeration Mode we pick Rows in the first table (there is only one table in our result set).

The Variable Mappings page is used to assign value(s) from the result set to package variable(s) during each iteration.  We'll use a package variable named ExcelTable:

 
foreach variable mappings
Finally the last step in our SSIS package is the Process Excel Table task which looks like this:
process excel table data flow

The main points about the above Data Flow task are:

  • The Excel Source reads a single sheet from our Excel file.
  • The Derived Column task adds the ExcelFilePath and ExcelTable package variables to the data flow so that we can save these in the staging table.
  • Write to Staging is an OLD DB Destination that inserts rows into the staging table stg_Invoice.

The staging table is defined as follows:

CREATE TABLE [dbo].[stg_Invoice] (
 [ExcelFilePath]  [nvarchar](255)  NULL,
 [ExcelTable]  [nvarchar](255)  NULL,
 [InvoiceDate] [float]  NULL,
 [InvoiceNumber] [nvarchar](255) NULL,
 [CustomerNumber] [nvarchar](255) NULL,
 [InvoiceAmount] [float] NULL
)

While the above Data Flow task is relatively straight forward, there are two subtle points that we need to take into consideration.  First we need to configure the Excel Connection Manager that is used by the Excel Source.  The ExcelFilePath property needs to be set to the ExcelFilePath package variable that is passed in on the command line.  Click the button in the Expressions property of the Excel Connection Manager and assign the ExcelFilePath package variable to the ExcelFilePath property as shown below:

excel file path

Second remember that the Data Flow task is being executed once for each sheet in our Excel spreadsheet.  The way to make this work is to configure the Excel Source Connection Manager page to specify the Data access mode as Table name or view name variable and select the ExcelTable package variable as the Variable name as shown below:

excel source connection manager

As the Process Each Excel Table task iterates through the list of sheets in the Excel file, it assigns each sheet to the ExcelTable package variable then executes the Process Excel Table task which operates on the sheet specified by the ExcelTable package variable.

Running the SSIS Package

Run the sample SSIS package using the DTEXEC command line utility and set the ExcelFilePath package variable to the full path of the Excel file to process; e.g.:

DTEXEC /FILE ExcelMetadata.dtsx /SET "\Package.Variables[User::ExcelFilePath].Value";"c:\drop\sample.xls"

After running the package you can query the stg_ExcelMetadata table to see the schema information:

staging results

Note that the TABLE_NAME column has a '$' character at the end of it and is enclosed in single quotes.  The sheet name does not show the '$' character or the single quotes in Excel.  The DATA_TYPE column values are: 5=double precision, 130=text.  You can find the details on additional types here.  Excel supports a very small set of column types.

You can also query the stg_Invoice table to see the data that was loaded from the Excel sheets: 

stg invoice results

The results above only show the first ten rows; if you scroll through all of the results you will see rows from each of the three Excel sheets in our sample Excel spreadsheet.

Next Steps
  • Take a look at the sample code here and experiment with retrieving schema information from an Excel file. 
  • Although the column information from the Excel sheets was retrieved, it wasn't actually used in the example.  However, you could certainly use the column list in a variety of ways; e.g. you could validate that all columns you expect are available and abort with an appropriate error if necessary.
  • The Script Source component in the data flow is a great way to retrieve data using VB.NET code and get that data into the data flow.  SQL Server 2008 SSIS supports C# code in Script components in addition to VB.NET.
  • The Foreach Loop Container task is a very useful technique for implementing a batch or repetitive type of process.
  • Each sheet in the example Excel spreadsheet has the same schema; i.e. column list.  To process sheets with different column lists you would need a separate Data Flow task for each distinct list of columns.  You could also use a Script task to write ADO.NET code to insert the Excel data into a table.  The Data Flow doesn't support a dynamic column list.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ray Barley Ray Barley is a Principal Architect at IT Resource Partners and a MSSQLTips.com BI Expert.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, February 12, 2016 - 9:35:51 AM - Ray Barley Back To Top (40665)

 If you want to read multiple Excel files, take a look at this tip: https://www.mssqltips.com/sqlservertip/4165/how-to-read-data-from-multiple-excel-files-with-sql-server-integration-services/

 

 


Friday, February 12, 2016 - 1:55:52 AM - anil kumar Back To Top (40658)

 

 Hi,

 

I am having multiple excel files in one folder, how to implent it. need to load the excel data along with the table name, kindly help me.


Tuesday, September 30, 2014 - 11:45:48 AM - Raymond Barley Back To Top (34776)

There should not be quotes around the table name in the DROP TABLE command; i.e. DROP TABLE ['data'] should be DROP TABLE [data]


Tuesday, September 30, 2014 - 9:51:45 AM - Joe Back To Top (34775)

I have tried the following code and still get the error

Code

IF OBJECT_ID(N'data') IS NOT NULL
    BEGIN
 DROP TABLE ['data']
End

Error

 [Execute SQL Task] Error: Executing the query "IF OBJECT_ID(N'data') IS NOT NULL
    BEGIN
 DROP ..." failed with the following error: "Invalid SQL statement; expected 'DELETE', 'INSERT', 'PROCEDURE', 'SELECT', or 'UPDATE'.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

 


Monday, September 29, 2014 - 10:24:19 PM - Ray Barley Back To Top (34765)

One thing I notice is if you have a BEGIN you need an End; e.g

if condition

BEGIN

code goes here 

END

ELSE

BEGIN

CODE GOES HERE 

END

 

 


Monday, September 29, 2014 - 6:00:50 PM - Joe Back To Top (34759)

I am trying to check if a excel file exists and if it does drop it then re-create it. the code is below and I get the following error.

Error


[Execute SQL Task] Error: Executing the query "IF OBJECT_ID(N'data') IS NOT NULL
    BEGIN
 DROP ..." failed with the following error: "Invalid SQL statement; expected 'DELETE', 'INSERT', 'PROCEDURE', 'SELECT', or 'UPDATE'.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Code

IF OBJECT_ID(N'data') IS NOT NULL
    BEGIN
 DROP TABLE ['data']
Else

CREATE TABLE `data` (
     `Report_Date` DATETIME,
    `LOAN_NUMBER` NVARCHAR(50),
    `BK_CASE_NUMBER` NVARCHAR(50),
    `CREATE_DATE` DATETIME,
    `DATE_IS` DATETIME,
    `DATE_WAS` DATETIME,
    `TOTAL_DAYS` INT,
    `UNIT_MANAGER` NVARCHAR(50),
    `UNIT_MANAGER_NAME` NVARCHAR(50),
    `REC_TYPE` NVARCHAR(25),
    `INIT_PLAN_REVIEWER_ASSIGNED` NVARCHAR(50),
    `INIT_PLAN_REVIEWER_NAME` NVARCHAR(50),
    `DOC_SPECIALIST` NVARCHAR(50),
    `DOC_SPECIALIST_NAME` NVARCHAR(50),
    `POC_PREPARER_ASGND` NVARCHAR(50),
    `POC_PREPARER_NAME` NVARCHAR(50),
    `POC_AUDITOR_ASGND` NVARCHAR(50),
    `POC_AUDITOR_NAME` NVARCHAR(50),
    `IPR_UNIT_MANAGER` NVARCHAR(50),
    `IPR_STATUS` NVARCHAR(50),
    `IPR_IW` NVARCHAR(150),
    `IPR_DAYS` INT,
    `IPR_UPDATE_TIME` DATETIME,
    `DOC_CHECK_UNIT_MANAGER` NVARCHAR(50),
    `DOC_CHECK_STATUS` NVARCHAR(50),
    `DOC_IW` NVARCHAR(150),
    `DOC_DAYS` INT,
    `DOC_CHECK_UPDATE_TIME` DATETIME,
    `POC_PREP_UNIT_MANAGER` NVARCHAR(50),
    `POC_PREP_STATUS` NVARCHAR(50),
    `POC_IW` NVARCHAR(150),
    `POC_DAYS` INT,
    `POC_UPDATE_TIME` DATETIME,
    `POC_AUDIT_UNIT_MANAGER` NVARCHAR(50),
    `POC_AUDIT_STATUS` NVARCHAR(50),
    `POC_AUDIT_IW` NVARCHAR(150),
    `POC_AUDIT_DAYS` INT,
    `POC_AUDIT_UPDATE_TIME` DATETIME,
    `LAST_MODIFIED_BY_USER` NVARCHAR(50),
    `LOCKED_BY_ROLE` NVARCHAR(50),
    `LAST_MODIFIED_TIME` DATETIME,
    `NEXT_STEP` NVARCHAR(50),
    `CURRENT_STEP` NVARCHAR(50),
    `PROCESSDATE` DATETIME,
    `BK_Bucket` NVARCHAR(50),
    `BK_CHAPTER` NVARCHAR(2),
    `BK_POC` DATETIME,
    `bk_rcvd` DATETIME,
    `bk_state` NVARCHAR(2),
    `bk_status` NVARCHAR(50),
    `bkconvert` DATETIME,
    `clerk_name` NVARCHAR(50),
    `COMMENT_ERROR_CD_1_DESC` NVARCHAR(50),
    `DOC_STATUS` NVARCHAR(50),
    `Investor` NVARCHAR(50),
    `Legal_Action` NVARCHAR(50),
    `Legal_Status` NVARCHAR(50),
    `POC_BAR_DATE` DATETIME,
    `POC_FILED` DATETIME,
    `POC_STATUS` NVARCHAR(50),
    `POST_NXDU` DATETIME,
    `System` NVARCHAR(50),
    `ACCOUNT_NBR` NVARCHAR(10),
    `PRIOR_STEP` NVARCHAR(50),
    `COMMENTS_DOC_CHECK_LIST` NVARCHAR(255),
    `COMMENTS_IPR` NVARCHAR(255),
    `COMMENTS_POC` NVARCHAR(255),
    `COMMENTS_POC_AUDIT` NVARCHAR(255),
    `EXCLUSION_REASON` NVARCHAR(50)
)

 

 


Tuesday, November 26, 2013 - 1:39:36 PM - Lakshmi Back To Top (27617)

Is there any way that we can read the Column name and the data in the excel in SSIS Script component as Source.

I have an excel file where some columns are static and some columns change like below. I need to read the column name and see if it has "Name" or  "Value" then just ge thte data in the column and insert into a SQL table. If the column names are not "Name" or  "Value" then i need to append the column name and the value befor inserting into SQL table. Can we do this in Script component.

Name Value SuppVal1 SuppVal2 SuppVal3
Test 1 x a 1
Sample 2 y b 2
King 3 z c 3

Wednesday, November 13, 2013 - 4:14:25 PM - Troy Witthoeft Back To Top (27491)

Here is a full C# version of the Get Excel Metadata script.   The only line I took the libery of editing was the connection string. I prefer to use the Microsoft ACE provider, and I upped the Excel version from 8.0 to 12.0.  Hope it helps someone!! 

public override void CreateNewOutputRows()
    {
        string excelFilePath = this.Variables.ExcelFilePath.ToString();
        string strCn = "Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + excelFilePath + ";Extended Properties=Excel 8.0";
        DataTable dtTables = null;
        DataTable dtColumns = null;
        string tableName = null;
        OleDbConnection cn = new OleDbConnection(strCn);
        cn.Open();
        dtTables = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        foreach (DataRow row in dtTables.Rows)
        {
            tableName = row["TABLE_NAME"].ToString();
            dtColumns = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, new object[] {
			null,
			null,
			tableName,
			null
		});
            foreach (DataRow columnRow in dtColumns.Rows)
            {
                OutputBuffer.AddRow();
                OutputBuffer.EXCELFILEPATH = excelFilePath;
                OutputBuffer.TABLENAME = tableName;
                OutputBuffer.COLUMNNAME = columnRow["COLUMN_NAME"].ToString();
                OutputBuffer.DATATYPE = columnRow["DATA_TYPE"].ToString();
            }
        }
        cn.Close();
        OutputBuffer.SetEndOfRowset();
    }

Friday, June 28, 2013 - 8:24:18 PM - Ray Barley Back To Top (25626)

I think if you could open the CSV file with Excel then save it s an Excel file (i.e. one of the Excel file formats e.g. Excel 2007 is wht I probably used in this tip) then it might work.  The code specifically opens an Excel file; it can't go directly against a CSV file without some modification).


Friday, June 28, 2013 - 5:42:14 PM - James Back To Top (25624)

I have am trying to design an SSIS package that imports data into a staging table from CSV files. Problem is, that not all the files have the same exact amount of columns, and some files have different column names altoghether.

 

Would this script allow me to pull the data from more than 1 CSV file?

 

Thank you!

 


Friday, April 26, 2013 - 7:57:49 AM - Ray Barley Back To Top (23572)

There are 2 things to check when you get the error EXCEL FILEPATH IS NOT A MEMBER OF *****VBPROJ.VARABLES.

First make sure you have declared the following variable in your package:

Name: ExcelFilePath Scope: Excek=lMetadata Data type: String Value: put in the path to the Sample.xls file; e.g. c:\mssqltips\SSIS_Excel_Metadata\Sample.xls

Second - open the Get Schema from Excel File data flow and edit the Get Excel script source component.  Make sure that the ReadOnlyVariables property (in the Custom Properties section) is set to User::ExcelFilePath

I downloaded the code and the package runs fine; when I deleted the ExcelFilePath variable I got the same error as you.

I don't have a C# version of the script source; C# wasn't an option when I did this tip. 

 

 

 


Thursday, April 25, 2013 - 5:36:50 PM - Vigilant Back To Top (23563)

I tried ur bv.net script to get the sheetname.But the script is throughing error. The error regarding this line:

Dim excelFilePath As String = Me.Variables.ExcelFilePath.ToString()()EXCEL FILEPATH IS NOT A MEMBER OF *****VBPROJ.VARABLES.

I triend to use all references like Microsoft.Office.Interop.Excel,Microsoft.Office.Core,System.Data.SqlClient but it did not help me.

Do you have similar script in C sharp.Actually I am not good in vb.net.Everyday I receive excel files  with different names and different sheetname and all of them have same number of columns and I need to stage everyday's excel to a new sql table with the same name of excel file.So far I am using dynamic sql (opendatasource) but now I want to use SSIS package to achieve this. please help me to figure this out.

 

Thanks,

Vigilant

[email protected]


Wednesday, March 13, 2013 - 6:14:02 PM - Wence Back To Top (22787)

Ray, Thank you for the update but I was hoping to keep the variable so I can read one or multiple sheets, however many exist in the Excel document.  I love the variable because it does just that.  

I did try the recommendation you suggest and it works great for one sheet in a document at a time.  You can imagine how this would be painful if I have to do more than 50 documents each containing any number of sheets.  

Any info you provide is greatly, and I mean GREATLY, appreciated!  


Thursday, March 7, 2013 - 8:38:40 PM - Ray Barley Back To Top (22641)

You can use an Excel Connection Manager (the tip does this).  Set the Data Access Mode to SQL Command.  Specify the SQL command text as SELECT * FROM [SHEET1$!B10:F30]

I found the answer here: http://sqlserversolutions.blogspot.com/2009/02/selecting-excel-range-in-ssis.html

 


Wednesday, March 6, 2013 - 1:14:01 PM - Wence Back To Top (22600)

I am using this solution and it works great for what I am needing but wondering if there is a way to include in the excel variable how to include selection of specific cells along with table name?  

Currently, the variable calls 'SHEET1$'. Is there a way to also inlcude cells 'B10:F30' as I only need certain rows and columns.  Please advise and thanks in advance.

 

Regards,

 

Wence


Monday, December 3, 2012 - 8:38:34 AM - Ray Barley Back To Top (20706)

You will have to provide detailed information about what you're trying to do and what error you're getting.  This forum may be able to help when you have a very specific question or a very general question.  Some things that would be important to know - are you trying to import multiple worksheets from a single excel file, are you trying to import from multiple excel files, what have you done, what specifically is not working.  Do all of the worksheets have the same exact columns  and types.

Probably the biggest question is this - does this tip represent the best approach for what you are trying to accomplish?

 

For instance here's a tip that access excel using a linked server: http://www.mssqltips.com/sqlservertip/2018/using-a-sql-server-linked-server-to-query-excel-files/


Monday, December 3, 2012 - 7:29:18 AM - rakesh Back To Top (20702)

Hi Ray,

 

thanku ur reply, if i have multiple worksheet file like client, RBS, ABS,XYS,SDS,SDS,SFFI,SIYF,SIFH,QEIAS,SLGUIR,GHFIR,AQTUJD,GFOGI more then70 worksheet files, if i run then error show

 

can you please suggest me which point I am missing.

Thanks

Rakesh

 


Thursday, November 29, 2012 - 9:32:46 AM - Ray Barley Back To Top (20630)

I can't think of anything.  Try running / debugging the code from http://support.microsoft.com/kb/318373/en-us.  Maybe something will jump out at you.


Thursday, November 29, 2012 - 4:48:58 AM - rakesh bhatia Back To Top (20622)

Hi Ray,

Thanks for such a great post.. I am using your code for loading data from excel file having multiple workbook. but only problem i am facing is, it load data from 1-to-9 sheet only, but my excel file contain more than 9 files.

can you please suggest me which point I am missing.

Thanks

Rakesh

 

 


Tuesday, May 22, 2012 - 10:36:46 AM - Ray Barley Back To Top (17591)

I would try add IMEX=1 to the Excel connection string which forces every input column to come in as just a string.  Then you will have to do whatever transform is necessary to get the type that you really want.

 

Here's the details on Excel connection strings: http://www.connectionstrings.com/excel


Tuesday, May 22, 2012 - 8:29:33 AM - Ramya Back To Top (17586)

In My Excell source i have the columns

LastCollectedTime ModifiedDate
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL

while loding into oledb destination i am getting error.

my task is like that what data we have in that columns must lodeed ino sql DB.

in SQL DB the two column have DateTime Datatype


Thursday, October 21, 2010 - 6:23:27 PM - nmeyer Back To Top (10289)

A great post!  Thank you. 

I was able to use concepts here to read employee leave balances scattered throughout over 600 spreadsheets that had anywhere from 17 to 29 columns in a spreadsheet.  The balances were not consistently in the same cells within the sheets and the sheets themselves had different numbers of columns. 

Your metadata code helped me read each spreadsheet and write a custom, 30-column query for each spreadsheet based on the number of columns in each spreadsheet. (For example:Select Top 15 F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, 'F25' as F25, 'F26' as F26, 'F27' as F27, 'F28' as F28, 'F29' as F29, 'F30' as F30 FROM [2010-11$] for a spreadsheet with 24 columns and Select Top 15 F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, F28, F29, 'F30' as F30 FROM [2010-11$] for a spreadsheet with 29 columns).  I wrote these queries to the staging table for each spreadsheet along with spreadsheet location/name.

Then with an execute SQL task I read each standardized, 30-column SQL statement into a variable that I passed to a single dataflow.  There, I was able to read each spreadsheets into an ado recordset (in memory) and then shred the recordset, first looping through rows and columns looking for key words and capturing the cell locations of those key words. Then, since the data I wanted was always at the same location relative to the key words, I was able to specify a cell address that held my desired data and then loop back through the recordset to read the data I wanted from the specified the cell addresses.  (used OleDbDataAdapter code I found somewhere else on the internet to iterate through rows and columns of the ado recordset).

Once I had the data I needed from each spreadsheet, I wrote it to a database table for later use/reference.

Four days ago I knew nothing about OleDbDataAdapters or GetOleDbSchemaTable and very little VB, but with guru posts like yours and others, I got it done!

 


Friday, September 3, 2010 - 9:18:50 AM - Ray Barley Back To Top (10116)
This is just a thought - could you add a script task to the SSIS package that opens up the CSV file and saves it as an Excel file?   I think you can do this using the Excel object model, something like this:

Dim objExcel
    Dim objWorkBook
    Set objExcel = CreateObject("EXCEL.APPLICATION")
    Set objWorkBook = objExcel.Workbooks.Open("C:\temp\test.csv")
    objWorkBook.SaveAs ("C:\temp\test2.xls")
    objWorkBook.Close True

 

To test this I manually created a csv file with Notepad, putting column names in the first row.  Then I opened the csv using Excel 2007 and saved it as a .xls file.  Finally I added an Excel Source to an SSIS package data flow, pointed to the .xls file and verified that it is able to read the Excel file and it's columns/rows.

 


Friday, September 3, 2010 - 6:54:25 AM - Albert Back To Top (10114)
Hi Ray,

I think your  tip on the above topic "How To Retrieve Excel Schema Information Using SQL Server Integration Services (SSIS) 2 " is great. I have a similar problemwhere I am trying to retreive an Excel schema information, but this time the Excel file is save in a "CSV" format rather than in "XLS" format. I couldn't get it to work as the "script component" throws up an error.

Is there anyway I can retreive the Excel file saved in a CSV format to generate the file schema information. Your help would be much appreciated. Counting on your co-operation.

Thanks

Albert


Monday, March 30, 2009 - 5:36:19 AM - Henry82 Back To Top (3094)

[quote user="raybarley"]The only way I know of to get this to work is the ExcelFilepath variable has to have an intial value (in the variables window) that points to an existing Excel file and the ExcelTable variable has to have an initial value that is the name of a sheet in that same Excel file[/quote]

That did the trick! I didn't think about initial values and my value columns was set to '0'...

Thanks!


Monday, March 30, 2009 - 4:24:49 AM - raybarley Back To Top (3093)

The only way I know of to get this to work is the ExcelFilepath variable has to have an intial value (in the variables window) that points to an existing Excel file and the ExcelTable variable has to have an initial value that is the name of a sheet in that same Excel file.  As part of the development environment you need an Excel spreadsheet thatthe package can open in the designer.

When you create a package you typically start with hard-coded values in the Excel Connection Manager and Excel Source.  After you get them configured you go back and use variables and/or expressions but you have to make sure those variables and expressions have valid values for when you're working with the package in the designer.

As you correctly point out the Excel Source can't figure out thecolumn list unless it can access an actual spreadsheet.

 


Monday, March 30, 2009 - 3:50:21 AM - Henry82 Back To Top (3092)

(My first time working with variables)

I added the three variables to my package as described in the article. The "ExcelTable" variable is (supposed to be) populated by the ForEach loop container. Up to this point everything works.

My issue is with the Import step within the ForEach:
I choose the Excel Connection Manager - added the filepath to the expression properties (as per document). Now when I double click on the Excel Source, I choose my connection, then "Table name or view name from variable" as the Data Access Mode, and lastly "User::ExcelTable" as the variable. When I choose Columns I get an error:

"Error at IMPORT[Excel Source[1]]: A destination table name has not been provided"

Am I missing something? HOW can the Excel Source anyway get the column schema before the variable is populated (thus it doesnt know which sheet to open)? Some MS logic I am simply unable to understand.

Thanks


Saturday, February 21, 2009 - 8:01:08 AM - raybarley Back To Top (2807)

In the sample package take a look at the Process Each Excel Table task.  This is a Foreach Loop task that reads in one sheet from the Excel spreadsheet at a time.  Prior to this task, the User::ExcelTableList package variable is populated with the list of sheets (or tables) in the Excel spreadsheet.  The Foreach Loop task iterates thru the list of sheets and assigns each one to the User::ExcelTable package variable.  This variable assignment happens on the Variable Mappings page of the Foreach Loop editor.  Then the Foreach Loop executes the Data Flow task contained in it which reads the particular Excel sheet and writes the contents to a table.


Thursday, February 19, 2009 - 10:24:14 AM - Mirela MG Back To Top (2795)

I wonder how to initialize the User::ExcelTable variable in the code. Let's say I don't want myself to fill in the Value, so I don't need to know the name of the tab from excel file when I have to import it. 

I would like to automate this process as much as possible.

Thank you.


Monday, February 9, 2009 - 6:14:11 AM - admin Back To Top (2721)

This has been fixed.   This is the correct URL:

http://www.mssqltips.com/tipimages/1674_SSIS_Excel_Metadata_Sample.zip


Friday, February 6, 2009 - 11:37:47 AM - HartmannR Back To Top (2709)

 What is the user name and password for the download?

http://www.mssqltips.com/.../1674_SSIS_Excel_Metadata_Sample.zip

 


Friday, January 30, 2009 - 3:54:32 AM - [email protected] Back To Top (2647)

Hello,

Is there anyone could help me to fix this item?

I got a try but i'm having some troubles with the   Execute sql task "Get list of excel tables to process".  

If i set the resultset to full result ,when i run the packate  i get back the folowing error message.

  [Attivitą Esegui SQL] Error: Esecuzione della query "SELECT DISTINCT TABLE_NAME FROM STG_EXCELMETADATA" non riuscita con l'errore seguente: "Interfaccia non registrata. (Eccezione da HRESULT: 0x80040154 (REGDB_E_CLASSNOTREG))". Possibili cause: problemi nella query, impostazione non corretta della proprietą "ResultSet", parametri non impostati correttamente o problemi di attivazione della connessione.

 I've the italian version of Visul Studio, anyway would it means:

Your query fails, not registred interface ( exception  from HRESULT: 0x80040154 (REGDB_E_CLASSNOTREG))".  possible causes: resultset property not properly set

 

 Thanks

 

 

 















get free sql tips
agree to terms