Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Dynamically find where table data starts in Excel using SSIS


By:   |   Last Updated: 2010-12-02   |   Comments (20)   |   Related Tips: 1 | 2 | 3 | More > Integration Services Excel

Problem

Recently I worked on a project to import Excel spreadsheets from various vendors into our database and not all spreadsheets had the same row number for the header record. This is because most of the vendors had additional information on the top of the header row. In this tip I am going to show how the SSIS script task can be used to solve this issue.

Solution

In order to know where the header record starts we need to open the spreadsheet and search for the headers cell by cell, in this tip I am going to show how we could accomplish this with SSIS script task. Lets look at a sample spreadsheet, the structure is changed for simplicity purposes.

using ssis script task

In the example the header row starts at row 5, to get this information during run time of the SSIS package we used a script task before the data flow task. Below is how the SSIS package Control Flow looks like.

 the SSIS package Control Flow

Add a new string variable called "varTabName", and include this variable as a ReadWriteVariable for the Script Task. Add the following code to Script Task

public void Main()  
{      
 string filePath = "C:\\MSSQLTips\\MSSQLTip1.xlsx";      
 string tabName = "Sheet1$";      
 String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="        
 + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";      
 OleDbConnection cn = new OleDbConnection(strCn);      
 string strSQL = "Select * From [" + tabName + "A1:D100]";
 int iCnt = 0;      
 OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
 DataSet ds = new DataSet();
 objAdapter.Fill(ds, "dSheet1");
 DataTable dt = ds.Tables["dSheet1"];
 foreach (DataRow row in dt.Rows)
 {
  iCnt = iCnt + 1;
  if ((row[0].ToString() == "HdrCol1")              
  & (row[1].ToString() == "HdrCol2")              
  & (row[2].ToString() == "HdrCol3")              
  & (row[3].ToString() == "HdrCol4"))          
  {              
   Dts.Variables["varTabName"].Value = tabName + "A"                  
   + iCnt.ToString() + ":D1048576";              
   break;          
  }      
 }      
 cn.Close();      
 Dts.TaskResult = (int)ScriptResults.Success;  
}  

For simplicity reasons the file path and the tab name of the excel spreadsheet have been hardcoded. We use an OLEDB Connection to connect and query the spreadsheet on a range of cells. Since our sample spreadsheet has 4 columns of data and since we know that the header is always in the top 100 rows we can use the following query "select * from Sheet1$A1:D100" to pull this data into a data set. We then loop through the rows of the data set until we find the Header row, then we append the counter to the tab name and set the varTabName variable. For this example the value of the varTabName variable would be Sheet1$A5:D1048576, where 1048576 is the maximum number of rows for excel 2007.

In the Data flow task, choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName

choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName

Below is the output from the data viewer grid for the data flow task.

the output from the data viewer grid for the data flow task

We are now able to import the data from the spreadsheet by dynamically setting the data range in the SSIS script task.

Next Steps
  • Add exception handling for the script task.
  • We can use for Each loop container to loop though multiple files, the file path needs to be a variable which has to be passed to the Script Task.
  • We can use GetOleDbSchemaTable to find all the tabs and loop through those.


Last Updated: 2010-12-02


next webcast button


next tip button



About the author
MSSQLTips author Vamsi Bogullu Vamsi Bogullu is a Database Architect administering database servers, developing applications and designing enterprise BI solutions.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Wednesday, February 06, 2019 - 9:04:25 AM - Greg Robidoux Back To Top

Hi Shivani,

did you research this error message: "Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine"

Also, take a look at this: https://social.msdn.microsoft.com/Forums/en-US/600c7fb6-e7af-4a72-9323-c80c379703d2/the-microsoftaceoledb120-provider-is-not-registered-on-the-local-machine

-Greg


Wednesday, February 06, 2019 - 6:53:58 AM - Shivani Bandil Back To Top

  HI,

I am getting the below error.

,The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.   at System.Data.OleDb.OleDbServicesWrapper.GetDataSource(OleDbConnectionString constr, DataSourceWrapper& datasrcWrapper)

   at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)

   at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)

   at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)

   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

   at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)

   at System.Data.OleDb.OleDbConnection.Open()

   at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)

   at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)

   at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, String srcTable)

   at System.Data.OleDb.ScriptMain.Main()

P.S : Cannot install anything on server


Monday, July 25, 2016 - 12:32:08 PM - Sandra Back To Top

 I've tried the above code; but now I'm getting the error 'that a destination table has not been provided.'  How do I fix this?

 


Sunday, July 10, 2016 - 1:15:35 AM - Trupti Back To Top

 

 Hi Vanshi,

 

Hope you are doing well.

I am trying above code and didn't understand what is the difference between 'sheet1' and 'dsheet1'.

Also if I use sheet name like, 'Adjusted 4001A',  start getting errors.

Could you please help?

Thanks 


Wednesday, January 28, 2015 - 2:01:50 AM - John Back To Top


Hi,
    I have used ur above scripts but I am getting below error ar script task level:


at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
   at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()


And at excel connection I am getting error 'that a destination table has not been provided.'

Could you plz help me out?

Thanks in advance

 


Tuesday, September 02, 2014 - 10:45:20 AM - MANISH BHARGAVA Back To Top

getiing error

 

Index and length must refer to a location within the string.
Parameter name: length (mscorlib)


Tuesday, May 20, 2014 - 7:19:16 PM - nisha Back To Top

Above mentioned error is rectified once  i included 

using System.Data.OleDb;

but when i execute the ssis package i am getting the below error.

at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
   at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

 

 


Tuesday, May 20, 2014 - 6:20:30 PM - nisha Back To Top
 Hi ,

I am getting an error if I copy this code.Error is : "The type or namespace name 'Oledbconenction' could not be found.(are you missing an using directive or assembly reference)"

Any help appreciated!!

Thanks

Nisha V Krishnan

 


Friday, January 24, 2014 - 2:45:31 PM - AWarner Back To Top

I have the script task working.  The issue I am having is when I select preview in the Excel Source it throws an error indicating that a destination table has not been provided. As for the destination table I have the Excel source going to an OLE DB destination, but validation is not working because the Excel source has no column information to map to the OLE DB Destination.  Wouldn't the script task have to run first so that the Excel source would then have the column info which would then map to the OLE DB Destination?


Monday, May 27, 2013 - 9:50:30 AM - Pragyan Rath Back To Top

 

I tried this code excatly what is noted above but it provides error on index.

 

ADDITIONAL INFORMATION:

Index and length must refer to a location within the string.
Parameter name: length (mscorlib)

 

Could you help ?

Also are you trying to get value of a cell by using the command row[0].toString ?


Saturday, March 09, 2013 - 8:41:24 PM - Salman Back To Top

Hi,

I got same requirement but I need to add header data as well with line item also with multiple files(*.xls or *.xlsx). I you have already provided same solutions to other please send me a zip file.

 

Thanks in advance.

 


Wednesday, February 20, 2013 - 10:34:06 AM - Peter Williams Back To Top

Hi Vamsi,

Great article.  I just have one further question relating to the problem you describe.

Your solution describes how to identify a starting for the data you wish to import when you are dealing with multiple files with different starting points.

However, I'm not sure how to pass a variable filename into this solution. 

I have the same problem as you describe, multiple files, changing starting point.  How do I get the filename variable prior to executing this code to identify the dynamic start & end point of data to read?

Thanks for your help.
Pete


Thursday, January 03, 2013 - 8:31:33 AM - raghavender Back To Top

Hello Vamsuli,

 

Could you please send me the zip file of your package.(Dynamically find where table data starts in Excel using SSIS)


Thursday, June 07, 2012 - 6:38:02 PM - Gary Back To Top

Vamsi,

I tried doing this but I cannot get it to work.  Can you send me the zip file of your package so I can play with it?

Thanks,

Gary


Saturday, May 12, 2012 - 4:43:43 PM - shruthi Back To Top

hi 

thanks for the tip.

i woluld like to know if there is any way to read date ,description line1 and description line 2 along with all the header columns and put in into sql table.

 

thanks

shruthi


Thursday, January 26, 2012 - 8:50:14 AM - Jeremy Kadlec Back To Top

Everyone,

Sorry for the code issue.  The tip has been updated.

Please let me know if you have any issues at this point in time.

Thank you,
Jeremy Kadlec


Wednesday, January 25, 2012 - 11:36:23 AM - Vamsi Back To Top
public void Main()
{
    string filePath = "C:\\MSSQLTips\\MSSQLTip1.xlsx";
    string tabName = "Sheet1$";
    String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
      + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";
    OleDbConnection cn = new OleDbConnection(strCn);
    string strSQL = "Select * From [" + tabName + "A1:D100]";
    int iCnt = 0;
    OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
    DataSet ds = new DataSet();
    objAdapter.Fill(ds, "dSheet1");
    DataTable dt = ds.Tables["dSheet1"];
    foreach (DataRow row in dt.Rows)
    {
        iCnt = iCnt + 1;
        if ((row[0].ToString() == "HdrCol1")
            & (row[1].ToString() == "HdrCol2")
            & (row[2].ToString() == "HdrCol3")
            & (row[3].ToString() == "HdrCol4"))
        {
            Dts.Variables["varTabName"].Value = tabName + "A"
                + iCnt.ToString() + ":D1048576";
            break;
        }
    }
    cn.Close();
    Dts.TaskResult = (int)ScriptResults.Success;
}

Wednesday, January 25, 2012 - 11:10:42 AM - KawasakiNinja636 Back To Top

Hello Vamsuli

Thanks for a great article, this look exactly what I need.  Unfortunately I cannot see all of the C# code you have in the script box, can you send me this please?

Thanks

Steven.


Thursday, December 02, 2010 - 1:41:18 PM - Vamsi Back To Top

Hi, 

I believe you already have the data in a SQL table, one of the ways of accomplishing your task on SQL 2005/2008 is below, hope this helps.

 

 

Create table #tmp (ID int, Term int, class varchar(10))

insert into #tmp(ID, Term, class)

values

(304, 20103, 'EE2'), (156, 20102, 'CS3'), (167, 20101, 'MC1'), (167, 20101, 'SE3')


Select ID, Term, LEFT(Class, Len(class)-1) Class from 

(

Select distinct t1.ID, t1.term, (Select t.class +',' As [text()]

 from #tmp t where t.ID = t1.ID and t.Term = t1.Term

 FOR XML PATH('')) class

 from #tmp t1

 ) v

 


Thursday, December 02, 2010 - 12:32:52 PM - Mounish Back To Top

Hi,

I have a table like this

  ID   TERM   CLASS
---------------------------
 304    20103      EE2
 156      20102    CS3
 167    20101    MC1
 167    20101    SE3

As you can see, for the ID 167, there are two different values 'MC1' & 'SE3' under CLASS column. I want to create a query which coalesces
'MC1' & 'SE3' into one field, I would like the resulting table to be like

  ID   TERM   CLASS
---------------------------
 304    20103      EE2
 156      20102    CS3
 167    20101    MC1,SE3

Anu help?


Learn more about SQL Server tools