Dynamically find where table data starts in Excel using SSIS

By:   |   Comments (21)   |   Related: 1 | 2 | 3 | More > Integration Services Excel


Problem

Recently I worked on a project to import Excel spreadsheets from various vendors into our database and not all spreadsheets had the same row number for the header record. This is because most of the vendors had additional information on the top of the header row. In this tip I am going to show how the SSIS script task can be used to solve this issue.

Solution

In order to know where the header record starts we need to open the spreadsheet and search for the headers cell by cell, in this tip I am going to show how we could accomplish this with SSIS script task. Lets look at a sample spreadsheet, the structure is changed for simplicity purposes.

using ssis script task

In the example the header row starts at row 5, to get this information during run time of the SSIS package we used a script task before the data flow task. Below is how the SSIS package Control Flow looks like.

 the SSIS package Control Flow

Add a new string variable called "varTabName", and include this variable as a ReadWriteVariable for the Script Task. Add the following code to Script Task

public void Main()  
{      
 string filePath = "C:\\MSSQLTips\\MSSQLTip1.xlsx";      
 string tabName = "Sheet1$";      
 String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="        
 + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";      
 OleDbConnection cn = new OleDbConnection(strCn);      
 string strSQL = "Select * From [" + tabName + "A1:D100]";
 int iCnt = 0;      
 OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
 DataSet ds = new DataSet();
 objAdapter.Fill(ds, "dSheet1");
 DataTable dt = ds.Tables["dSheet1"];
 foreach (DataRow row in dt.Rows)
 {
  iCnt = iCnt + 1;
  if ((row[0].ToString() == "HdrCol1")              
  & (row[1].ToString() == "HdrCol2")              
  & (row[2].ToString() == "HdrCol3")              
  & (row[3].ToString() == "HdrCol4"))          
  {              
   Dts.Variables["varTabName"].Value = tabName + "A"                  
   + iCnt.ToString() + ":D1048576";              
   break;          
  }      
 }      
 cn.Close();      
 Dts.TaskResult = (int)ScriptResults.Success;  
}  

For simplicity reasons the file path and the tab name of the excel spreadsheet have been hardcoded. We use an OLEDB Connection to connect and query the spreadsheet on a range of cells. Since our sample spreadsheet has 4 columns of data and since we know that the header is always in the top 100 rows we can use the following query "select * from Sheet1$A1:D100" to pull this data into a data set. We then loop through the rows of the data set until we find the Header row, then we append the counter to the tab name and set the varTabName variable. For this example the value of the varTabName variable would be Sheet1$A5:D1048576, where 1048576 is the maximum number of rows for excel 2007.

In the Data flow task, choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName

choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName

Below is the output from the data viewer grid for the data flow task.

the output from the data viewer grid for the data flow task

We are now able to import the data from the spreadsheet by dynamically setting the data range in the SSIS script task.

Next Steps
  • Add exception handling for the script task.
  • We can use for Each loop container to loop though multiple files, the file path needs to be a variable which has to be passed to the Script Task.
  • We can use GetOleDbSchemaTable to find all the tabs and loop through those.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Vamsi Bogullu Vamsi Bogullu is a Database Architect administering database servers, developing applications and designing enterprise BI solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, August 2, 2023 - 11:07:51 PM - trapti vijayvergiya Back To Top (91457)
Hi, Thanks for the script. It works fine for me when I try to run it using visual studio, but when we scheduled it as a job on server with is 2010 I get below errow - Error: 2023-08-02 10:12:16.20
Code: 0x00000001
Source: Loop through the Excel to find the row where main data starts
Description: Exception has been thrown by the target of an invocation.
End Error

Can you help me fixing this ?

Wednesday, February 6, 2019 - 9:04:25 AM - Greg Robidoux Back To Top (78977)

Hi Shivani,

did you research this error message: "Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine"

Also, take a look at this: https://social.msdn.microsoft.com/Forums/en-US/600c7fb6-e7af-4a72-9323-c80c379703d2/the-microsoftaceoledb120-provider-is-not-registered-on-the-local-machine

-Greg


Wednesday, February 6, 2019 - 6:53:58 AM - Shivani Bandil Back To Top (78976)

  HI,

I am getting the below error.

,The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.   at System.Data.OleDb.OleDbServicesWrapper.GetDataSource(OleDbConnectionString constr, DataSourceWrapper& datasrcWrapper)

   at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)

   at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)

   at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)

   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

   at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)

   at System.Data.OleDb.OleDbConnection.Open()

   at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)

   at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)

   at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, String srcTable)

   at System.Data.OleDb.ScriptMain.Main()

P.S : Cannot install anything on server


Monday, July 25, 2016 - 12:32:08 PM - Sandra Back To Top (42970)

 I've tried the above code; but now I'm getting the error 'that a destination table has not been provided.'  How do I fix this?

 


Sunday, July 10, 2016 - 1:15:35 AM - Trupti Back To Top (41854)

 

 Hi Vanshi,

 

Hope you are doing well.

I am trying above code and didn't understand what is the difference between 'sheet1' and 'dsheet1'.

Also if I use sheet name like, 'Adjusted 4001A',  start getting errors.

Could you please help?

Thanks 


Wednesday, January 28, 2015 - 2:01:50 AM - John Back To Top (36080)


Hi,
    I have used ur above scripts but I am getting below error ar script task level:


at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
   at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()


And at excel connection I am getting error 'that a destination table has not been provided.'

Could you plz help me out?

Thanks in advance

 


Tuesday, September 2, 2014 - 10:45:20 AM - MANISH BHARGAVA Back To Top (34352)

getiing error

 

Index and length must refer to a location within the string.
Parameter name: length (mscorlib)


Tuesday, May 20, 2014 - 7:19:16 PM - nisha Back To Top (30862)

Above mentioned error is rectified once  i included 

using System.Data.OleDb;

but when i execute the ssis package i am getting the below error.

at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
   at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

 

 


Tuesday, May 20, 2014 - 6:20:30 PM - nisha Back To Top (30859)
 Hi ,

I am getting an error if I copy this code.Error is : "The type or namespace name 'Oledbconenction' could not be found.(are you missing an using directive or assembly reference)"

Any help appreciated!!

Thanks

Nisha V Krishnan

 


Friday, January 24, 2014 - 2:45:31 PM - AWarner Back To Top (28218)

I have the script task working.  The issue I am having is when I select preview in the Excel Source it throws an error indicating that a destination table has not been provided. As for the destination table I have the Excel source going to an OLE DB destination, but validation is not working because the Excel source has no column information to map to the OLE DB Destination.  Wouldn't the script task have to run first so that the Excel source would then have the column info which would then map to the OLE DB Destination?


Monday, May 27, 2013 - 9:50:30 AM - Pragyan Rath Back To Top (25151)

 

I tried this code excatly what is noted above but it provides error on index.

 

ADDITIONAL INFORMATION:

Index and length must refer to a location within the string.
Parameter name: length (mscorlib)

 

Could you help ?

Also are you trying to get value of a cell by using the command row[0].toString ?


Saturday, March 9, 2013 - 8:41:24 PM - Salman Back To Top (22684)

Hi,

I got same requirement but I need to add header data as well with line item also with multiple files(*.xls or *.xlsx). I you have already provided same solutions to other please send me a zip file.

 

Thanks in advance.

 


Wednesday, February 20, 2013 - 10:34:06 AM - Peter Williams Back To Top (22309)

Hi Vamsi,

Great article.  I just have one further question relating to the problem you describe.

Your solution describes how to identify a starting for the data you wish to import when you are dealing with multiple files with different starting points.

However, I'm not sure how to pass a variable filename into this solution. 

I have the same problem as you describe, multiple files, changing starting point.  How do I get the filename variable prior to executing this code to identify the dynamic start & end point of data to read?

Thanks for your help.
Pete


Thursday, January 3, 2013 - 8:31:33 AM - raghavender Back To Top (21248)

Hello Vamsuli,

 

Could you please send me the zip file of your package.(Dynamically find where table data starts in Excel using SSIS)


Thursday, June 7, 2012 - 6:38:02 PM - Gary Back To Top (17865)

Vamsi,

I tried doing this but I cannot get it to work.  Can you send me the zip file of your package so I can play with it?

Thanks,

Gary


Saturday, May 12, 2012 - 4:43:43 PM - shruthi Back To Top (17438)

hi 

thanks for the tip.

i woluld like to know if there is any way to read date ,description line1 and description line 2 along with all the header columns and put in into sql table.

 

thanks

shruthi


Thursday, January 26, 2012 - 8:50:14 AM - Jeremy Kadlec Back To Top (15784)

Everyone,

Sorry for the code issue.  The tip has been updated.

Please let me know if you have any issues at this point in time.

Thank you,
Jeremy Kadlec


Wednesday, January 25, 2012 - 11:36:23 AM - Vamsi Back To Top (15777)
public void Main()
{
    string filePath = "C:\\MSSQLTips\\MSSQLTip1.xlsx";
    string tabName = "Sheet1$";
    String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
      + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";";
    OleDbConnection cn = new OleDbConnection(strCn);
    string strSQL = "Select * From [" + tabName + "A1:D100]";
    int iCnt = 0;
    OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
    DataSet ds = new DataSet();
    objAdapter.Fill(ds, "dSheet1");
    DataTable dt = ds.Tables["dSheet1"];
    foreach (DataRow row in dt.Rows)
    {
        iCnt = iCnt + 1;
        if ((row[0].ToString() == "HdrCol1")
            & (row[1].ToString() == "HdrCol2")
            & (row[2].ToString() == "HdrCol3")
            & (row[3].ToString() == "HdrCol4"))
        {
            Dts.Variables["varTabName"].Value = tabName + "A"
                + iCnt.ToString() + ":D1048576";
            break;
        }
    }
    cn.Close();
    Dts.TaskResult = (int)ScriptResults.Success;
}

Wednesday, January 25, 2012 - 11:10:42 AM - KawasakiNinja636 Back To Top (15775)

Hello Vamsuli

Thanks for a great article, this look exactly what I need.  Unfortunately I cannot see all of the C# code you have in the script box, can you send me this please?

Thanks

Steven.


Thursday, December 2, 2010 - 1:41:18 PM - Vamsi Back To Top (10411)

Hi, 

I believe you already have the data in a SQL table, one of the ways of accomplishing your task on SQL 2005/2008 is below, hope this helps.

 

 

Create table #tmp (ID int, Term int, class varchar(10))

insert into #tmp(ID, Term, class)

values

(304, 20103, 'EE2'), (156, 20102, 'CS3'), (167, 20101, 'MC1'), (167, 20101, 'SE3')


Select ID, Term, LEFT(Class, Len(class)-1) Class from 

(

Select distinct t1.ID, t1.term, (Select t.class +',' As [text()]

 from #tmp t where t.ID = t1.ID and t.Term = t1.Term

 FOR XML PATH('')) class

 from #tmp t1

 ) v

 


Thursday, December 2, 2010 - 12:32:52 PM - Mounish Back To Top (10410)

Hi,

I have a table like this

  ID   TERM   CLASS
---------------------------
 304    20103      EE2
 156      20102    CS3
 167    20101    MC1
 167    20101    SE3

As you can see, for the ID 167, there are two different values 'MC1' & 'SE3' under CLASS column. I want to create a query which coalesces
'MC1' & 'SE3' into one field, I would like the resulting table to be like

  ID   TERM   CLASS
---------------------------
 304    20103      EE2
 156      20102    CS3
 167    20101    MC1,SE3

Anu help?















get free sql tips
agree to terms