Dynamically find where table data starts in Excel using SSIS

Problem

Recently I worked on a project to import Excel spreadsheets from various vendors into our database and not all spreadsheets had the same row number for the header record. This is because most of the vendors had additional information on the top of the header row. In this tip I am going to show how the SSIS script task can be used to solve this issue.

Solution

In order to know where the header record starts we need to open the spreadsheet and search for the headers cell by cell, in this tip I am going to show how we could accomplish this with SSIS script task. Lets look at a sample spreadsheet, the structure is changed for simplicity purposes.

using ssis script task

In the example the header row starts at row 5, to get this information during run time of the SSIS package we used a script task before the data flow task. Below is how the SSIS package Control Flow looks like.

 the SSIS package Control Flow

Add a new string variable called “varTabName”, and include this variable as a ReadWriteVariable for the Script Task. Add the following code to Script Task

public void Main()
{
string filePath = “C:\\MSSQLTips\\MSSQLTip1.xlsx”;
string tabName = “Sheet1$”;
String strCn = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=”
+ filePath + “;Extended Properties=\”Excel 12.0;HDR=NO;IMEX=1\”;”;
OleDbConnection cn = new OleDbConnection(strCn);
string strSQL = “Select * From [” + tabName + “A1:D100]”;
int iCnt = 0;
OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn);
DataSet ds = new DataSet();
objAdapter.Fill(ds, “dSheet1”);
DataTable dt = ds.Tables[“dSheet1”];
foreach (DataRow row in dt.Rows)
{
iCnt = iCnt + 1;
if ((row[0].ToString() == “HdrCol1”)
& (row[1].ToString() == “HdrCol2”)
& (row[2].ToString() == “HdrCol3”)
& (row[3].ToString() == “HdrCol4”))
{
Dts.Variables[“varTabName”].Value = tabName + “A”
+ iCnt.ToString() + “:D1048576”;
break;
}
}
cn.Close();
Dts.TaskResult = (int)ScriptResults.Success;
}

For simplicity reasons the file path and the tab name of the excel spreadsheet have been hardcoded. We use an OLEDB Connection to connect and query the spreadsheet on a range of cells. Since our sample spreadsheet has 4 columns of data and since we know that the header is always in the top 100 rows we can use the following query “select * from Sheet1$A1:D100” to pull this data into a data set. We then loop through the rows of the data set until we find the Header row, then we append the counter to the tab name and set the varTabName variable. For this example the value of the varTabName variable would be Sheet1$A5:D1048576, where 1048576 is the maximum number of rows for excel 2007.

In the Data flow task, choose Excel as the source and for Data access mode use “Table name or view name variable” option and in the Variable Name drop down choose User::varTabName

choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName

Below is the output from the data viewer grid for the data flow task.

the output from the data viewer grid for the data flow task

We are now able to import the data from the spreadsheet by dynamically setting the data range in the SSIS script task.

Next Steps

  • Add exception handling for the script task.
  • We can use for Each loop container to loop though multiple files, the file path needs to be a variable which has to be passed to the Script Task.
  • We can use GetOleDbSchemaTable to find all the tabs and loop through those.

One comment

  1. Hi, Thanks for the script. It works fine for me when I try to run it using visual studio, but when we scheduled it as a job on server with is 2010 I get below errow – Error: 2023-08-02 10:12:16.20
    Code: 0x00000001
    Source: Loop through the Excel to find the row where main data starts
    Description: Exception has been thrown by the target of an invocation.
    End Error

    Can you help me fixing this ?

Leave a Reply

Your email address will not be published. Required fields are marked *