By: Vamsi Bogullu | Comments (21) | Related: 1 | 2 | 3 | More > Integration Services Excel
Problem
Recently I worked on a project to import Excel spreadsheets from various vendors into our database and not all spreadsheets had the same row number for the header record. This is because most of the vendors had additional information on the top of the header row. In this tip I am going to show how the SSIS script task can be used to solve this issue.
Solution
In order to know where the header record starts we need to open the spreadsheet and search for the headers cell by cell, in this tip I am going to show how we could accomplish this with SSIS script task. Lets look at a sample spreadsheet, the structure is changed for simplicity purposes.
In the example the header row starts at row 5, to get this information during run time of the SSIS package we used a script task before the data flow task. Below is how the SSIS package Control Flow looks like.
Add a new string variable called "varTabName", and include this variable as a ReadWriteVariable for the Script Task. Add the following code to Script Task
public void Main() { string filePath = "C:\\MSSQLTips\\MSSQLTip1.xlsx"; string tabName = "Sheet1$"; String strCn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + ";Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\";"; OleDbConnection cn = new OleDbConnection(strCn); string strSQL = "Select * From [" + tabName + "A1:D100]"; int iCnt = 0; OleDbDataAdapter objAdapter = new OleDbDataAdapter(strSQL, cn); DataSet ds = new DataSet(); objAdapter.Fill(ds, "dSheet1"); DataTable dt = ds.Tables["dSheet1"]; foreach (DataRow row in dt.Rows) { iCnt = iCnt + 1; if ((row[0].ToString() == "HdrCol1") & (row[1].ToString() == "HdrCol2") & (row[2].ToString() == "HdrCol3") & (row[3].ToString() == "HdrCol4")) { Dts.Variables["varTabName"].Value = tabName + "A" + iCnt.ToString() + ":D1048576"; break; } } cn.Close(); Dts.TaskResult = (int)ScriptResults.Success; }
For simplicity reasons the file path and the tab name of the excel spreadsheet have been hardcoded. We use an OLEDB Connection to connect and query the spreadsheet on a range of cells. Since our sample spreadsheet has 4 columns of data and since we know that the header is always in the top 100 rows we can use the following query "select * from Sheet1$A1:D100" to pull this data into a data set. We then loop through the rows of the data set until we find the Header row, then we append the counter to the tab name and set the varTabName variable. For this example the value of the varTabName variable would be Sheet1$A5:D1048576, where 1048576 is the maximum number of rows for excel 2007.
In the Data flow task, choose Excel as the source and for Data access mode use "Table name or view name variable" option and in the Variable Name drop down choose User::varTabName
Below is the output from the data viewer grid for the data flow task.
We are now able to import the data from the spreadsheet by dynamically setting the data range in the SSIS script task.
Next Steps
- Add exception handling for the script task.
- We can use for Each loop container to loop though multiple files, the file path needs to be a variable which has to be passed to the Script Task.
- We can use GetOleDbSchemaTable to find all the tabs and loop through those.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips