Using SQL Server Integration Services to Generate Excel Files Based on Criteria
I have a SQL Server table containing a large dataset and I am interested in creating one Excel file for every product category name using SQL Server Integration Services (SSIS).
Let's start by taking a look at my sample dataset which is taken from AdventureworksDW2016CTP3. When I count the rows, I have approximately 60K records. The column that I am particularly interested in is 'EnglishProductCategoryName' which contains the Category group Name. This is the column that I am interested in grouping my data by and having SSIS generate an individual file for each category.
On a Distinct Count, I can see that there are 3 Product Categories (Accessories, Bikes and Clothing), which means that I will want 3 Excel files with all of the aggregated Detail Data listed above.
Getting Started with the SSIS Package
I'll start by creating a new SSIS Project in Visual Studio SSDT called 'Using SSIS to Generate Excel Files'.
Next, I'll need to set the Run64BitRuntime Debugging property to 'False' by right-clicking on the SSIS project and navigating to 'Properties'.
This will prevent execution errors later since we are generating files using Excel and need a 32bit runtime.
Secondly, I'll need to download and Install the Microsoft Access Database Engine Redistributable (Note: I selected the 32bit version for my solution).
Designing the SSIS Package
I'll start by adding a new OLE DB data source by right-clicking the connection manager section and specifying my connection as 'AdventureworksDW2016CTP3'.
Next, I'll create an Excel connection by right clicking on the connection manager and pointing to the folder where my excel file resides:
The Excel connection will have to contain the following Expression for the connection string property so that the connection is updated dynamically at run-time. I'll do this by right clicking the newly set up Excel connection and clicking properties.
Under Properties, navigate to Expressions and click the … icon. Then select Connection String in the Property Expressions Editor and add the user variables below containing the Folder Path and client Id. This will create a dynamic file and folder path that is set at run time based on the query parameter.
I'll then create the following Variables, by clicking the variables icon in the top right-hand corner:
Next, I'll add an Execute SQL Task to Retrieve the Top Level which will basically contain a distinct count of the Product Category Names. I'll do this by dragging and dropping an Execute SQL Task onto the Control Flow canvas. I'll then double click the Execute SQL Task, set the connection to my AdventureWorksDW2016CTP3 OLE DB Source, click the … near SQL Statement and then enter the Distinct Select statement below.
I'll also add the following Result Set Variable name to the user variable named TopLevelClientID. This is critical because it will store the distinct EnglishProdutCategoryNames from the SQL Query into a variable which will be used in further processing steps of the SSIS package.
Once this is completed, I will add a For Each Loop Container to Enumerate the Individual Top Level. From the SSIS toolbox, drag and drop a For Each Loop container to the control flow canvas:
Double click the For Each Loop container. Under Collection, I made sure to specify the enumerator as a Foreach ADO Enumerator and selected the appropriate User Variable.
Under Variable Mappings, I added the User:ClientID variable. This is important because the package will process every ClientID within the For Each loop container which will ensure that when the Excel files are generated, every file will have the correct records in the file that are grouped by and associated with the distinct EnglishProductCategoryNames that we initially specified in our DISTINCT SQL query:
Within the For Each Enumerator Loop, I added an Execute SQL Task to Create a new Excel File with the following details for the Connection Type and SQL Statement (Note that the columns must match the OLE DB Source columns exactly for the mapping to be accurate). Additionally, I specified my data types as VARCHAR(255) for standardization and simplicity. This may be altered as you see fit for your specific data source and types.
CREATE TABLE 'ProductCategories' ( 'ProductCategoryName' VARCHAR(255), 'Model' VARCHAR(255), 'CustomerKey' VARCHAR(255), 'Region' VARCHAR(255), 'Age' VARCHAR(255), 'IncomeGroup' VARCHAR(255), 'CalendarYear' VARCHAR(255), 'FiscalYear' VARCHAR(255), 'Month' VARCHAR(255), 'OrderNumber' VARCHAR(255), 'LineNumber' VARCHAR(255), 'Quantity' VARCHAR(255), 'Amount' VARCHAR(255) )
The last step of the package design is to add a Data Flow task that will create the Excel Files from my initial 60k+ record set. To do this, I will drag and drop an OLEDB Source task from the SSIS Toolbox onto the Data Flow Task canvas. I will also drag and drop an Excel Destination task from the SSIS Toolbox onto the Data Flow Task canvas. I will then connect the OLEDB Source to the Excel Destination.
Lastly, I will double-click the OLEDB Source Task labeled 'Data_WithSourceInfo_OLEDBSource' and enter the following SQL command as the source with a Where clause =?. The ? is to allow a Parameterized Query. The parameterized query is to allow a type-specific value when replacing the ? with their respective value. In our scenario, the value for 'EnglishProductCategoryName' is the value that we will be passing into this query.
SELECT [EnglishProductCategoryName], [Model], [CustomerKey], [Region], [Age], [IncomeGroup], [CalendarYear], [FiscalYear], [Month], [OrderNumber], [LineNumber], [Quantity], [Amount], FROM [AdventureworksDW2016CTP3].[dbo].[vDMPrep] WHERE [EnglishProductCategoryName] = ?
Next, I click Parameters and add the following Input Parameter mappings with the appropriate User Variable as follows:
I then double-click the Excel Destination Task to open it and Select the Excel Sheet 'ProductCategories$':
Finally, I will map the input columns to the destination. Note that the destination columns will not match the Input Column headers, but they must be mapped in the correct sort order.
Running the SSIS Package
We've now covered all the design elements of this SSIS package. It's now time to run the package by right-clicking the SSIS Package in the Solution Explorer and clicking 'Execute Package'.
Here is what the completed and successfully executed control flow will look like:
Exploring the Excel Files
After the package runs successfully, I will now have 3 files in my folder, which was specified in the variables section of the SSIS package.
I now have one Excel file for each product category.
When I open the Accessories Product Category file, I now see all the records from my original SQL Query grouped by the Product Category.
For example, Accessories has 36,092 records in both the SQL Query as well as the Excel File:
- In this article, we developed an SSIS package that takes a SQL Server table containing a category name and we created an Excel file containing the records for each category by using variables, dynamic connection strings, and for each loop in SSIS.
- Remember to download the Microsoft Access Database Engine Redistributable to prevent any issues with generating Excel files using SSIS.
About the author
View all my tips