Intelligent Conditional Split in an SSIS Package
Conditional Split is one of the most commonly used transforms in any SSIS Package development routine. The limitation with conditional split is that any record that satisfies the first condition from the conditions list is routed to its corresponding path exclusively. In a logical sense, a record might satisfy more than one condition and one might want the record to be routed to all paths for processing, but with the Conditional Split transform this is not possible. We need a more intelligent conditional split where we can selectively route the records to more than one output path. In this tip we will look at how to facilitate this intelligent conditional split.
We already know the problem in question and the solution that we are looking for can be facilitated using a synchronous script transform.
Say we have a list of employees having EmployeeID from 1 to 40. The requirement we have is that employees from 1 to 10 need to be processed by a particular logic, employees from 1 to 20 by another logic, 1 to 30 by another logic and 1 to 40 by another logic. If we have to do it using regular transforms, the easiest way I would think is to use a Multicast Transform and place 4 conditional split transforms. But this is not efficient as I would have to keep adding a conditional split transform for each range due to the limitation of this transform to direct rows to more than one output path.
Script transformation is one such transform which can provide intelligent split functionality. We will be using the Employee table from the AdventureWorks database, so I take it for granted that you have this available on your development machine.
To build the solution, follow the below steps:
1) Create a new SSIS project and add a new package.
2) Add a Data Flow task to the package and edit it.
3) Add an OLE DB Adapter to this data flow task and configure it to read the Employee table from the AdventureWorks database.
4) Add a Script transformation next to it. Edit the properties and on the Input Columns page select all of the columns.
5) On the Inputs and Outputs page, configure it to have four output paths named "1stBatch", "2ndBatch", "3rdBatch" and "4thBatch" as shown below.
6) Configure the SynchronousInputID property for each output path to point to the only available input path by selecting from the dropdown. Give a non-zero value to ExclusionGroup for all the output paths. For this example, assign a value of "1".
After you are done with the editing, your Inputs and Outputs page should look something like the below figure.
7) Once you make a Script output synchronous, the DirectRowTo[OutputPathName] method becomes available for each synchronous output path on the Row object as shown in the below code. To implement the code for our requirement, edit the Script and add the code as shown in the below figure.
8) Now add four multicast transforms to the code and join each corresponding output path from the script transform to its relevant multicast transform. After following these steps, your package should look like the below figure.
Execute the package and you should be able to see the result as shown in the below figure, which is exactly what we wanted. Using just a single transform we are able to validate the logic and route the same record to multiple output paths based on the requirement.
- Implement this solution in your existing and/or future packages.
- Try changing the value of ExclusionGroup property back to "0" for any output path and see if the DirectRowTo[OutputPath] method still remains available.
About the author
View all my tips
Article Last Updated: 2010-10-14