Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

How to make an SSIS Merge Join transformation fail safe from sorting irregularities


By:   |   Last Updated: 2010-03-08   |   Comments (2)   |   Related Tips: More > Integration Services Data Flow Transformations

Problem

The SQL Server Integration Services (SSIS) Merge Join transformation requires input data streams to be sorted for its operation. If during the design time you try to join two non-sorted inputs, it won't allow you to do so. One of the options is to sort the data in the database rather than using the Sort transformation in SSIS, which is a more speedier option. But if the sort order is changed unknowingly, SSIS would not raise an exception and the results would be unpredictable.

Solution

In any general SSIS Development approach that involved SQL Server as the source database, we use mostly parameterized stored procedures or views to fetch data from the source tables and sort it as per the requirement using the ORDER BY statement. For the discussion of this problem, we would consider stored procedures only. This approach of sorting data using ORDER BY introduces two different issues:

1) Stored procedures which sort data in this way, are not directly consumable for any other requirements that need the same data set but need a different sort order to use with Merge Join transformation. So the approach that one would take is, use Sort transformation in the SSIS package or recreate the same logic with a different sort order.

2) For some reasons, if one inadvertently changes to sort order or intentionally changes the sort order to tailor it for use in his/her own package and the original package which expects a particular sort order would not even realize that the package logic has been broken due to sort order changes.

The solution for this is to make the Sorting parameterized from the package. And in this tip we would simulate the problem and see how this solution can be implemented.

Follow the steps below to simulate the problem, and then we would look at how to solve this problem.

1) Download and install AdventureWorks database

2) Create a new SSIS Project, name it something meaningful. I did this test on the R2 version, and I named it Rnd R2.

3) We would use HumanResources.Employee and HumanResources.EmployeeAddress tables from the Adventureworks database for our test. Create two simple stored procedures as seen in the picture below.

management studio

4) Add a DataFlow Task to the package, and add two OLEDB Source Adapters. Name one of them as Employee and other as EmployeeAddress, and use the stored procedures we developed in both of these adapters accordingly. To use this stored procedure select "Data Access Mode" as "SQL Command" and key in "SQL command text" as "EXEC GetEmployees" for example.

5) Now right-click the source adapter and configure the sort hints as shown in the picture below. This provides hints to the stream that the data is sorted and sorting is done on EmployeeID. If you see carefully, we sorted data in both of our stored procedures using the same column.

advanced editor

6) Now add a Merge Join transformation, use Inner Join as the join type and join both of these sources. For our example, to keep things simple, we would just use Multicast as a destination. After you are done with this, your package should look something like the below picture.

visual studio

7) Modify the GetEmployeeAddress and change the Order By from EmployeeID to AddressID and see the result. If the GetEmployeeAddress is sorted by EmployeeId it returns 290 rows, and after changing the sorting in the stored procedure to AddressID, only 5 rows are returned as shown below.

visual studio

Now its time to implement our solution, and take control of the sorting into SSIS. For this we need to make the ORDER BY statement parameterized. Modify the GetEmployeeAddress as shown in the below picture. Also modify the "SQL Command Text" of EmployeeAddress Source Adapter to "EXEC GetEmployeeAddress 'EmployeeID'".

management studio

Execute the package and it should return again all the 290 rows. By implementing this solution, we take the control of sorting at the discretion of SSIS requirements, also making the stored procedure reuseable for all of the different packages that need to use the same logic, but different sorting. Also this reduces the risk of unpredictable behavior with the Merge Join transformation due to changes in the sort order.

Next Steps


Last Updated: 2010-03-08


next webcast button


next tip button



About the author
MSSQLTips author Siddharth Mehta Siddharth Mehta is an Associate Manager with Accenture in the Avanade Division focusing on Business Intelligence.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Thursday, May 31, 2012 - 9:19:25 AM - Galina Back To Top

I have 60000 rows in my table and Merge operation works very slow.


Tuesday, November 02, 2010 - 7:36:46 AM - praveen,C Back To Top

Hi,

   I want to know how to perform the perfomance tuning on SSIS packages


Learn more about SQL Server tools