solving sql server problems for millions of dbas and developers since 2006



SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Tip Categories SQL Server Tutorials SQL Server Webcasts SQL Server Whitepapers SQL Server Tools SQL Server Questions and Answers MSSQLTips Authors About MSSQLTips SQL Server User Groups MSSLQTips Giveaways MSSQLTips Advertising Options

MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Twitter Page MSSQLTips Google+ Page





SQL Product Highlight

Red Gate Software - SQL Monitor

SQL Server performance monitoring and alerting - SQL Monitor offers an easy entrance to advanced server monitoring with a simple design that's a refreshing change from the status quo. Red Gate have added custom metrics and user roles to the product without spoiling its ease-of-use, to help you answer that timeless question, 'How healthy are your servers?'

Learn more!




Merge multiple data sources with SQL Server Integration Services

By: | Read Comments (6) | Print

Greg is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

Related Tips: More

Problem
When loading data into SQL Server you have the option of using SQL Server Integration Services to handle more complex loading and data transforms then just doing a straight load such as using BCP.  One problem that you may be faced with is that data is given to you in multiple files such as sales and sales orders, but the loading process requires you to join these flat files during the load instead of doing a preload and then later merging the data.  What options exist and how can this be done?

Solution
SQL Server Integration Services (SSIS) offers a lot more features and options then DTS offered.  One of these new options is the MERGE JOIN task.  With this task you can merge multiple input files into one process and handle this source data as if it was from one source.

Let's take a look at an example of how to use this. 

Here we have two source files an OrderHeader and an OrderDetail.  We want to merge this data and load into one table in SQL Server called Orders.

OrderHeader source file.

OrderDetail source file

Orders table


Building the SSIS Package

First create a new SSIS package and create the three Connections that we will need.

  1. Flat File Source 1 - OrderHeader
  2. Flat File Source 2 - OrderDetail
  3. OLE DB Destination - SQLServer

Then add a DATA FLOW task.

Next we need to build our load from these two flat file sources and then use the MERGE JOIN task to merge the data.  So the Data Flow steps would look something like this.

At this point if you try to edit the MERGE JOIN task you will get the below error.  The reason for this is because the data needs to be sorted for the MERGE JOIN task to work.  We will look at two options for handling this sorting need.


Option #1 - Data is presorted prior to loading the data.

Let's assume that are data is sorted prior to loading.  We therefore need to tell SSIS this is the case as well as show which column the data is sorted on.  First if you right click on "Flat File Source" and select the "Show Advanced Editor".  On the Input and Output Properties tab you need to change the "IsSorted" to True for both of the Flat File Sources.

Next you need to let SSIS know which column is the SortKey.  Here we are specifying the OrderID column.  This also needs to be done for both of the flat file sources.

Once this is complete you will be able to move on with the setup and select the input process as shown below.

From here you can select the columns that you want to have for output as well as determine what type of join you want to employ between these two files.

Lastly you would need to add your OLE Destination, select the table and map the columns to finish the process.


Option #2 - Source data is not sorted

With this load process, let's assume the source data is not sorted first, so we need to use the SORT task to sort the data prior to using the MERGE JOIN task.  The following shows our Flat File sources and then a SORT task after each one of these and then lastly our MERGE JOIN task.

If you right click the Sort task and select Edit you will get a screen such as following.  Here you need to select which column the data should be sorted on.  This needs to be done for both of the flat source files.

After this is done you can move on and finish the load process.  The MERGE JOIN works just like it was stated above as well as the OLE DB Destination.

Next Steps

  • As you can see this process is not all that complicated, but not knowing where and how to set the IsSorted parameter could get frustrating.
  • This gives you two different options that you can use to employ the MERGE JOIN task.  The ideal process would be to presort the data, so this extra step can be eliminated when loading your data.  Also, if the files are large the Sort task could be a time consuming task.


Related Tips: More | Become a paid author


Last Update: 9/5/2007

Share: Share 






Comments and Feedback:

Friday, March 19, 2010 - 10:32:21 AM - darck1 Read The Tip

Is there a way I can merge join without it first looking at every record in each datasource? If I have over 1,000,000 records in the left table and 34 in the right table but the join is the only method of selection then I don't want to have to process a million records.


Thursday, September 09, 2010 - 4:01:13 PM - Netty Read The Tip
EXTREMELY well written!  Thanks for sharing! :)


Wednesday, February 16, 2011 - 10:07:50 AM - swingnchad Read The Tip

Nice intro to BIDS and SSIS!  This is a very good place to start.


Friday, March 11, 2011 - 5:46:41 PM - Anonymous Read The Tip

Sunday, April 15, 2012 - 7:42:11 PM - Shahid Read The Tip

Very well written. Screenshots are really very helpful to easily understand. Very well done (Y)


Saturday, April 28, 2012 - 4:14:48 PM - td Read The Tip

The sorting stage seems to take forever on 4 million rows on each file.  Is that sound normal?  Is there a better way to do this? How about using for loop or the multi flat.file container?  thoughts?



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
"Amazing, Amazing, Amazing! SQL doctor is truly one of the most powerful tools I have seen."

SQL Monitor – For database professionals who need results on Day One. Try it online.

Need SQL Server help and not sure where to turn? Reach out to the Edgewood consultants for a Health Check.

Get SQL Server Tips Straight from Kevin Kline.

Join the over million SQL Server Professionals who get their issues resolved daily.

Learn SQL Server 2012, Performance Tuning, Development, Administration, Replication and more - free webcasts


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com