Merge multiple data sources with SQL Server Integration Services

By:   |   Comments (32)   |   Related: More > Integration Services Data Flow Transformations


Problem

When loading data into SQL Server you have the option of using SQL Server Integration Services to handle more complex loading and data transforms then just doing a straight load such as using BCP.  One problem that you may be faced with is that data is given to you in multiple files such as sales and sales orders, but the loading process requires you to join these flat files during the load instead of doing a preload and then later merging the data.  What options exist and how can this be done?

Solution

SQL Server Integration Services (SSIS) offers a lot more features and options then DTS offered.  One of these new options is the MERGE JOIN task.  With this task you can merge multiple input files into one process and handle this source data as if it was from one source.

Let's take a look at an example of how to use this. 

Here we have two source files an OrderHeader and an OrderDetail.  We want to merge this data and load into one table in SQL Server called Orders.

OrderHeader source file

customer

OrderDetail source file

amount

Orders table

column name

Building the SSIS Package

First create a new SSIS package and create the three Connections that we will need.

  1. Flat File Source 1 - OrderHeader
  2. Flat File Source 2 - OrderDetail
  3. OLE DB Destination - SQLServer
connection managers

Then add a DATA FLOW task.

data flow task

Next we need to build our load from these two flat file sources and then use the MERGE JOIN task to merge the data.  So the Data Flow steps would look something like this.

flat file source

At this point if you try to edit the MERGE JOIN task you will get the below error.  The reason for this is because the data needs to be sorted for the MERGE JOIN task to work.  We will look at two options for handling this sorting need.

microsoft visual studio

Option #1 - Data is presorted prior to loading the data.

Let's assume that are data is sorted prior to loading.  We therefore need to tell SSIS this is the case as well as show which column the data is sorted on.  First if you right click on "Flat File Source" and select the "Show Advanced Editor".  On the Input and Output Properties tab you need to change the "IsSorted" to True for both of the Flat File Sources.

flat file source output

Next you need to let SSIS know which column is the SortKey.  Here we are specifying the OrderID column.  This also needs to be done for both of the flat file sources.

common properties

Once this is complete you will be able to move on with the setup and select the input process as shown below.

output

From here you can select the columns that you want to have for output as well as determine what type of join you want to employ between these two files.

transformation editor

Lastly you would need to add your OLE Destination, select the table and map the columns to finish the process.

merge join

Option #2 - Source data is not sorted

With this load process, let's assume the source data is not sorted first, so we need to use the SORT task to sort the data prior to using the MERGE JOIN task.  The following shows our Flat File sources and then a SORT task after each one of these and then lastly our MERGE JOIN task.

flat file source

If you right click the Sort task and select Edit you will get a screen such as following.  Here you need to select which column the data should be sorted on.  This needs to be done for both of the flat source files.

order id

After this is done you can move on and finish the load process.  The MERGE JOIN works just like it was stated above as well as the OLE DB Destination.

flat file source
Next Steps
  • As you can see this process is not all that complicated, but not knowing where and how to set the IsSorted parameter could get frustrating.
  • This gives you two different options that you can use to employ the MERGE JOIN task.  The ideal process would be to presort the data, so this extra step can be eliminated when loading your data.  Also, if the files are large the Sort task could be a time consuming task.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, August 24, 2016 - 6:41:40 AM - Praveen Dube Back To Top (43175)

 OK THANKS FOR HELP

 


Wednesday, June 17, 2015 - 4:10:44 AM - Ajay Khatri Back To Top (37941)

Very Helpful....!!!!!!!!!!!!!!!!

 


Tuesday, February 18, 2014 - 6:41:41 PM - Momo Back To Top (29496)

Thanks

 


Monday, January 27, 2014 - 6:19:07 PM - Jason Back To Top (29243)

How to you merge more then 2 tables using merge join, when joining the 3rd input source it gives the error saying " Cannot create connector. The destination component does not have any available inputs for use in creating path.

Please help


Thursday, January 2, 2014 - 3:19:29 AM - Dhanunjay Back To Top (27935)

It helped me a lot .

 

Thanks a lot

Dhanunjay


Friday, December 6, 2013 - 7:33:21 PM - Jennifer Back To Top (27723)

Thank you so much for these instructions, it was very helpful,  Where I'm getting into trouble is the third, fourth and fifth table I'm adding to the join.  I'm getting an error "All available inputs on target component are connect to outputs.  Edit this component to define new inpute before processing.  Any suggestions?


Wednesday, April 24, 2013 - 7:20:33 AM - Greg Robidoux Back To Top (23537)

@Ramesh - yes you can use a merge join if you want.  Just follow the example above and substitute your Dept table for the OrderHeader and your other table for OrderDetail.


Wednesday, April 24, 2013 - 6:37:20 AM - Ramesh Back To Top (23536)

 

Thanks for that suggestion,

but in my lookup transformation haveing reference data of Dept table,

Am New to SSIS  i have a doubt in merge join we have one source(like emp table which is oledb source) and

one transformation(like Dept table which is a lookup transformation this transformation having reference data of dept) and

i have one destination (which is oledb destination).

now my Requirement is can i use MERGE JOIN in place of Transformation.

different databases

How can i achieve this.


Tuesday, April 23, 2013 - 10:06:06 AM - Greg Robidoux Back To Top (23516)

@Ramesh - if both of these tables are in the same database then you can just do a join instead of using an SSIS task.

It is hard to tell exactly what you need to do from your description.


Tuesday, April 23, 2013 - 6:10:57 AM - Ramesh Back To Top (23503)

 

 

Hi Friends,

Am New to SSIS  i have a doubt in merge join we have one source(like emp table which is oledb source) and

one transformation(like Dept table which is a lookup transformation) and

i have one destination (which is oledb destination).

now my Requirement is can i use MERGE JOIN in place of Transformation.

How can i achieve this.


Friday, April 12, 2013 - 4:19:29 PM - Greg Robidoux Back To Top (23337)

@arun - are these tables all SQL Server tables or are they from some other source?

If they are all SQL Server tables you could just write a SQL statement joining the tables and not use the MERGE task.


Friday, April 12, 2013 - 2:56:47 PM - Arun Back To Top (23335)

@greg - thanks for your response. Could you elaborate on that please? I am trying to use merge join, and for that sorting all the data in all the tables and their resultant output, consequently, it eats up a lot of time and ends up complaining about buffer. The tables that I read from for instance have all got half a million records.


Friday, April 12, 2013 - 9:49:47 AM - Greg Robidoux Back To Top (23328)

@arun - you should be able to do multiple merges to do more than 2 sources.


Friday, April 12, 2013 - 2:58:50 AM - Arun Back To Top (23321)

Hi,

 

I am new to SSIS, is it possible to copy data from 4 source tables to one destination table using SSIS? I can see from 2 to 1 is possible.

 

Thanks.


Thursday, March 21, 2013 - 3:53:03 AM - Nagaraj G Back To Top (22927)

Very well written. Screenshots are really very helpful to easily understand. Very well done (Y)

 


Saturday, March 9, 2013 - 8:11:32 PM - kerany Back To Top (22683)

@Robert Bosco, you can use the conditional split component to load your data into different tables with condition


Monday, November 26, 2012 - 7:17:36 AM - Greg Robidoux Back To Top (20534)

@mita - yes you should be able to join two sets of data based on SSN.  Just make sure the data is sorted by SSN.


Friday, November 23, 2012 - 4:05:30 PM - mita Back To Top (20498)

I am trying to create a single row from two selects using merge join in SSIS and the join key is SSN from two selects. Can it be done using this example ?


Friday, August 10, 2012 - 4:07:43 PM - Jason Back To Top (18996)

Excellent tip.


Monday, July 23, 2012 - 1:29:42 AM - vijay Back To Top (18736)

Very Nice artical for basic learners.... easy to understand


Thursday, July 12, 2012 - 1:34:06 PM - AJ Back To Top (18479)

How can I transfer only the data are in flat source but not in flat source 1? I mean if I had a row in OrderHeader  like this :

order id  order date ...

5           7/12/12 ...

that this order id (5) is not in Order detail table. but I want transfer only this record (order id = 5) to detination.


Tuesday, July 10, 2012 - 11:15:34 AM - Greg Robidoux Back To Top (18424)

Robert Bosco - see if you can use a Multicast task after the Merge Join task.


Tuesday, July 10, 2012 - 9:48:40 AM - Robert Bosco JOHN PETER Back To Top (18423)

Greg Robidoux  - I understood the concept that you explained above and I had also followed.

In OLE DB Destination part, I can have selected either only one table or one view but my question is that I ought to be use multiple destination tables. 



Tuesday, July 10, 2012 - 8:33:04 AM - Greg Robidoux Back To Top (18420)

Robert Bosco - you should be able to follow the process above to do what you are asking.


Tuesday, July 10, 2012 - 5:04:20 AM - Robert Bosco JOHN PETER Back To Top (18415)

I  want to merge multiple flat file data and load into multiple table in SQL Server.

Could you please advice me.

Have a nice day 


Thursday, June 28, 2012 - 3:27:04 AM - Meinathan Back To Top (18240)

Very good.... Thanks for sharing.


Saturday, April 28, 2012 - 4:14:48 PM - td Back To Top (17177)

The sorting stage seems to take forever on 4 million rows on each file.  Is that sound normal?  Is there a better way to do this? How about using for loop or the multi flat.file container?  thoughts?


Sunday, April 15, 2012 - 7:42:11 PM - Shahid Back To Top (16927)

Very well written. Screenshots are really very helpful to easily understand. Very well done (Y)


Friday, March 11, 2011 - 5:46:41 PM - Anonymous Back To Top (13186)

Wednesday, February 16, 2011 - 10:07:50 AM - swingnchad Back To Top (12942)

Nice intro to BIDS and SSIS!  This is a very good place to start.


Thursday, September 9, 2010 - 4:01:13 PM - Netty Back To Top (10143)
EXTREMELY well written!  Thanks for sharing! :)


Friday, March 19, 2010 - 10:32:21 AM - darck1 Back To Top (5085)

Is there a way I can merge join without it first looking at every record in each datasource? If I have over 1,000,000 records in the left table and 34 in the right table but the join is the only method of selection then I don't want to have to process a million records.















get free sql tips
agree to terms