Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Synchronize Table Data Using a Merge Join in SSIS


By:   |   Read Comments (8)   |   Related Tips: More > Integration Services Data Flow Transformations

Attend this free live MSSQLTips webcast

SQL Server Performance Tuning Tips and Tricks
click here to learn more


Problem

We have a need to synchronize data in tables across SQL Server instances. There are multiple ways this can be done such as using link servers, replication, TableDiff utility, etc., but in this tip I am going to demonstrate one-way synchronization using an SSIS Merge Join.

Solution

I have two SQL Server instances: serverA and serverB. I also have two different databases, TestingA on serverA and TestingB on serverB. Both of these databases have the same data in table Customer_Detail with a unique key of custID.   Then we changed some of the data on serverA and now I want the data on serverB to be the same for reporting purposes, so based on that I will demonstrate how to synchronize this data from serverA to serverB using a Merge Join in SSIS.

Below is what my SSIS Package looks like. I will discuss each of the steps in the package.

SSIS Package Synchronization

Step 1: Initiate SQL Connections

I have two OLE DB Sources, one for serverA (source) and the other for serverB (destination). Below are how these connections are setup.

Source connection: serverA

serverA - Description: serverA connection

Source connection: serverB

serverB - Description: serverB connection

Step 2: Apply sorting on tables

When applying a Merge Join, the data needs to be sorted for both inputs, hence I am applying a sort operation to both sides. An alternative to this step is to pre-sort the data in a query and use a SQL command instead of table. This needs to be done for both inputs.

SSIS Data sorting - Description: apply sorting

Step 3: Applying Merge Join

I have applied a full outer join on the unique column custid in the Merge Join. I need to get NULL records from the right and left tables where the custid does not match and this will determine if we need to perform a delete or insert operation.

Full Outer Join as Merge Join Transformation Editor - Description: apply merge join

Step 4: Split the Records

In the next step, I have applied a conditional split.  I have added three different conditions based on the full outer join result. These are the three conditions:

  1. UPDATE (custID match): If the custID exists on both the source and destination then we do an update.
  2. INSERT (custid not found in destination): If custID is found only on the left (source) then we need to insert this data into the destination table.
  3. DELETE (custID is not found in source): If custID is not found in the source table then we need to delete this data from the destination table.
Conditional Split Transformation Editor - Description: split operations

Step 5: Find Updated Record

As mentioned above the UPDATE (custID Match) condition, it returns the data where custID matches. We don't know if the data has changed on the source, so to minimize updating every record I have added a conditional split that compares each column to see if any data is different.  If it is, then we will do the update and if not we will not update the record.  Only the records that are different will go to Step 6.

Condition

(custName! = [custName (1)]) || (custAddress!= [custAddress (1)]) || (pincode != [pincode (1)]) || (buaddress != [buaddress (1)]) || (varmobile != [varmobile (1)]) || (dtentrydate != [dtentrydate (1)]) || (ref_userid != [ref_userid (1)])
finding updated records in the Conditional Split Transformation Editor - Description: finding updated records

Step 6: Performing Insert, Update and Delete to Destination Table

So we now have three streams of data to do either an Insert, Update or Delete.

Insert Records

I have an OLE DB Destination connection object Customer_detail and mapped the input columns as follows.

Insert the data
Mapping data

Update Records

For updating the data, I have an OLE DB Destination connection and added a sqlcommand as follows using the Advanced Editor.

updataing data - Description: Conditional base 
Updating the records
updating the data - Description: Updating the data

After initiating the command with parameter ?, I then applied the input column mapping with available parameter destination columns based on the destination table.

Mapping columns - Description: Mapping columns

DELETE Records

For deleting data, I have an OLE DB Destination connection and added a sqlcommand as follows using the Advanced Editor.

Deleting the record - Description: Delete the records
configure command - Description: delete the records

For the delete record the column mapping is as follows.

Mapping the data - Description: Mapping columns

Running the Package

After setting up all of the above steps, randomly changing some of the data, I then run the package and get the following.

Synchronization  - Description: Package execution

Conclusion

As per the above package synchronization, I have performed insert, update and delete operations with data from the source to the destination. In SSIS, there are also other methods available for synchronization data such as using the tablediff utility in SSIS. I suggest you find the method that works best for your business needs.

Next Steps


Last Update:


next webcast button


next tip button



About the author
MSSQLTips author Bhavesh Patel Bhavesh Patel is a SQL Server database professional with 10+ years of experience.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Tuesday, May 29, 2018 - 10:41:52 AM - Mark Landry Back To Top

PS This behavior occurs for me b/c I chose to sort the data at the source ("order by") and removed the sort tasks from the package. The sort tasks will inherently read in all records before completing. So those tasks effectively buffer all input rows and there's no chance of the catch-22 I encountered.


Wednesday, May 16, 2018 - 2:34:48 PM - Mark Landry Back To Top

Overall this is an excellent example and saved me a good deal of effort. But it has one major fault that's not obvious.

If the target table is larger than the SSIS buffer, then the package could hang. That's because the SELECT from the target table (ServerB OLEDB Source) will transactionally block all subsequent INSERTs and UPDATEs. Since the SSIS pipeline is full, no more rows can be read from the target table and the package hangs indefinitely. It's not a database deadlock because the SELECT is free to continue but SSIS won't read any more rows because the buffer is full.

My solution was to increase DefaultMaxBufferSize and DefaultMaxBufferRows to accomodate the entire source table into the pipeline.

See https://www.desertislesql.com/wordpress1/?p=318


Tuesday, May 08, 2018 - 10:27:20 AM - Patel Bhavesh Back To Top

Hi Priya,

You can go with the primary key column but the case is every time same data updates if you not got the actual updated record.
There are multiple option you can keep and used method as per appropiate your buisness scenario..


 


Monday, May 07, 2018 - 4:20:45 PM - PRIYA Back To Top

 HELLO,

In the example that you have given there are only 7 columns so in the conditional split you have given all the 7 columns.  but the table im working on has 60+ columns.

do I have to give all the 60+ columns or is it okay if i just give the primary key columns?

Appreciate your response

 


Thursday, March 29, 2018 - 11:02:07 PM - Daniel Dickler Back To Top

Well done and very useful! Thanks for sharing!


Thursday, February 15, 2018 - 3:42:21 AM - Alex Tabone Back To Top

Many thanks for sharing this tutorial. One small correction: in Step 6, under "Update Records", it should be "OLE DB Command" (not "OLE DB Destination connection").


Sunday, December 03, 2017 - 12:20:50 AM - Bhavesh Patel Back To Top

Thanks Gerardo Arevalo for this advice. some times this sorting operatior cost created problem in server so with respect of this, I have mentioned alernate in step2. It is better to use T-SQL with Order statement instead of sorting component. for more clearification I have added reference of such discussion..

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/231079cb-9aad-4c0f-b126-73d2acee1629/tuning-ssis-sort-transformation-vs-using-tsql-order-statement?forum=sqlintegrationservices

 

 

 

 

 

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/231079cb-9aad-4c0f-b126-73d2acee1629/tuning-ssis-sort-transformation-vs-using-tsql-order-statement?forum=sqlintegrationservices

 


Tuesday, November 21, 2017 - 10:06:07 AM - Gerardo Arevalo Back To Top

Nice post, well documented. But, a  notice of caution before trying to implement that on a production server: sort and merge join components run entirely on memory, the SSIS server must have plenty enoung free memory to sort (twice) all the data, and then compare between them (join). I have seen to fail implementations like this while in dev boxes ran succesfuly as the (T/SQL) load in the prod server usually is ten times the dev load, letting less usuable memory for SSIS operations.


Learn more about SQL Server tools