Learn more about SQL Server tools

   
   






















































Connect with MSSQLTips




MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Pinterest Page MSSQLTips Twitter Page MSSQLTips Google+ Page

Use Caution with SQL Server's MERGE Statement

MSSQLTips author Aaron Bertrand By:   |   Read Comments (12)   |   Related Tips: More > T-SQL
Problem

SQL Server 2008 introduced the MERGE statement, which promised to be a simpler way to combine insert/update/delete statements, such as those used during ETL (extract, transform and load) operations. However, MERGE originally shipped with several "wrong results" and other bugs - some of which have been addressed, and some of which continue to exist even in the early preview releases of SQL Server 2014. People also tend to make some leaps of faith regarding atomicity - they don't realize that the single statement actually performs different operations separately, and thus can suffer from issues due to concurrency and race conditions just like separate statements can.

Solution

I have been recommending that - for now - people stick to their tried and true methods of separate statements. Here are the major reasons:

Bugs with the SQL Server Merge Statement

It can be quite difficult to validate and guarantee that you are immune from any of the bugs that still exist. A few Connect items that you should be aware of, that are either still active, closed as "Won't Fix"/"By Design", or have only been fixed in specific versions (often requiring a cumulative update or on-demand hotfix):

 
Connect issue Current status*
#773895 : MERGE Incorrectly Reports Unique Key Violations

(Active)

#771336 : Indexed view is not updated on data changes in base table

(Fixed in certain builds)

#766165 : MERGE evaluates filtered index per row, not post operation, which causes filtered index violation

(Won't Fix)

#723696 : Basic MERGE upsert causing deadlocks

(By Design)

#713699 : A system assertion check has failed ("cxrowset.cpp":1528)

(Won't Fix)

#699055 : MERGE query plans allow FK and CHECK constraint violations

(Active)

#685800 : Parameterized DELETE and MERGE Allow Foreign Key Constraint Violations

(Won't Fix)

#654746 : merge in SQL2008 SP2 still suffers from "Attempting to set a non-NULL-able column's value to NULL"

(Active)

#635778 : NOT MATCHED and MATCHED parts of a SQL MERGE statement are not optimized

(Active)

#633132 : MERGE INTO WITH FILTERED SOURCE does not work properly

(Active)

#596086 : MERGE statement bug when INSERT/DELETE used and filtered index

(Active)

#583719 : MERGE statement treats non-nullable computed columns incorrectly in some scenarios

(Active)

#581548 : SQL2008 R2 Merge statement with only table variables fails

(Fixed only in 2012+)

#539084 : Search condition on a non-key column and an ORDER BY in source derived table breaks MERGE completely

(Active)

#357419 : MERGE statement bypasses Referential Integrity

(Fixed in a COD hotfix)

*Note: This list represents the status of these items at the time of writing; some may have changed by the time you are reading.

Some of these bugs can be worked around in the meantime using trace flags, but is the condensed format of MERGE really worth all of the extra testing that will require? Also, keep in mind that this list is not necessarily exhaustive - just look at how many bugs have actually been reported on Connect that I haven't investigated and highlighted here. Also, to give an idea of how many bugs might still have gone undetected, check out these informative blog posts by Paul White to understand how hard these bugs can be to even notice, never mind track down and fix.

SQL Server Merge Concurrency Issues

MERGE *looks* like it will take care of concurrency for you, because implicitly it seems to a single, atomic statement. However, under the covers, SQL Server really does perform the different operations independently. This means that you could end up with race conditions or primary key conflicts when multiple users are attempting to fire the same MERGE statement concurrently. Dan Guzman went into a lot of detail in his blog post a few years ago, but basically this means that unless you use a HOLDLOCK hint on your MERGE target, your statement is vulnerable to race conditions. In reviewing customer code and questions out in the community, I don't recall ever coming across a HOLDLOCK hint naturally, except in cases where someone was demonstrating the very race condition I'm talking about. The pattern should be:

MERGE dbo.TableName WITH (HOLDLOCK) AS target
USING ... AS source ...;

And not what I typically see:

MERGE dbo.TableName AS target
USING ... AS source ...;

SQL Server Merge Effect on Triggers

Due to those same mechanics, converting your insert/update/delete code should be thoroughly tested when triggers are involved. Prevailing wisdom has instilled in most of us the mindset that, in SQL Server, triggers fire once per statement. However, with MERGE, this wisdom must be revisited, because it's not quite true any longer. Let's take this simple example:

 
SET NOCOUNT ON;
GO
CREATE TABLE dbo.MyTable(id INT);
GO
INSERT dbo.MyTable VALUES(1),(4);
GO
CREATE TRIGGER dbo.MyTable_All
 ON dbo.MyTable
 FOR INSERT, UPDATE, DELETE
AS
BEGIN
  PRINT 'Executing trigger. Rows affected: ' + RTRIM(@@ROWCOUNT);
  
  IF EXISTS (SELECT 1 FROM inserted) AND NOT EXISTS (SELECT 1 FROM deleted)
  BEGIN
    PRINT '  I am an insert...';
  END
  IF EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
  BEGIN
    PRINT '  I am an update...';
  END
  IF NOT EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
  BEGIN
    PRINT '  I am a delete...';
  END
END
GO

Now, when I first started playing with MERGE, I expected the trigger to fire once, just like it does for any multi-row operation. But this is not the case; it actually fires the trigger for each operation that ends up happening as a result of the MERGE command. For example, with this MERGE:

MERGE dbo.MyTable WITH (HOLDLOCK) AS Target
USING (VALUES(1),(2),(3)) AS Source(id)
ON Target.id = Source.id
WHEN MATCHED THEN UPDATE SET Target.id = Source.id
WHEN NOT MATCHED THEN INSERT(id) VALUES(Source.id)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

I expected this output, since my MERGE operation affected a total of 4 rows:

Executing trigger. Rows affected: 4
  I am an insert...
  I am an update...
  I am a delete...

And then when I realized that the trigger gets fired multiple times, I expected this output, since I updated one row, inserted two new ones, and deleted one row:

Executing trigger. Rows affected: 2
  I am an insert...
Executing trigger. Rows affected: 1
  I am an update...
Executing trigger. Rows affected: 1
  I am a delete...

What I got was surprising to me - it made it seem like a hybrid between the trigger firing once and the trigger firing multiple times:

Executing trigger. Rows affected: 4
  I am an insert...
Executing trigger. Rows affected: 4
  I am an update...
Executing trigger. Rows affected: 4
  I am a delete...

So, the lesson here is, if you already have triggers and you're switching to MERGE, don't rely on @@ROWCOUNT for anything. This is true also if you have your INSERT / UPDATE / DELETE triggers in different modules - they'll still all report the total @@ROWCOUNT for the operation even when no rows are affected by that branch. So when you do make this switch you'll want to invest in refactoring or at least heavily testing your triggers in all unit test outcomes.

Improve Your Productivity or not with the SQL Server Merge Statement?

This is a minor and subjective gripe, but I've always found the Merge syntax quite daunting and difficult to memorize. I can't imagine there are many people out there who can write a fully-featured MERGE statement without consulting Books Online, and without going through many more cycles of testing to verify that all branches function as expected.

Conclusion

I am not trying to spread fear, sound like Chicken Little, or make you wary about any new feature that is introduced in SQL Server, however the increasing number of issues I see with MERGE make me feel obligated to document the potential problems, make sure that you are very well aware of them before deciding to use the syntax. Personally, I have converted a few implementations back to separate statements, because there came a point where I no longer felt I could trust the behavior of MERGE. If you're going to use it, please be careful!

Next Steps


    Last Update: 10/17/2013


    About the author
    MSSQLTips author Aaron Bertrand
    Aaron Bertrand is a Senior Consultant at SQL Sentry, Inc., and has been contributing to the community for over a decade, first earning the Microsoft MVP award in 1997.

    View all my tips


    print tip Print  
    Become a paid author





    join MSSQLTips for free SQL Server tips     



    Learn more about SQL Server tools
    Post a comment or let the author know this tip helped you.

           All comments are reviewed, so stay on subject or we may delete your comment.

    *Name   *Email Notify for updates



           Note: your email address is not published. Required fields are marked with an asterisk (*)


    Get free SQL tips:

    *Enter Code refresh code     



    Wednesday, May 28, 2014 - 4:47:52 PM - HLGEM Read The Tip

    I have to add that when you use it and, later on down the road, you have an issue where you need to figure out a problem with it, it is much harder to debug than an update statment followed by an insert. 

     


    Sunday, January 26, 2014 - 8:28:58 PM - Aaron Bertrand Read The Tip

    Gordon, I still wouldn't use it. I haven't done thorough testing of any of these bugs, and I have no idea which one(s) might apply in your scenario. Even if I did perform testing against all of them, and didn't find any bugs that affected my scenario, a very subtle difference could make the bugs appear in yours. The bugs exist, some of them closed as "by design" or "won't fix," and some of them "incorrect results" bugs; ruling them out in a limited test doesn't make them disappear any more than posting a "no bears" sign on my lawn keeps bears away - even though it may *seem* to work. :-)

    For the code that I write, and that I want to stand by for as long as it exists, I just can't justify the risk enough to use it or recommend it - even in cases where much of the risk is mitigated. I'm all about being cautious and consistent. I'm sure you and I have talked about more than one of these in the past, so I apologize if I'm repeating myself; it's for the benefit of other readers who haven't been privy to those conversations (or had my "Bad Habits to Kick" blog posts shoved down their throats by myself or others over the past few years):

    1. Because some day statements without terminators will be deprecated, even though it may not happen in my lifetime, I always place a semi-colon at the end of every single statement even if - right now - that statement, or the one following it, doesn't require it. (I also always prefix CTEs with a semi-colon, since too many times people have come back and said "I get a syntax error" - which was caused because the preceding statement, where they pasted the code, wasn't properly terminated.) Blog post here.
       
    2. Since the INFORMATION_SCHEMA views are incomplete, and completely disregard most new features, I use the catalog views always, even though in some isolated scenarios I *could* get the answer from INFORMATION_SCHEMA. I don't see the purpose of using something only in the few scenarios where I know it is complete and can't break, and still have to turn elsewhere in all other cases. Even if I am attempting to write code that is cross-platform portable, I can't think of very many useful queries that could use only INFORMATION_SCHEMA and be truly compatible across platforms - and even then, I'm not sure that metadata queries are ever going to be in the top 1000 concerns when migrating between or writing for multiple platforms. Blog post here.
       
    3. I always use 'yyyymmdd' for date literals, even when I know the system is US English and nobody will be using the code under different locale, regional settings, language and dateformat settings, etc. Why? Because someone could take my code with a literal like 09/07/2013, and think that it's safe to use under a British or Canadian locale (and may not notice that I actually meant September 7th, not July 9th). Blog post here.
       
    4. I don't use BETWEEN for any type of date/time range queries, even when I know the underlying data type is DATE or there is a constraint that prevents a time portion other than midnight. I can't control the future of those data types or the eternal presence of those constraints, and since I know an open-ended range (>= beginning and < the day after the end) will *always* work, I see no reason to use BETWEEN in the few scenarios where it will be "okay." Blog post here.

    5. I always use the schema prefix, even in cases where I know objects are only in the dbo schema, and the only users that will execute the code have dbo as their default schema. Again, because someone else may read the code and think it's ok to leave out the schema in general (enough people already think it's ok; I don't need to encourage anyone else). Blog post here.
       
    6. I always set an explicit length when defining variables, like CONVERT(VARCHAR(11), some_int) instead of CONVERT(VARCHAR, some_int). While *I* know that it is safe, because in that specific scenario it's implicitly a VARCHAR(30), there are other cases where it's a VARCHAR(1), and I don't want people to learn that leaving out the length anywhere is ok - because that will just lead to them (or someone that learns from them) to use it in a place where it does matter. The most dangerous thing here is that in some scenarios SQL Server won't even tell you that you lost data; it just silently truncates. Blog post here.
       
    7. I always specify the full names for shorthand such as that used in DATEPART, DATEDD, DATEDIFF, etc. Even though I know MM will never be interpreted as anything other than MONTH, I always type out MONTH. Not just abbreviations like W and Y have completely unexpected outcomes that I've successfully trumped almost every MVP I've asked, but also because it just makes the code that much more self-documenting at absolutely no cost to me. Blog post here.
       
    8. I try to always use explicit DATEADD operations, instead of relying on things like GETDATE()-1. The reason here is simple: the shortcut math does not work with the new types, like DATE and DATETIME2. Same blog post as #7.
       
    9. I use EXEC sp_executesql always, even in cases where it offers no tangible benefits over EXEC(). This is primarily because most people do need those benefits (such as the ability to pass strongly-typed parameters, or retrieve output parameters), or are prone to inevitable syntax errors that ensue when leaving out the parentheses. I see no advantage to only using EXEC() in the few cases where they aren't needed, only to switch gears when they are. Blog post here.
       
    10. Because metadata functions like OBJECT_NAME() don't obey transaction isolation level semantics, I try to avoid them, preferring to use explicit joins to sys.schemas, sys.tables, etc. It ends up with bulkier code, but code that will still work when someone issues the metadata query I've provided under READ UNCOMMITTED during a lengthy index rebuild. (See details about this here.)

    I could go on and on and on and on. The point is that I don't like using things when I can only use them in limited scenarios where I know they're safe, and can't trust them the rest of the time. Most of these I handle this way not only to protect my own code and that of my customers, since I may not have any control over what will happen to the code in the future, but also as a mantra for using and demonstrating best practices as much as possible. I don't want to skate by on shortcuts, I don't want my code to *need* disclaimers that say "this is safe only in this scenario because...", and I don't want anyone reading my code to pick up those habits even in cases where it may be okay. There is a massive potential trickle-down effect there, especially for someone answering a lot of questions on Stack Overflow, MSDN forums, etc. This is one of the reasons you see me often pointing out some of the above scenarios in other people's answers - I don't want "the right answer" to include anything that could break for other users in some scenarios (even rare ones). I should write an over-arching blog post that frames all of my "Bad Habits to Kick" and related posts with something along these lines, stressing the importance of not protecting your own code (you may know all the ins and outs of all the above scenarios), but the importance of the things others learn from you.

    The fact that the syntax is ANSI standard doesn't make it any more intuitive or familiar, and doesn't really help anyone, with the rare exception of someone coming to SQL Server from a platform that has also implemented MERGE syntax. For me, I see no reason to use it in any scenario, since - aside from the convenience of a single OUTPUT clause, where an OUTPUT clause is even possible - it can't do anything for me that independent statements can't do. What can MERGE do that separate statements can't do, even in cases where concurrency isn't an issue? I will keep writing separate UPDATE and INSERT statements, since I can rely on those with a much higher degree of confidence in all scenarios, and I see no reason to use (or advocate) something that I can't trust in all scenarios, even though I can rely on it in a few.

    So, no, I can't condone the use of MERGE in your scenario, and think it is a bad idea in current versions of SQL Server. You, of course, may have different priorities than me.


    Sunday, January 26, 2014 - 2:03:12 PM - Gordon Linoff Read The Tip

    Aaron,

    In the case that you are using MERGE without concurrency and with no triggers on the underlying tables, is it still a bad idea?  Which of the bugs apply in this case?  I would mostly be using MERGE in scheduled database maintenance jobs.

    Although I agree with you about the syntax, it is unfortunately based on the ANSI standard.

    Cheers


    Thursday, November 21, 2013 - 8:30:18 AM - Syed Read The Tip

    Hello All,

    I have a excel sheet as source and SQL db_table as target , I want to update Table and Insert new records if it is not exist, Through "DATAFLOW TASK" in "SSIS 2008" Please Help me out of this horrible situation 

     

    Thanks in Advance


    Monday, October 28, 2013 - 11:58:55 AM - Jeff Moden Read The Tip

    Awesome article on MERGE, especially concerning how triggers handle it.  Thanks for taking the time to test and post, Aaron.


    Monday, October 21, 2013 - 10:22:49 AM - David Stout (dbslave) Read The Tip

    Aaron, Thanks for clearing that up, and for the article.


    Monday, October 21, 2013 - 7:41:01 AM - Aaron Bertrand Read The Tip

    David, this is relevant for SQL Server 2008, 2008 R2, 2012, and 2014. The only exception would be bugs that have been fixed quietly in newer versions, or as a side effect of some other change (e.g. optimizer improvements).


    Sunday, October 20, 2013 - 11:16:33 AM - David Stout (dbslave) Read The Tip

    This article addresses SQL 2008 can some one tell me if this is also true of SQL 2008R2?


    Sunday, October 20, 2013 - 9:22:35 AM - Aaron Bertrand Read The Tip

    Alex, I don't know that any pattern can be made invulnerable to race conditions. This has very little to do with the statements you use, and more to do with the trade-offs you make between isolation level and concurrency. For example, if you wrap several statements in a serializable transaction, conflicts are impossible, but that can really have an impact on transaction throughput.


    Sunday, October 20, 2013 - 1:28:04 AM - Alex Friedman Read The Tip

    This is very worrying. I really love the MERGE statement, and it's a pity it is so buggy.

    What would you say are the best "tried and true methods of separate statements"? I've seen several patterns floating around, and most also vulnerable to race conditions.

     


    Friday, October 18, 2013 - 4:31:13 PM - TimothyAWiseman Read The Tip

    It is always good to be on the watch for unexpected behavior.  Thanks for pointing some of this out.


    Thursday, October 17, 2013 - 2:08:30 PM - Theron Read The Tip

    I read this just as I was about to code some upsert logic using merge, so thanks for the warning.  I had a good experience with Oracle's merge on a previous project and was glad to see it implemented in SQL Server as well.   Now I think I'll go ahead and use the separate statements instead!




     
    Sponsor Information