Compare Data from Two SQL Server Tables

By:   |   Updated: 2022-09-06   |   Comments (3)   |   Related: More > TSQL


Problem

Several years ago, I tried to figure out how to compare two tables. My goal was to highlight the differences of the data between the two tables. The situation was your typical TableB was a backup of TableA before we performed some updates. You may be able to guess, but things didn't go as expected. At that time, I had no idea how to return the differences. My manager suggested using something like a delta sync. Well, I didn't have any idea what that was either; keep in mind this was a long time ago. If you look up the word delta in the dictionary, it has several meanings. The one which resonated with me was "a difference between two things or values." After some research, I was off and running. Throughout the years, I've used several methods to compare tables. One has always stood out to me as the simplest.

Solution

In this tip, I'll explore using an often-neglected method for comparing two tables. There are undoubtedly multiple ways to accomplish this goal. The most common approach has one glaring downside for me. The technique I'll present has always stuck with me, and maybe it will with you.

Building Your Dataset

To get started, let's go ahead and build a small dataset. You can easily add more rows if you would like. Please don't run these sample scripts in a production environment. You don't want your DBA asking why there's a database called SqlHabits.

USE [master];
GO

IF DATABASEPROPERTYEX('SqlHabits', 'Version') IS NOT NULL
BEGIN
    ALTER DATABASE SqlHabits SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE SqlHabits;
END;
GO

CREATE DATABASE SqlHabits;
GO

USE SqlHabits;
GO

CREATE TABLE dbo.SourceTable
(
    Id INT NOT NULL,
    FirstName NVARCHAR(250) NOT NULL,
    LastName NVARCHAR(250) NOT NULL,
    Email NVARCHAR(250) NULL
);
GO

CREATE TABLE dbo.DestinationTable
(
    Id INT NOT NULL,
    FirstName NVARCHAR(250) NOT NULL,
    LastName NVARCHAR(250) NOT NULL,
    Email NVARCHAR(250) NULL
);
GO

The script above will create two tables. They're identical except for the name. In the example below, we load three rows into our SourceTable. Additionally, we're adding three rows to the destination. However, there are slight differences. Run the script below to follow along.

INSERT INTO dbo.SourceTable
(
    Id,
    FirstName,
    LastName,
    Email
)
VALUES
(1, 'Chip', 'Munk', '[email protected]'),
(2, 'Frank', 'Enstein', '[email protected]'),
(3, 'Penny', 'Wise', '[email protected]');
GO

INSERT INTO dbo.DestinationTable
(
    Id,
    FirstName,
    LastName,
    Email
)
VALUES
(1, 'Chip', 'Munk', '[email protected]'),
(2, 'Frank', 'Ensein', '[email protected]'),
(3, 'Penny', 'Wise', NULL);
GO

SourceTable is what I would call the source of truth in this example. Our mission is to identify differences between DestinationTable and SourceTable.

If you look closely at the data, you will see there are differences in the data for Id 2 and Id 3.

Find Data Differences from Two Tables Using LEFT JOIN

A standard method for identifying two tables' row differences is a LEFT JOIN. A LEFT JOIN will return all rows from the LEFT table and any matching ones on the right. Let's say the Id in both tables is a primary key for simplicity's sake. The code below should give us the results we're looking for.

SELECT st.Id,
       st.FirstName,
       st.LastName,
       st.Email
FROM dbo.SourceTable st
    LEFT JOIN dbo.DestinationTable dt
        ON dt.Id = st.Id
WHERE dt.FirstName <> st.FirstName
      OR dt.LastName <> st.LastName
      OR ISNULL(dt.Email, '') <> ISNULL(st.Email, '');
GO

Since the email can be NULL in both tables, we need to add a check. You generally don't want to update the destination table if there are no differences. For example, if your destination table has temporal enabled, you'll end up with a bunch of extra rows in the history table. My main issue with doing this via a LEFT JOIN is the NULL check which needs to be performed. Imagine if you have 10 or 20 columns that you need to check. You could end up with something like this, or even worse, depending on how you handle the NULL check.

SELECT st.Id,
       st.FirstName,
       st.LastName,
       st.Email
FROM dbo.SourceTable st
    LEFT JOIN dbo.DestinationTable dt
        ON dt.Id = st.Id
WHERE ISNULL(dt.Column1, '') <> ISNULL(st.Column1, '')
      OR ISNULL(dt.Column2, '') <> ISNULL(st.Column2, '')
      OR ISNULL(dt.Column3, '') <> ISNULL(st.Column3, '')
      OR ISNULL(dt.Column4, '') <> ISNULL(st.Column4, '')
      OR ISNULL(dt.Column5, '') <> ISNULL(st.Column5, '')
      OR ISNULL(dt.Column6, '') <> ISNULL(st.Column6, '')
      OR ISNULL(dt.Column7, '') <> ISNULL(st.Column7, '')
      OR ISNULL(dt.Column8, '') <> ISNULL(st.Column8, '')
      OR ISNULL(dt.Column9, '') <> ISNULL(st.Column9, '')
      OR ISNULL(dt.Column10, '') <> ISNULL(st.Column10, '');
GO

Since the code above is for illustration purposes, it will not execute unless you add the additional columns. Another potential issue might be if you allow a column to be NULL down the road and are not checking it.

Find Data Differences from Two Tables Using EXCEPT

I prefer using EXCEPT to perform the check. The EXCEPT operator returns rows from the left query, not in the right query. In the example below, the left query is on top. One of its main advantages is that you don't need to worry about checking for NULLs. Below is the code using EXCEPT to get our results.

SELECT Id,
       FirstName,
       LastName,
       Email
FROM dbo.SourceTable
EXCEPT
SELECT Id,
       FirstName,
       LastName,
       Email
FROM dbo.DestinationTable;
GO
Using Except Results

Like with the LEFT JOIN, imagine having to compare 10 or 20 columns. To me, the code below is much easier on the eyes.

SELECT Id,
       FirstName,
       LastName,
       Email,
       Column1,
       Column2,
       Column3,
       Column4,
       Column5,
       Column6,
       Column7,
       Column8,
       Column9,
       Column10
FROM dbo.SourceTable
EXCEPT
SELECT Id,
       FirstName,
       LastName,
       Email,
       Column1,
       Column2,
       Column3,
       Column4,
       Column5,
       Column6,
       Column7,
       Column8,
       Column9,
       Column10
FROM dbo.DestinationTable;
GO

Drawbacks of EXCEPT

Now, if you asked me about the drawbacks to using EXCEPT, I would say there are at least two.

  • The first would be performance. In my experience, a LEFT JOIN will provide better performance than EXCEPT. Looking at the execution plan and testing on a huge table should prove this.
  • The other drawback will be if you don't like using EXCEPT. There is nothing wrong with preferring the LEFT JOIN. Additionally, EXCEPT requires an equal number of columns in each SELECT. However, for our purposes of checking differences, I don't think that will be different than a LEFT JOIN.

Conclusion

In this tip, I presented a common challenge you can run into with the differentiation of two data sets. You've likely encountered this if you have ever worked with loading data and then updating based on the differences. A method I used for a long time was a simple LEFT JOIN. This technique worked for the most part, but I was never happy with it. That was, of course, till I discovered the EXCEPT operator. In the comment section below, I would like to hear your experiences with EXCEPT and if you can name other benefits or drawbacks.

Next Steps
  • Would you like to learn more about SQL joins in general? Please check out that tip by Jim Evans.
  • For a comprehensive overview of the LEFT JOIN, please check out that tip by John Miner.
  • Greg Robidoux created an informative tip covering everything related to both the EXCEPT and INTERSECT operators. I highly recommend you check it out.





get scripts

next tip button



About the author
MSSQLTips author Jared Westover Jared Westover (@westoverjared) is a SQL Server specialist with nearly two decades of industry experience.

View all my tips


Article Last Updated: 2022-09-06

Comments For This Article




Wednesday, September 14, 2022 - 12:44:21 PM - Kevin Back To Top (90479)
The EXCEPT performance should be about the same as the LEFT JOIN, since the ISNULL will be non deterministic and both types will be scanning.

Saturday, September 10, 2022 - 11:56:45 AM - Izhar Azati Back To Top (90462)
The SSDT has "New Data Comparison".
https://docs.microsoft.com/en-us/sql/ssdt/how-to-compare-and-synchronize-the-data-of-two-databases?view=sql-server-ver16

Tuesday, September 6, 2022 - 2:12:33 PM - Jimbo99 Back To Top (90445)
Two tables for differences is a matter of comparing record number & date stamps for data columns, when they are supposed to contain the same information. That is one might be newer or older version of the same database with the same tables. Even fields.that have changed as approved database configuration changes structurally. Obviously what determines the updates of any records is the accuracy of the data. And there is no way to determine that beyond assumptions that one set of data is more accurate than the other and that anything more modern for input & update was done to reflect true data. Data corruptions can be fabricated with criminal intent if the data is of that nature.

Data meaning determines and historical accuracy may need to be preserved, that may rule the day as for what does or doesn't get updated. One would have to know why the data needed to match & doesn't. Is it a matter of having an older backup from a catastrophic disk fail & total rebuild.














get free sql tips
agree to terms