Compare Data from Two SQL Server Tables
Several years ago, I tried to figure out how to compare two tables. My goal was to highlight the differences of the data between the two tables. The situation was your typical TableB was a backup of TableA before we performed some updates. You may be able to guess, but things didn't go as expected. At that time, I had no idea how to return the differences. My manager suggested using something like a delta sync. Well, I didn't have any idea what that was either; keep in mind this was a long time ago. If you look up the word delta in the dictionary, it has several meanings. The one which resonated with me was "a difference between two things or values." After some research, I was off and running. Throughout the years, I've used several methods to compare tables. One has always stood out to me as the simplest.
In this tip, I'll explore using an often-neglected method for comparing two tables. There are undoubtedly multiple ways to accomplish this goal. The most common approach has one glaring downside for me. The technique I'll present has always stuck with me, and maybe it will with you.
Building Your Dataset
To get started, let's go ahead and build a small dataset. You can easily add more rows if you would like. Please don't run these sample scripts in a production environment. You don't want your DBA asking why there's a database called SqlHabits.
USE [master]; GO IF DATABASEPROPERTYEX('SqlHabits', 'Version') IS NOT NULL BEGIN ALTER DATABASE SqlHabits SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE SqlHabits; END; GO CREATE DATABASE SqlHabits; GO USE SqlHabits; GO CREATE TABLE dbo.SourceTable ( Id INT NOT NULL, FirstName NVARCHAR(250) NOT NULL, LastName NVARCHAR(250) NOT NULL, Email NVARCHAR(250) NULL ); GO CREATE TABLE dbo.DestinationTable ( Id INT NOT NULL, FirstName NVARCHAR(250) NOT NULL, LastName NVARCHAR(250) NOT NULL, Email NVARCHAR(250) NULL ); GO
The script above will create two tables. They're identical except for the name. In the example below, we load three rows into our SourceTable. Additionally, we're adding three rows to the destination. However, there are slight differences. Run the script below to follow along.
INSERT INTO dbo.SourceTable ( Id, FirstName, LastName, Email ) VALUES (1, 'Chip', 'Munk', '[email protected]'), (2, 'Frank', 'Enstein', '[email protected]'), (3, 'Penny', 'Wise', '[email protected]'); GO INSERT INTO dbo.DestinationTable ( Id, FirstName, LastName, Email ) VALUES (1, 'Chip', 'Munk', '[email protected]'), (2, 'Frank', 'Ensein', '[email protected]'), (3, 'Penny', 'Wise', NULL); GO
SourceTable is what I would call the source of truth in this example. Our mission is to identify differences between DestinationTable and SourceTable.
If you look closely at the data, you will see there are differences in the data for Id 2 and Id 3.
Find Data Differences from Two Tables Using LEFT JOIN
A standard method for identifying two tables' row differences is a LEFT JOIN. A LEFT JOIN will return all rows from the LEFT table and any matching ones on the right. Let's say the Id in both tables is a primary key for simplicity's sake. The code below should give us the results we're looking for.
SELECT st.Id, st.FirstName, st.LastName, st.Email FROM dbo.SourceTable st LEFT JOIN dbo.DestinationTable dt ON dt.Id = st.Id WHERE dt.FirstName <> st.FirstName OR dt.LastName <> st.LastName OR ISNULL(dt.Email, '') <> ISNULL(st.Email, ''); GO
Since the email can be NULL in both tables, we need to add a check. You generally don't want to update the destination table if there are no differences. For example, if your destination table has temporal enabled, you'll end up with a bunch of extra rows in the history table. My main issue with doing this via a LEFT JOIN is the NULL check which needs to be performed. Imagine if you have 10 or 20 columns that you need to check. You could end up with something like this, or even worse, depending on how you handle the NULL check.
SELECT st.Id, st.FirstName, st.LastName, st.Email FROM dbo.SourceTable st LEFT JOIN dbo.DestinationTable dt ON dt.Id = st.Id WHERE ISNULL(dt.Column1, '') <> ISNULL(st.Column1, '') OR ISNULL(dt.Column2, '') <> ISNULL(st.Column2, '') OR ISNULL(dt.Column3, '') <> ISNULL(st.Column3, '') OR ISNULL(dt.Column4, '') <> ISNULL(st.Column4, '') OR ISNULL(dt.Column5, '') <> ISNULL(st.Column5, '') OR ISNULL(dt.Column6, '') <> ISNULL(st.Column6, '') OR ISNULL(dt.Column7, '') <> ISNULL(st.Column7, '') OR ISNULL(dt.Column8, '') <> ISNULL(st.Column8, '') OR ISNULL(dt.Column9, '') <> ISNULL(st.Column9, '') OR ISNULL(dt.Column10, '') <> ISNULL(st.Column10, ''); GO
Since the code above is for illustration purposes, it will not execute unless you add the additional columns. Another potential issue might be if you allow a column to be NULL down the road and are not checking it.
Find Data Differences from Two Tables Using EXCEPT
I prefer using EXCEPT to perform the check. The EXCEPT operator returns rows from the left query, not in the right query. In the example below, the left query is on top. One of its main advantages is that you don't need to worry about checking for NULLs. Below is the code using EXCEPT to get our results.
SELECT Id, FirstName, LastName, Email FROM dbo.SourceTable EXCEPT SELECT Id, FirstName, LastName, Email FROM dbo.DestinationTable; GO
Like with the LEFT JOIN, imagine having to compare 10 or 20 columns. To me, the code below is much easier on the eyes.
SELECT Id, FirstName, LastName, Email, Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10 FROM dbo.SourceTable EXCEPT SELECT Id, FirstName, LastName, Email, Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10 FROM dbo.DestinationTable; GO
Drawbacks of EXCEPT
Now, if you asked me about the drawbacks to using EXCEPT, I would say there are at least two.
- The first would be performance. In my experience, a LEFT JOIN will provide better performance than EXCEPT. Looking at the execution plan and testing on a huge table should prove this.
- The other drawback will be if you don't like using EXCEPT. There is nothing wrong with preferring the LEFT JOIN. Additionally, EXCEPT requires an equal number of columns in each SELECT. However, for our purposes of checking differences, I don't think that will be different than a LEFT JOIN.
In this tip, I presented a common challenge you can run into with the differentiation of two data sets. You've likely encountered this if you have ever worked with loading data and then updating based on the differences. A method I used for a long time was a simple LEFT JOIN. This technique worked for the most part, but I was never happy with it. That was, of course, till I discovered the EXCEPT operator. In the comment section below, I would like to hear your experiences with EXCEPT and if you can name other benefits or drawbacks.
- Would you like to learn more about SQL joins in general? Please check out that tip by Jim Evans.
- For a comprehensive overview of the LEFT JOIN, please check out that tip by John Miner.
- Greg Robidoux created an informative tip covering everything related to both the EXCEPT and INTERSECT operators. I highly recommend you check it out.
About the author
View all my tips
Article Last Updated: 2022-09-06