Find Mismatched Data between SQL Server Tables using LEFT JOIN, EXCEPT, NOT IN and NOT EXISTS

By:   |   Updated: 2022-08-15   |   Comments   |   Related: More > TSQL


Problem

The missing number challenge has likely come up if you have ever been on either end of an interview for a SQL development or data analyst position. I remember early on in my career having the problem presented in the context of a SalesOrder and SalesPerson table. The challenge was to find all salespeople who had not created an order. Interviews are stressful situations, especially if the interviewer asks challenging questions, so before going for the interview, you need to understand this problem and have a solution.

Solution

This tip will investigate multiple ways to find missing numbers between tables. You may prefer one solution over the other, and nothing is wrong with that. The performance gains between the methods will likely be minor, depending on the table size, but it is always good to look at performance metrics in addition to solving the problem.

Creating the Dataset

For this demo, we will create two tables:

  • The first one is AllNumbers, this table will contain one column which goes from 1 to 100,000.
  • The second table is MissingNumbers, this will have just even numbers ranging from 2 to 100,000.

Here is the script to create the two tables.

DROP TABLE IF EXISTS dbo.AllNumbers;
DROP TABLE IF EXISTS dbo.MissingNumbers;

CREATE TABLE dbo.AllNumbers
(
    Number INT NOT NULL,
    CONSTRAINT PK_AllNumbers
        PRIMARY KEY CLUSTERED (Number)
        WITH FILLFACTOR = 100
);
CREATE TABLE dbo.MissingNumbers
(
    Number INT NOT NULL,
    CONSTRAINT PK_MissingNumbers
        PRIMARY KEY CLUSTERED (Number)
        WITH FILLFACTOR = 100
);
GO

The script below will insert the data into the AllNumbers table.

DECLARE @UpperBound INT = 100000;
;WITH cteN (Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_columns AS s1
        CROSS JOIN sys.all_columns AS s2)
INSERT INTO dbo.AllNumbers (Number)
SELECT [Number]
FROM cteN
WHERE [Number] <= @UpperBound;

SELECT *
FROM dbo.AllNumbers
Query Results

For the MissingNumbers table we will insert just even numbers. We can use the AllNumbers table and filter by numbers with a remainder of zero using the modulo operator. In short, the modulo operator returns the remainder of one number divided by another.

INSERT INTO dbo.MissingNumbers (Number)
SELECT [Number]
FROM dbo.AllNumbers
WHERE Number % 2 = 0;

SELECT * 
FROM dbo.MissingNumbers

The table should look like the screenshot below.

Query results, even numbers

Compare AllNumbers and MissingNumbers Tables

We can do a couple of queries to compare the data from these two tables.

The first query joins the two tables.

SELECT an.Number AS [All Numbers Table],
       mn.Number AS [Missing Numbers Table]
FROM dbo.AllNumbers an
    JOIN dbo.MissingNumbers mn
        ON an.Number = mn.Number;

We can see that this only returns even numbers from both tables.

Query results, even numbers

We know the AllNumbers table has more records, so let's do a LEFT OUTER JOIN so we can see all data from AllNumbers and any value or a NULL value from MissingNumbers.

SELECT an.Number AS [All Numbers Table],
       mn.Number AS [Missing Numbers Table]
FROM dbo.AllNumbers an
    LEFT OUTER JOIN dbo.MissingNumbers mn
        ON an.Number = mn.Number;

Here is the result. You can see that a LEFT JOIN is trying to match up rows from AllNumbers to the ones from MissingNumbers. Since there are not matches for all rows in MissingNumbers, SQL returns NULL placeholders.

Left Outer Join

Expected Results

So based on the results above, what we want to see for the result set is the following. This will only include odd numbers since these values are not included in the MissingNumbers table.

Query Results, missing numbers

Let's take a look at a few different ways this can be done.

1 - Find Missing Numbers Using LEFT OUTER JOIN

A little tweak to the LEFT JOIN query should do the trick if we want to return all the missing rows from the MissingNumbers table along with including just rows where the Number column from MissingNumers is NULL. The syntax for this is below, notice the use of the WHERE clause.

SELECT an.Number AS [Missing Numbers]
FROM dbo.AllNumbers an
    LEFT OUTER JOIN dbo.MissingNumbers mn
        ON an.Number = mn.Number
WHERE mn.Number IS NULL;

I estimate that most people with this problem would use a LEFT JOIN as the initial solution. If you asked me to provide one clear answer, this would be it.

2 - Find Missing Numbers Using EXCEPT

Another solution to find missing numbers is to use the EXCEPT operator. I often use EXCEPT when I have several columns to compare. An example is performing a delta sync between two tables. EXCEPT returns all the rows on the left query, not the right one. Given the example below, the left table is on top. The syntax for using EXCEPT is below.

SELECT Number
FROM dbo.AllNumbers
EXCEPT
SELECT Number
FROM dbo.MissingNumbers;

We get the same results back when using the LEFT JOIN. One of the downsides of EXCEPT is that you must list all the columns, and the same number must exist between both tables.

3 - Find Missing Numbers Using NOT EXISTS

The NOT EXISTS operator returns true or false based on if the row exists in the subquery. Below is the syntax for finding the missing numbers using NOT EXISTS.

SELECT an.Number
FROM dbo.AllNumbers an
WHERE NOT EXISTS
(
    SELECT Number FROM dbo.MissingNumbers mn WHERE mn.Number = an.Number
);

In the syntax above, we tell SQL to return all the rows from AllNumbers that don’t exist in MissingNumbers.

4 - Find Missing Numbers Using NOT IN

Finally, we can use a NOT IN operator. With the syntax below this will return all rows from the AllNumbers table that are not in the MissingNumbers table.

SELECT an.Number
FROM dbo.AllNumbers an
WHERE an.Number NOT IN
      (
          SELECT Number FROM dbo.MissingNumbers
      );

Eric Blinn has a great tip and points out that NOT IN operators are not conducive to index seek operations and commonly result in slower scan operations. So that is something you want to keep in mind. This method would likely be the last one I would choose. Nothing against NOT IN, but I prefer the other methods.

Conclusion

Four techniques were explored for solving the missing numbers challenge in this tip. In the comments below, please indicate which method you prefer and why. There will be performance differences between using a LEFT JOIN and NOT EXISTS, however, for the most part, they should be small. I recommend going with whichever method you prefer and fits your situation. Good luck with any upcoming interviews; at least if this question comes up, you will nail it!

Next Steps



Get Started Now - Click here to get your free 14 day trial of SolarWinds Database Insights




get scripts

next tip button



About the author
MSSQLTips author Jared Westover Jared Westover (@westoverjared) is a SQL Server specialist with nearly two decades of industry experience.

View all my tips


Article Last Updated: 2022-08-15

Comments For This Article

















get free sql tips
agree to terms