Find Mismatched Data between SQL Server Tables using LEFT JOIN, EXCEPT, NOT IN and NOT EXISTS
The missing number challenge has likely come up if you have ever been on either end of an interview for a SQL development or data analyst position. I remember early on in my career having the problem presented in the context of a SalesOrder and SalesPerson table. The challenge was to find all salespeople who had not created an order. Interviews are stressful situations, especially if the interviewer asks challenging questions, so before going for the interview, you need to understand this problem and have a solution.
This tip will investigate multiple ways to find missing numbers between tables. You may prefer one solution over the other, and nothing is wrong with that. The performance gains between the methods will likely be minor, depending on the table size, but it is always good to look at performance metrics in addition to solving the problem.
Creating the Dataset
For this demo, we will create two tables:
- The first one is AllNumbers, this table will contain one column which goes from 1 to 100,000.
- The second table is MissingNumbers, this will have just even numbers ranging from 2 to 100,000.
Here is the script to create the two tables.
DROP TABLE IF EXISTS dbo.AllNumbers; DROP TABLE IF EXISTS dbo.MissingNumbers; CREATE TABLE dbo.AllNumbers ( Number INT NOT NULL, CONSTRAINT PK_AllNumbers PRIMARY KEY CLUSTERED (Number) WITH FILLFACTOR = 100 ); CREATE TABLE dbo.MissingNumbers ( Number INT NOT NULL, CONSTRAINT PK_MissingNumbers PRIMARY KEY CLUSTERED (Number) WITH FILLFACTOR = 100 ); GO
The script below will insert the data into the AllNumbers table.
DECLARE @UpperBound INT = 100000; ;WITH cteN (Number) AS (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id]) FROM sys.all_columns AS s1 CROSS JOIN sys.all_columns AS s2) INSERT INTO dbo.AllNumbers (Number) SELECT [Number] FROM cteN WHERE [Number] <= @UpperBound; SELECT * FROM dbo.AllNumbers
For the MissingNumbers table we will insert just even numbers. We can use the AllNumbers table and filter by numbers with a remainder of zero using the modulo operator. In short, the modulo operator returns the remainder of one number divided by another.
INSERT INTO dbo.MissingNumbers (Number) SELECT [Number] FROM dbo.AllNumbers WHERE Number % 2 = 0; SELECT * FROM dbo.MissingNumbers
The table should look like the screenshot below.
Compare AllNumbers and MissingNumbers Tables
We can do a couple of queries to compare the data from these two tables.
The first query joins the two tables.
SELECT an.Number AS [All Numbers Table], mn.Number AS [Missing Numbers Table] FROM dbo.AllNumbers an JOIN dbo.MissingNumbers mn ON an.Number = mn.Number;
We can see that this only returns even numbers from both tables.
We know the AllNumbers table has more records, so let's do a LEFT OUTER JOIN so we can see all data from AllNumbers and any value or a NULL value from MissingNumbers.
SELECT an.Number AS [All Numbers Table], mn.Number AS [Missing Numbers Table] FROM dbo.AllNumbers an LEFT OUTER JOIN dbo.MissingNumbers mn ON an.Number = mn.Number;
Here is the result. You can see that a LEFT JOIN is trying to match up rows from AllNumbers to the ones from MissingNumbers. Since there are not matches for all rows in MissingNumbers, SQL returns NULL placeholders.
So based on the results above, what we want to see for the result set is the following. This will only include odd numbers since these values are not included in the MissingNumbers table.
Let's take a look at a few different ways this can be done.
1 - Find Missing Numbers Using LEFT OUTER JOIN
A little tweak to the LEFT JOIN query should do the trick if we want to return all the missing rows from the MissingNumbers table along with including just rows where the Number column from MissingNumers is NULL. The syntax for this is below, notice the use of the WHERE clause.
SELECT an.Number AS [Missing Numbers] FROM dbo.AllNumbers an LEFT OUTER JOIN dbo.MissingNumbers mn ON an.Number = mn.Number WHERE mn.Number IS NULL;
I estimate that most people with this problem would use a LEFT JOIN as the initial solution. If you asked me to provide one clear answer, this would be it.
2 - Find Missing Numbers Using EXCEPT
Another solution to find missing numbers is to use the EXCEPT operator. I often use EXCEPT when I have several columns to compare. An example is performing a delta sync between two tables. EXCEPT returns all the rows on the left query, not the right one. Given the example below, the left table is on top. The syntax for using EXCEPT is below.
SELECT Number FROM dbo.AllNumbers EXCEPT SELECT Number FROM dbo.MissingNumbers;
We get the same results back when using the LEFT JOIN. One of the downsides of EXCEPT is that you must list all the columns, and the same number must exist between both tables.
3 - Find Missing Numbers Using NOT EXISTS
The NOT EXISTS operator returns true or false based on if the row exists in the subquery. Below is the syntax for finding the missing numbers using NOT EXISTS.
SELECT an.Number FROM dbo.AllNumbers an WHERE NOT EXISTS ( SELECT Number FROM dbo.MissingNumbers mn WHERE mn.Number = an.Number );
In the syntax above, we tell SQL to return all the rows from AllNumbers that don’t exist in MissingNumbers.
4 - Find Missing Numbers Using NOT IN
Finally, we can use a NOT IN operator. With the syntax below this will return all rows from the AllNumbers table that are not in the MissingNumbers table.
SELECT an.Number FROM dbo.AllNumbers an WHERE an.Number NOT IN ( SELECT Number FROM dbo.MissingNumbers );
Eric Blinn has a great tip and points out that NOT IN operators are not conducive to index seek operations and commonly result in slower scan operations. So that is something you want to keep in mind. This method would likely be the last one I would choose. Nothing against NOT IN, but I prefer the other methods.
Four techniques were explored for solving the missing numbers challenge in this tip. In the comments below, please indicate which method you prefer and why. There will be performance differences between using a LEFT JOIN and NOT EXISTS, however, for the most part, they should be small. I recommend going with whichever method you prefer and fits your situation. Good luck with any upcoming interviews; at least if this question comes up, you will nail it!
- Check out this excellent overview of the various joins in SQL Server.
- Developers often overlook the usefulness of the set operators EXCEPT and INTERSECT.
- Review this tip for an overview of using the IN operator.
- See this excellent overview of the performance difference between using the various operators (EXISTS, IN, and JOIN).
About the author
View all my tips
Article Last Updated: 2022-08-15