SQL Not Exists: A Better Way to Find Missing Data

Problem

After years of writing T-SQL a certain way, changing can be tough. When comparing tables for missing rows, developers often use LEFT JOIN. We recently restored a database backup after data was accidentally deleted, and now need to find missing rows. A developer suggested that the LEFT JOIN isn’t the best method. Is there a more efficient way?

Solution

In this article, we’ll explore and compare two common ways to find missing data between tables: LEFT OUTER JOIN and NOT EXISTS. I’ll compare the results from both methods using the actual execution plans, and we’ll examine how each one affects performance when using a LEFT OUTER JOIN versus NOT EXISTS. By the end, you’ll know which method works best for uncovering missing data.

Oh No! The Data Is Missing

At some point, it happens to everyone, every now and then, data comes up missing. Maybe it was accidentally deleted (someone forgot the WHERE clause), or there was a bug in the application. The reason isn’t really important here, unless you’re the one missing the data. What matters most is identifying which rows are missing between two tables.

Checking for Missing Data

Two of the most common methods to check for missing data are LEFT OUTER JOIN and NOT EXISTS. I’m sure you could also use EXCEPT, NOT IN, or maybe even an OUTER APPLY, but we’ll skip those for now.

LEFT OUTER JOIN

Let’s start with a LEFT JOIN (notice I’m omitting the OUTER part since it’s implied). The LEFT JOIN is a favorite among developers, even those with strong T-SQL skills, for finding missing data.

But when you think about it, what does a LEFT JOIN do? I’m not going to show a picture of the Venn diagram or one of those funny versions you see on LinkedIn, but it returns all the rows from the left table and matching rows from the right table. If values are missing in the right table, SQL uses NULL placeholders. The matching records are based on the predicate used in the ON clause, maybe something like this:

/* MSSQLTips.com */
SELECT b.Id
FROM dbo.TableB AS b
    LEFT OUTER JOIN dbo.TableA AS a
        ON a.Id = b.Id
WHERE a.Id IS NULL;

When you specify the WHERE a.Id IS NULL, it tells SQL, “Hey, only return rows that don’t have matching values in the right table.” Another way to think about it is that a LEFT JOIN brings tables together to form a dataset.

So, why would you use a LEFT JOIN to identify missing data? That’s a good question, and I think it comes down to a couple of factors.

Why We Keep Using LEFT JOIN

The primary reason in my mind is habit. We all develop little habits or patterns over the years when writing code. For example, I often use CTEs because I started using them early in my SQL career. I don’t assign any special virtue to them; some people dislike them for various reasons. Someone who has always used a LEFT JOIN will likely keep using it unless they are given a good reason not to or decide to try a different approach.

The second reason is that if you need to return data from the other table, a LEFT JOIN makes sense because it creates a result set. But if you’re trying to find data that doesn’t exist, it won’t appear in that result.

If you don’t need to return data from the right table, is there a better way?

NOT EXISTS Explained

Unlike the join above, NOT EXISTS (or even its EXISTS counterpart) does not join or combine data. It’s a predicate that checks whether something exists in a table. Once it finds the row, it returns true and stops further checks. Sometimes you will hear EXISTS and NOT EXISTS referred to as semi and anti semi joins.

One interesting fact about NOT EXISTS is that it does not follow SQL’s typical three-valued logic of true, false, and unknown. It only returns true or false. I first remember reading this in Itzik-Ben Gan’s classic book T-SQL Querying.

Let’s take a look at the basic syntax.

/* MSSQLTips.com */
SELECT b.Id
FROM dbo.TableB AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.TableA AS a
    WHERE a.Id = b.Id
);

In the example above, I’m asking SQL to return all the rows in TableB that don’t exist in TableA. One important thing to remember is that the subquery must include a correlation for it to work correctly. If it doesn’t, it will likely return either zero or all rows, depending on the predicate (EXISTS or NOT EXISTS) you use.

Also, it doesn’t matter what you use for the SELECT clause of the subquery. The query will still run but not return the correlated results without the proper WHERE clause in the subquery as shown above.

/* MSSQLTips.com */
SELECT b.Id
FROM dbo.TableB AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.TableA AS a
);

Now, let’s build a dataset to compare their performance side by side.

Build Our Demo Database

One area I want to compare is the performance differences between the two methods on a decent-sized dataset. For our demo, I’ll create a database with two tables: TableA and TableB. The story is that TableA is missing some records. In TableB, we will insert 1 million rows, and in TableA, we’ll insert 999,500 (you can do the math to figure out how many are missing).

/* MSSQLTips.com */
USE [master]
 
IF DB_ID('DemoMissingRows') IS NOT NULL
BEGIN
    ALTER DATABASE DemoMissingRows SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DemoMissingRows;
END;
GO
 
CREATE DATABASE DemoMissingRows;
GO
 
ALTER DATABASE DemoMissingRows SET RECOVERY SIMPLE;
GO
USE DemoMissingRows;
GO
 
CREATE TABLE dbo.TableA
(
    Id INT NOT NULL,
    SpecialValue NVARCHAR(200) NOT NULL,
    CreatedDateTime DATETIME2(0) NOT NULL
        DEFAULT (SYSUTCDATETIME()),
    CONSTRAINT PK_TableA_Id
        PRIMARY KEY CLUSTERED (Id)
);
 
CREATE TABLE dbo.TableB
(
    Id INT NOT NULL,
    SpecialValue NVARCHAR(200) NOT NULL,
    CreatedDateTime DATETIME2(0) NOT NULL
        DEFAULT (SYSUTCDATETIME()),
    CONSTRAINT PK_TableB_Id
        PRIMARY KEY CLUSTERED (Id)
);
GO
 
INSERT INTO dbo.TableB
(
    Id,
    SpecialValue
)
SELECT value,
       LEFT(CONCAT('Special Row-', value, ' in B '), 200)
FROM GENERATE_SERIES(1, 1000000);
GO
 
INSERT INTO dbo.TableA
(
    Id,
    SpecialValue
)
SELECT value,
       LEFT(CONCAT('Special Row-', value, ' in A '), 200)
FROM GENERATE_SERIES(1, 1000000)
WHERE value % 2000 <> 0;
GO
 
SELECT COUNT(*) AS Row_Count, 'TableA' AS TableName
FROM dbo.TableA
UNION ALL
SELECT COUNT(*) AS Row_Count, 'TableB' AS TableName
FROM dbo.TableB;
GO

Did you notice I used the GENERATE_SERIES function to insert the rows? Usually, I rely on a tally table or something similar, but I wanted to try something different. Keep in mind, you’ll need SQL Server 2022 or an Azure version for GENERATE_SERIES to work. The results of the last SELECT statement confirm that the demo creation worked as expected.

Row_Count   TableName
----------- ---------
1000000     TableB
999500      TableA

Get Ready, Set, Execute

Now that we have our demo dataset, let’s run some tests and compare the performance of the two methods. First, we’ll use the LEFT JOIN. For the performance test, I want to ensure my actual execution plan is enabled, and I’ll use it to check the execution time. Below is the first query we’ll execute.

/* MSSQLTips.com */
SELECT b.Id
FROM dbo.TableB AS b
    LEFT OUTER JOIN dbo.TableA AS a
        ON a.Id = b.Id
WHERE a.Id IS NULL;

Alright, here is the execution plan: the average elapsed time is around 269ms, and the average CPU time is 1455ms. I executed this a few times to ensure the data was in memory. Keep in mind your time may differ. Also, notice the filter operator I highlighted in red.

Next, let’s run the code below, which uses NOT EXISTS and also captures the execution plan, which we’ll compare in the next section.

/* MSSQLTips.com */
SELECT b.Id
FROM dbo.TableB AS b
WHERE NOT EXISTS
(
    SELECT 1 FROM dbo.TableA AS a WHERE a.Id = b.Id
);

The screenshot above shows the execution plan, with an average elapsed time of around 227ms and an average CPU time of 1132ms. As in the prior test, I executed this a few times to ensure the data was in memory.

Evaluating Performance Results

If you compare the execution plans side by side, you’ll see that the main difference is that the NOT EXISTS doesn’t use a filter operator after joining datasets with the hash match. In fairness, that’s what we expect the left join to do: combine the data. When we analyze CPU time differences, NOT EXISTS is approximately 25% faster, and the total elapsed time is nearly 17% quicker. At the start of this article, I asked if there was a better way to find missing rows than using a LEFT JOIN. By better, I mean more efficient. So, based on these results, if you want the best performance and, in my opinion, a more direct approach, choose NOT EXISTS.

SQL Server 2025: Optimizer Behavior

I heard an interesting fact about SQL Server 2025 from Erik Darling: when you use a LEFT JOIN in this way, the optimizer automatically converts it to an anti semi join. To me, this is Microsoft saying, ‘What you are doing isn’t the best way, but let us fix it for you.’ The screenshot below is from the latest preview of SQL Server 2025.

Clean Up

After you finish the demo database, run the statement below to drop the database.

/* MSSQLTips.com */
USE [master];
GO
 
IF DB_ID('DemoMissingRows') IS NOT NULL
BEGIN
    ALTER DATABASE DemoMissingRows SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DemoMissingRows;
END;
GO

Key Points

If you are only checking for the existence of a row, I would go with EXISTS or NOT EXISTS. A LEFT OUTER JOIN still gets the job done, but EXISTS generally does it in a more efficient (faster) and readable way. Of course, readability is up to you.
Even Microsoft decided to use the anti semi join operator in SQL Server 2025, which is good if you just can’t give up the LEFT JOIN out of habit.
When using NOT EXISTS, don’t forget to add the correlation in the subquery to the outer query. Without it, you will not get back the expected results.

Next Steps

Do you need to compare tables to find differences? I wrote the article, “Options to Compare Data from Two SQL Server Tables,” to help you.
Are you curious to learn about the exciting new features in SQL Server 2025? Koen Verbeeck wrote the article, “Install AdventureWorks Database for SQL Server 2025,” which goes into detail.
Aaron Bertrand wrote about one of his favorite SQL Server 2022 enhancements in the article “SQL Server GENERATE_SERIES Function”. If you’re tired of using a tally table to generate a number sequence, give it a try.

Jared Westover

Jared Westover is a SQL Server specialist with two decades of industry experience covering T-SQL development, performance tuning, administration and Microsoft Fabric. He is currently a software architect at Crowe, an author at Pluralsight and primary contributor at sqlhabits.com. On MSSQLTips.com, Jared is a respected award-winning author for his clever T-SQL solutions and bringing to light new real-world solutions to age-old development problems.

MSSQLTips Awards
- Achiever Award (75+ tips) – 2026
- Author of the Year – 2023
- Author Contender – 2024/2025

The main reason, why EXISTS / NON EXISTS is better (when you don’t need infos from the second table) is, that the EXISTS stops after the first match, while a LEFT JOIN continues to read all rows in the second table.

Imagine, you want to find all customers, that have not made any orders.
SELECT c.* FROM dbo.customers AS c LEFT JOIN dbo.orders AS o ON o.customer_id = o.order_id WHERE o.customer_id IS NULL
vs.
SELECT c.* FROM dbo.customers AS c WHERE NOT EXISTS (SELECT * FROM dbo.orders AS o WHERE o.customer_id = o.order_id)

You have 1000 customers and 1 mio orders. 100 customers are inactive.

The first query with LEFT JOIN reads both tables full and will produce 1 mio rows after the JOIN before it filters the 1 mio down to the 100 unique customers.
The NON EXISTS will (usually) do 1000 Nested Lookups (I assume, that the orders table has an index on the customer_id), stops searching / reading, after it finds an order for a customer and keeps only the 100 rows where no order was found.
—–

@Jared Westover (author): The example above the topic “Build Our Demo Database” misses the WHERE condition in the subquery.

4 Comments

Don Perkins
January 20, 2026 / 6:23 pm Reply
I think it’s important to note that we data folks started using LEFT JOIN for finding missing values because it was a _much_ better (if less-intuitive) approach to using “NOT IN (…)”! “NOT IN (…)” returns all the data from both datasets before doing any comparisons, where the “LEFT JOIN and look for NULL” approach avoids returning the missing common rows entirely.
“NOT EXITS (…)” is a relatively elegant way of stating the intent of “NOT IN (…)” without invoking the complexity of that phrase.
And, yes… it’s going to be hard to let go of “LEFT JOIN and look for NULL” after doing it for over 35+ years.
- Dan J
  June 17, 2026 / 12:47 pm Reply
  I’m with you. It’s muscle memory at this point.
  In most modern T-SQL environments, an NOT IN statement producing the same Left Anti Semi Join as a NOT EXISTS as long as the field in the subquery IS NOT NULL.
  I find NOT IN a bit more intuitive then NOT EXISTS. It’s likely because I can test the subquery which gives me the warm and fuzzies.
  — not exists
  select imf.id
  from afn.import_file imf
  where not exists
  (
  select 1
  from afn.import_file_json imfj
  where imfj.import_file_id = imf.id
  )
  — not in gives same results
  select imf.id
  from afn.import_file imf
  where imf.id not in
  (
  select import_file_id
  from afn.import_file_json
  where import_file_id is not null
  )
Thomas Franz
January 19, 2026 / 1:46 pm Reply
The main reason, why EXISTS / NON EXISTS is better (when you don’t need infos from the second table) is, that the EXISTS stops after the first match, while a LEFT JOIN continues to read all rows in the second table.
Imagine, you want to find all customers, that have not made any orders.
SELECT c.* FROM dbo.customers AS c LEFT JOIN dbo.orders AS o ON o.customer_id = o.order_id WHERE o.customer_id IS NULL
vs.
SELECT c.* FROM dbo.customers AS c WHERE NOT EXISTS (SELECT * FROM dbo.orders AS o WHERE o.customer_id = o.order_id)
You have 1000 customers and 1 mio orders. 100 customers are inactive.
The first query with LEFT JOIN reads both tables full and will produce 1 mio rows after the JOIN before it filters the 1 mio down to the 100 unique customers.
The NON EXISTS will (usually) do 1000 Nested Lookups (I assume, that the orders table has an index on the customer_id), stops searching / reading, after it finds an order for a customer and keeps only the 100 rows where no order was found.
—–
@Jared Westover (author): The example above the topic “Build Our Demo Database” misses the WHERE condition in the subquery.
- Jared Westover
  January 23, 2026 / 7:25 am Reply
  @Thomas
  First off, thank you for taking the time to read the article and leave a thoughtful comment. The query I used in the article, copied below, was intended to illustrate a case where NOT EXISTS is not correlated. After reading your comment and rereading the section, I could have done a better job of highlighting the distinction.
  /* MSSQLTips.com */
  SELECT b.Id
  FROM dbo.TableB AS b
  WHERE NOT EXISTS
  (
  SELECT 1
  FROM dbo.TableA AS a
  );

Find Missing SQL Server Data with NOT EXISTS