SQL Server IN vs EXISTS

By:   |   Updated: 2021-04-29   |   Comments (9)   |   Related: 1 | 2 | 3 | 4 | 5 | > TSQL


Problem

Is there a difference between using the T-SQL IN operator or the EXISTS operator in a WHERE clause to filter for specific values in SQL queries and stored procedures? Is there a logical difference, a performance difference or are they exactly the same? And what about NOT IN and NOT EXISTS?

Solution

In this SQL tutorial we'll investigate if there are any differences between the EXISTS and the IN operator. This can either be logical, i.e. they behave different under certain circumstances, or performance-wise, meaning if using one operator has a performance benefit over the other. We'll be using the AdventureWorks DW 2017 sample database for our test queries for the Microsoft SQL Server DBMS.

SQL IN vs EXISTS Syntax

The IN operator is typically used to filter a column for a certain list of values. For example, review this SELECT statement:

SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN (1,2);	

This query searches for all the product subcategories which belong to the product categories Bikes and Categories (ProductCategoryKey 1 and 2).

SQL Server T-SQL IN with static list

You can also use the IN operator to search the values in the result set of a subquery with the following SQL commands:

SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE [ProductCategoryKey] IN
        (
            SELECT [ProductCategoryKey]
            FROM [dbo].[DimProductCategory]
            WHERE [EnglishProductCategoryName] = 'Bikes'
        );

This query returns all subcategories linked to the Bikes category.

SQL Server T-SQL IN operator with subquery

The benefit of using a subquery is that the query becomes less hard-coded; if the ProductCategoryKey changes for some reason, the second query will still work, while the first query might suddenly return incorrect results. It's important though the subquery returns exactly one column for the IN operator to work.

The EXISTS operator doesn't check for values, but instead checks for the existence of rows. Typically, a subquery is used in conjunction with EXISTS. It actually doesn't matter what the subquery returns, as long as rows are returned.

This query will return all rows from the ProductSubcategory table, because the inner subquery returns rows (which are not related to the outer query at all).

SELECT
     [ProductSubcategoryKey]
    ,[EnglishProductSubcategoryName]
    ,[ProductCategoryKey]
FROM [AdventureWorksDW2017].[dbo].[DimProductSubcategory]
WHERE EXISTS (
        SELECT 1/0
        FROM [dbo].[DimProductCategory]
        WHERE [EnglishProductCategoryName] = 'Bikes'
        );			

As you might have noticed, the subquery has 1/0 in the SELECT clause. In a normal query, this would return a divide by zero error, but inside an EXISTS clause it's perfectly fine, since this division is never calculated. This demonstrates that it's not important what the subquery returns, as long as rows are returned.

To use EXISTS in a more meaningful way, you can use a correlated subquery. In a correlated subquery, we pair values from the outer query with values from the inner (sub)query. This effectively checks if the value of the outer query exists in the table used in the inner query. For example, if we want to return a list of all employees who made a sale, we can write the following query:

SELECT
    [EmployeeKey]
   ,[FirstName]
   ,[LastName]
   ,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE EXISTS (
            SELECT 1
            FROM dbo.[FactResellerSales] f
            WHERE e.[EmployeeKey] = f.[EmployeeKey]
        );

In the WHERE clause inside the EXISTS subquery, we correlate the employee key of the outer table – DimEmployee – with the employee key of the inner table – FactResellerSales. If the employee key exists in both tables, a row is returned and EXISTS will return true. If an employee key is not found in FactResellerSales, EXISTS returns false and the employee is omitted from the results:

T-SQL EXISTS to find sales representatives

We can implement the same logic using the IN operator with the following SQL statement:

SELECT
    [EmployeeKey]
   ,[FirstName]
   ,[LastName]
   ,[Title]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee] e
WHERE [EmployeeKey] IN (
            SELECT [EmployeeKey]
            FROM dbo.[FactResellerSales] f
        );		

Both queries return the same result set, but maybe there is an underlying performance difference? Let's compare the execution plans.

This is the plan for EXISTS:

SQL Server execution plan for EXISTS

This is the plan for IN:

SQL Server execution plan for IN

They look exactly the same. When executing both queries at the same time, you can see they get assigned the same cost:

Compare the SQL Server execution plan for EXISTS vs IN

The top execution plan is for EXISTS, the bottom one for IN.

Let's take a look at the IO statistics (you can show these by running the statement SET STATISTICS IO ON). Again, everything is exactly the same:

Compare the SQL Server statistics IO for EXISTS vs IN

So, there's no performance difference that we can prove and both return the same result sets. When would you choose to use one or the other? Here are some guidelines:

  • If you have a small list of static values (and the values are not present in some table), the IN operator is preferred.
  • If you need to check for existence of values in another table, the EXISTS operator is preferred as it clearly demonstrates the intent of the query.
  • If you need to check against more than one single column, you can only use EXISTS since the IN operator only allows you to check for one single column.

Let’s illustrate the last point with an example. In the AdventureWorks data warehouse, we have an Employee dimension. Some employees manage a specific sales territory:

employees manage a territory

Now, it’s possible that a sales person also makes sales in other territories. For example, Michael Blythe – responsible for the Northeast region – has sold in 4 distinct regions:

territories for one manager

Let’s suppose we now only want to find the sales amounts for the sales territory managers, but only for their own region. A possible SQL query could be:

SELECT
     f.[EmployeeKey]
    ,f.[SalesTerritoryKey]
    ,SUM([SalesAmount])
FROM [dbo].[FactResellerSales] f
WHERE EXISTS
        (
            SELECT 1
            FROM [dbo].[DimEmployee] e
            WHERE   f.[EmployeeKey]         = e.[EmployeeKey]
                AND f.[SalesTerritoryKey]   = e.[SalesTerritoryKey]
                AND e.[SalesTerritoryKey]   <> 11 -- the NA region
        )
GROUP BY f.[EmployeeKey]
        ,f.[SalesTerritoryKey];

The result is as follows:

using exists when matching on multiple columns

Inside the EXISTS clause, we retrieve the sales territories managers by filtering out all employees linked to the NA region. In the outer query, we get all sales per sales territory and employee, where the employee and territory is found in the inner query. As you can see, EXISTS allows us to easily check on multiple columns, which is not possible with IN. 

SQL Server NOT IN vs NOT EXISTS

By prefixing the operators with the NOT operator, we negate the Boolean output of those operators. Using NOT IN for example will return all rows with a value that cannot be found in a list.

Using SQL Server NOT IN Clause

There is one special case though: when NULL values come into the picture. If a NULL value is present in the list, the result set is empty!

SQL Server NOT IN with NULLs

This means that NOT IN can return unexpected results if suddenly a NULL value pops up in the result set of the subquery.  NOT EXISTS doesn't have this issue, since it doesn't matter what is returned. If an empty result set is returned, NOT EXISTS will negate this, meaning the current record isn't filtered out:

SQL Server NOT EXISTS with empty result set

The query above returns all employees who haven't made a sale. Logically, NOT IN and NOT EXISTS are the same – meaning they return the same result sets – as long as NULLS aren't involved. Is there a performance difference? Again, both query plans are the same:

SQL Server query plans with NOT IN and NOT EXISTS without any NULL values

The same goes for the IO statistics:

SQL Server IO statistics with NOT IN and NOT EXISTS

There is one gotcha though. The EmployeeKey is not-nullable in FactResellerSales. As demonstrated before, NOT IN can have issues when NULLs are involved. If we change EmployeeKey to be nullable, we get the following execution plans:

SQL Server query plans with NOT IN and NOT EXISTS with NULL values

Quite a difference this time! Because SQL Server now has to take NULL values into account, the execution plan changes. The same can be seen in the IO statistics:

SQL Server IO statistics with Nullable column

Now there's an actual performance difference between NOT IN and NOT EXISTS. When to use which operator? Some guidelines:

  • The same guidelines as for IN and EXISTS can be applied. For checking against a small static list, NOT IN is preferred. Checking for existence in another table? NOT EXISTS is the better choice. Checking against multiple columns, again NOT EXISTS.
  • If one of the columns is nullable, NOT EXISTS is preferred.

Using Joins Instead of IN or EXISTS

The same logic can be implemented with joins as well. An alternative for IN and EXISTS is an INNER JOIN, while a LEFT OUTER JOIN with a WHERE clause checking for NULL values can be used as an alternative for NOT IN and NOT EXISTS. The reason they are not included in this tip – even though they might return the exact same result set and execution plan – is because the intent is different. With IN and EXISTS, you check for the existence of values in another record set. With joins you merge the result sets, which means you have access to all columns of the other table. Checking for existence is more of a "side-effect". When you use (NOT) IN and (NOT) EXISTS, it's really clear what the intent of your query is. Joins on the other hand can have multiple purposes.

Using an INNER JOIN, you can also have multiple rows returned for the same value if there are multiple matches in the second table. If you want to check for existence and if a value exist you need a column from the other table, joins are preferred.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2021-04-29

Comments For This Article




Monday, October 25, 2021 - 1:36:46 AM - Koen Verbeeck Back To Top (89355)
Hi Jeff,

a little bit about the query, I explain how the employee table looks like and that each sales employee is responsible for a specific region. Then I explain that some sales persons have made sales for different regions, not just the ones the person is responsible for. In the final query, we use EXISTS to check if a sale was made for a sales person on the region they are responsible for. This is done by the following condition:

f.[EmployeeKey] = e.[EmployeeKey] AND f.[SalesTerritoryKey] = e.[SalesTerritoryKey]

--> meaning the employeekey from the fact table must match an employeekey in the employee table, but also the sales territory key from the fact table must match with the sales territory key in the employee table (this is the sales territory the employee is responsible for)

Hope this helps.
Koen

Friday, October 22, 2021 - 10:58:14 AM - George D Back To Top (89350)
It's not just INNER JOIN that can return duplicate results. Any JOIN could do that. Great article! I use both EXISTS and IN in my queries. Fortunately for me, I tend to use them the exact way you suggested. Thank you for the detailed explanation and comparisons.

Friday, October 22, 2021 - 10:34:20 AM - Jeff Back To Top (89349)
Very good point showing difference between IN vs. EXISTS/JOIN. However the example and the query (esp. group by and amount one) is not easy to understand, appreciate if showing the details of all involving tables, or choose an easier example or query to demonstrate.

Friday, March 5, 2021 - 4:21:52 AM - Binh Thanh Nguyen Back To Top (88345)
Thanks, nice post

Tuesday, November 24, 2020 - 3:15:58 AM - Koen Verbeeck Back To Top (87828)
Thanks Graham, this made my day! It's the reason I write all those articles.

Have a nice day,
Koen

Friday, November 20, 2020 - 6:02:58 PM - Graham Back To Top (87818)
Thanks! This is one of the clearest explanations I've read on this topic.

Friday, January 24, 2020 - 2:58:27 PM - Steve Back To Top (83981)

Thank you for the detailed explanation! Very informative.


Tuesday, November 12, 2019 - 5:01:15 AM - Naresh chikoti Back To Top (83065)

Great Explanation worth reading


Monday, May 13, 2019 - 10:17:15 AM - Jaime Back To Top (80058)

Nice explanation.















get free sql tips
agree to terms