SQL Server Operations from Set Theory
By: Tim Wiseman | Updated: 2013-11-14 | Comments (5) | Related: More > JOIN Tables
SQL Server provides the UNION, EXCEPT, and INTERSECT set operations. How do they work and how can they be used in SQL Server? Can you provide any examples? Check out this tip to learn more.
SQL Server provides the UNION, EXCEPT and INTERSECT set operations which work on complete rows from two queries to generate one result set. They are drawn from the fundamental operations of set theory. In set theory, these operations always have a result where every element is distinct. So these operations will not return a row more than once. SQL Server includes the UNION ALL operation which acts like a UNION, but will allow a row to appear more than once. SQL Server does not currently support an EXCEPT ALL or INTERSECT ALL.
Since these operations compare complete rows, the results of the queries must have the same column names, in the same order, and the types of the columns must be compatible. When comparing rows, it treats a null as equivalent to a null.
To help with some examples here, I created dbo.MultFive which contains the multiples of five less than 10000 and dbo.MultThree which contains the multiples of three less than 10000. Here is the sample code:
/*Making maxnum bigint because top expects a bigint. Not using "GO" between the creations to preserve the value of @maxnum. */ declare @maxnum bigint set @maxnum = 10000 if OBJECT_ID('dbo.AllNums') is not null drop table dbo.AllNums select top (@maxnum) IDENTITY(int, 1, 1) as num into dbo.AllNums from master.dbo.syscolumns sc1, master.dbo.syscolumns sc2 if OBJECT_ID('dbo.MultFive') is not null drop table dbo.MultFive select num * 5 as num into dbo.MultFive from dbo.AllNums where num * 5 < @maxnum if OBJECT_ID('dbo.MultThree') is not null drop table dbo.MultThree select num * 3 as num into dbo.MultThree from dbo.AllNums where num * 3 < @maxnum
SQL Server UNION and UNION ALL Examples
The union of two sets is the set of all distinct entities that are in either of the two origin sets. The UNION of two queries gives each row that appears in either of the tables, but each row will only appear once. So:
SELECT num FROM dbo.MultThree UNION SELECT num FROM dbo.MultFive
gives all of the 4666 numbers up to 10000 that are divisible by either three or five. But it only gives each number one time and in an arbitrary order. If we want to put them numeric order we can add an ORDER BY clause at the end.
UNION ALL returns all results from either set whether or not they are unique. So executing:
SELECT num FROM dbo.MultThree UNION ALL SELECT num FROM dbo.MultFive
returns numerous duplicates. However, it will do so much more quickly as a comparison between the execution plans for those two queries shows.
SQL Server EXCEPT Example
The EXCEPT operator returns the results which are in the first query but not in the second. This is tied to the concept of relative complement in set theory and may be called the set difference. So:
SELECT num FROM dbo.MultThree EXCEPT SELECT num FROM dbo.MultFive
returns the rows that have multiples of threes which are not also multiples of five. As a Venn Diagram it looks like:
Of course, in SQL this could be rewritten without the EXCEPT as:
SELECT DISTINCT num FROM dbo.MultThree t WHERE not exists (SELECT num FROM dbo.MultFive f WHERE t.num = f.num OR t.num IS NULL and f.num IS NULL)
There are a few things worth noticing about this query. The first is that it is distinct because EXCEPT will make the results distinct implicitly. The checks for NULL values are necessary because EXCEPT will treat nulls as equivalent. In this particular case, the efficiency of the version with EXCEPT and the version without is rather close.
SQL Server INTERSECT Example
The INTERSECT operator gives the results which are in both queries. This is implementing the intersection of sets from set theory. The intersection of the table of the multiples of three and the table of multiples of five is the numbers that are multiples of both and can be written as:
SELECT num FROM dbo.MultThree INTERSECT SELECT num FROM dbo.MultFive
It's Venn Diagram looks like:
A query using INTERSECT could be rewritten without it by using exists instead, it would look like:
SELECT DISTINCT num FROM dbo.MultThree t WHERE EXISTS (SELECT num FROM dbo.MultFive f where t.num = f.num OR t.num IS NULL AND f.num IS NULL)
In this example, the two queries have similar performance, but the version using EXISTS instead of INTERSECT is slightly faster. Of course, in this particularly case, we could get the same results while dropping the DISTINCT and dropping the requirements to handle nulls since the tables have no replicated data and have no nulls. But that would not capture the full logic of what INTERSECT does. We could also use IN instead of EXISTS here to simplify the query since there is only one column, but if there were more columns involved INTERSECT will compare all of them while IN could only compare one column or expression.
The set operations of UNION, UNION ALL, EXCEPT, and INTERSECT provide an elegant way to work with different result sets in SQL Server. Unlike most other operations, they operate by comparing all columns involved in the query at once. When comparing the columns, they will treat NULLS as being equivalent. Also, other than UNION ALL, they all implicitly make the results DISTINCT which can have implications for performance as well as for the results returned. With these considerations in mind, they can help simplify certain types of queries that may involve multiple result sets.
- For an overview of the UNION and UNION All operators, see Greg Robidoux's Joining data using UNION and UNION ALL in SQL Server
- For an overview of INTERSECT and EXCEPT operators, see Greg's Compare SQL Server Datasets with INTERSECT and EXCEPT
- For some excellent examples of the use of EXCEPT and INTERSECT, see Robert Sheldon's "The Except and Intersect Operators in SQL Server"
- Itzik Ben-Gan provides some additional tips on use set operations in his " Set Operations" article. This includes techniques for implementing INTERSECT ALL, though this doesn't have the same performance benefits as UNION ALL.
Last Updated: 2013-11-14
About the author
View all my tips