Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Operations from Set Theory


By:   |   Read Comments (5)   |   Related Tips: More > JOIN Tables

ALERT: Did you know 66% of DBAs say their workload is increasing! - Click here to learn more.


Problem

SQL Server provides the UNION, EXCEPT, and INTERSECT set operations. How do they work and how can they be used in SQL Server?  Can you provide any examples?  Check out this tip to learn more.

Solution

SQL Server provides the UNION, EXCEPT and INTERSECT set operations which work on complete rows from two queries to generate one result set. They are drawn from the fundamental operations of set theory. In set theory, these operations always have a result where every element is distinct. So these operations will not return a row more than once. SQL Server includes the UNION ALL operation which acts like a UNION, but will allow a row to appear more than once. SQL Server does not currently support an EXCEPT ALL or INTERSECT ALL.

Since these operations compare complete rows, the results of the queries must have the same column names, in the same order, and the types of the columns must be compatible. When comparing rows, it treats a null as equivalent to a null.

To help with some examples here, I created dbo.MultFive which contains the multiples of five less than 10000 and dbo.MultThree which contains the multiples of three less than 10000.  Here is the sample code:

/*Making maxnum bigint because top expects a bigint.
Not using "GO" between the creations to preserve the value of @maxnum.
*/

declare @maxnum bigint
set @maxnum = 10000

if OBJECT_ID('dbo.AllNums') is not null
 drop table dbo.AllNums
 
select top (@maxnum)
 IDENTITY(int, 1, 1) as num
into dbo.AllNums
from 
 master.dbo.syscolumns sc1,
 master.dbo.syscolumns sc2

if OBJECT_ID('dbo.MultFive') is not null
 drop table dbo.MultFive
 
select 
 num * 5 as num
into dbo.MultFive
from dbo.AllNums
where num * 5 < @maxnum


if OBJECT_ID('dbo.MultThree') is not null
 drop table dbo.MultThree
 
select 
 num * 3 as num
into dbo.MultThree
from dbo.AllNums
where num * 3 < @maxnum

 

SQL Server UNION and UNION ALL Examples

The union of two sets is the set of all distinct entities that are in either of the two origin sets. The UNION of two queries gives each row that appears in either of the tables, but each row will only appear once. So:

SELECT num
FROM dbo.MultThree
UNION
SELECT num
FROM dbo.MultFive   

gives all of the 4666 numbers up to 10000 that are divisible by either three or five. But it only gives each number one time and in an arbitrary order. If we want to put them numeric order we can add an ORDER BY clause at the end.

UNION ALL returns all results from either set whether or not they are unique. So executing:

SELECT num
FROM dbo.MultThree
UNION ALL
SELECT num
FROM dbo.MultFive

returns numerous duplicates. However, it will do so much more quickly as a comparison between the execution plans for those two queries shows.

It will do so much more quickly as a comparison between the execution plans for those two queries shows

SQL Server EXCEPT Example

The EXCEPT operator returns the results which are in the first query but not in the second. This is tied to the concept of relative complement in set theory and may be called the set difference. So:

SELECT num  
FROM dbo.MultThree
EXCEPT
SELECT num 
FROM dbo.MultFive

returns the rows that have multiples of threes which are not also multiples of five. As a Venn Diagram it looks like:

Venn Diagram for Set Except

Of course, in SQL this could be rewritten without the EXCEPT as:

SELECT DISTINCT num
FROM dbo.MultThree t
WHERE not exists
(SELECT num
 FROM dbo.MultFive f
 WHERE t.num = f.num
  OR t.num IS NULL and f.num IS NULL)

There are a few things worth noticing about this query. The first is that it is distinct because EXCEPT will make the results distinct implicitly. The checks for NULL values are necessary because EXCEPT will treat nulls as equivalent. In this particular case, the efficiency of the version with EXCEPT and the version without is rather close.

SQL Server INTERSECT Example

The INTERSECT operator gives the results which are in both queries. This is implementing the intersection of sets from set theory. The intersection of the table of the multiples of three and the table of multiples of five is the numbers that are multiples of both and can be written as:

SELECT num  
FROM dbo.MultThree
INTERSECT
SELECT num 
FROM dbo.MultFive

It's Venn Diagram looks like:

Venn Diagram for Set Except

A query using INTERSECT could be rewritten without it by using exists instead, it would look like:

SELECT DISTINCT num
FROM dbo.MultThree t
WHERE EXISTS
(SELECT num 
FROM dbo.MultFive f
where t.num = f.num
OR t.num IS NULL AND f.num IS NULL)

In this example, the two queries have similar performance, but the version using EXISTS instead of INTERSECT is slightly faster. Of course, in this particularly case, we could get the same results while dropping the DISTINCT and dropping the requirements to handle nulls since the tables have no replicated data and have no nulls. But that would not capture the full logic of what INTERSECT does. We could also use IN instead of EXISTS here to simplify the query since there is only one column, but if there were more columns involved INTERSECT will compare all of them while IN could only compare one column or expression.

CONCLUSION

The set operations of UNION, UNION ALL, EXCEPT, and INTERSECT provide an elegant way to work with different result sets in SQL Server. Unlike most other operations, they operate by comparing all columns involved in the query at once. When comparing the columns, they will treat NULLS as being equivalent. Also, other than UNION ALL, they all implicitly make the results DISTINCT which can have implications for performance as well as for the results returned. With these considerations in mind, they can help simplify certain types of queries that may involve multiple result sets.

Next Steps


Last Update:


next webcast button


next tip button



About the author
MSSQLTips author Tim Wiseman Tim Wiseman is a SQL Server DBA and Python Developer in Las Vegas.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Thursday, January 22, 2015 - 9:47:28 AM - Bernie Black Back To Top

Nice, quick informative read.  Was unclear on a couple of those options. Well done.


Friday, July 25, 2014 - 6:36:22 PM - TimothyAWiseman Back To Top

Mr. Celko, thank you for comments. Personally, I hope that Microsoft does implement Except All and Intersect All form the standard in the near future. In the meantime, Itzik Ben Gan has an article that discusses a method to implement Intersect All.


Friday, July 18, 2014 - 10:51:45 AM - Joe Celko Back To Top

It is worth mentioning that the ANSI/ISO Standard SQL has UNION [ALL], EXCEPT [ALL] and INTERSECT [ALL] that Microsoft has not implemented. 


Thursday, November 14, 2013 - 5:22:25 PM - TimothyAWiseman Back To Top

Thanks for pointing that out, Gene.  I will ask the editors if we can get that corrected and I apologize for mistake.


Thursday, November 14, 2013 - 1:22:37 PM - Gene Wirchenko Back To Top
Your EXCEPT example has the wrong table for the first SELECT. It should be dbo.MultThree.

Learn more about SQL Server tools