UNION vs. UNION ALL in SQL Server

By:   |   Comments (9)   |   Related: > JOIN Tables


Problem

Sometimes there is a need to combine data from multiple tables or views into one comprehensive dataset. This may be for tables with similar data within the same database or maybe there is a need to combine similar data across databases or even across servers.

In this tip we will take a look at the how to use UNION vs. UNION ALL commands and how they differ.

Solution

In SQL Server you have the ability to combine multiple datasets into one comprehensive dataset by using the UNION or UNION ALL operators. There is a big difference in how these work as well as the final result set that is returned, but basically these commands join multiple datasets that have similar structures into one combined dataset.

UNION

This operation will allow you to join multiple datasets into one dataset and will remove any duplicates that exist. Basically it is performing a DISTINCT operation across all columns in the result set. 

UNION ALL

This operation again allows you to join multiple datasets into one dataset, but it does not remove any duplicate rows. Because this does not remove duplicate rows this process is faster, but if you don't want duplicate records you will need to use the UNION operator instead.

Rules to UNION data

  • Each query must have the same number of columns
  • Each column must have compatible data types
  • Column names for the final result set are taken from the first query
  • ORDER BY and COMPUTE clauses can only be issued for the overall result set and not within each individual result set
  • GROUP BY and HAVING clauses can only be issued for each individual result set and not for the overall result set

Tip: If you don't have the exact same columns in all queries use a default value or a NULL value such as:

SELECT firstName, lastName, company FROM businessContacts
UNION
SELECT firstName, lastName, NULL FROM nonBusinessContacts
or
SELECT firstName, lastName, createDate FROM businessContacts
UNION ALL
SELECT firstName, lastName, getdate() FROM nonBusinessContacts

UNION vs. UNION ALL Examples

Let's take a look at a few simple examples of how these commands work and how they differ. As you will see the final resultsets will differ, but there is some interesting info on how SQL Server actually completes the process.

UNION ALL

In this first example we are using the UNION ALL operator against the Employee table from the AdventureWorks database. This is probably not something you would do, but this helps illustrate the differences of these two operators.

There are 290 rows in table dbo.Employee.

SELECT * FROM HumanResources.Employee
UNION ALL
SELECT * FROM HumanResources.Employee
UNION ALL
SELECT * FROM HumanResources.Employee

When this query is run the result set has 870 rows. This is the 290 rows returned 3 times. The data is just put together one dataset on top of the other dataset.

Here is the execution plan for this query. We can see that the table was queried 3 times and SQL Server did a Concatenation step to concatenate all of the data.

sql server union vs. union all operator query plan

UNION

In this next example we are using the UNION operator against the Employee table again from the AdventureWorks database.

SELECT * FROM HumanResources.Employee
UNION
SELECT * FROM HumanResources.Employee
UNION
SELECT * FROM HumanResources.Employee

When this query is run the result set has 290 rows. Even though we combined the data three times the UNION operator removed the duplicate records and therefore returns just the 290 unique rows.

Here is the execution plan for this query. We can see that SQL Server first queried 2 of the tables, then did a Merge Join operation to combine the first two tables and then it did another Merge Join along with querying the third table in the query. So we can see there was much more worked that had to be performed to get this result set compared to the UNION ALL.

sql server union vs. union all operator query plan

UNION vs. UNION ALL Examples With Sort on Clustered Index Column

If we take this a step further and do a SORT of the data using the Clustered Index column we get these execution plans. From this we can see that the execution plan that SQL Server is using is identical for each of these operations even though the final result sets will still contain 870 rows for the UNION ALL and 290 rows for the UNION.

UNION ALL execution plan with sort on clustered index column

sql server union all with sort query plan

UNION execution plan with sort on clustered index column

sql server union with sort execution plan

UNION vs. UNION ALL Examples With Sort on Non-indexed Column

Here is another example doing the same thing, but this time doing a SORT on a non indexed column. As you can see the execution plans are again identical for these two queries, but this time instead of using a MERGE JOIN, a CONCATENATION and SORT operations are used.

UNION ALL execution plan with sort with a non-indexed column

sql server union all sort non indexed column

UNION execution plan with sort with a non-indexed column

sql server union sort non indexed column
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, March 25, 2022 - 9:32:29 AM - Joe F Celko Back To Top (89926)
While SQL Server gets the names of the columns and the results set from the first table in the union, the ANSI/ISO standards do not give them names. You have to write "(<union expression>) AS <result table>(column name list>)" which will create a new table with all of its elements named.

Thursday, October 28, 2021 - 9:30:04 AM - David Back To Top (89375)
Continue with "How do I use a template to append tables with same data types and column names?"

If I right click the first table dbo.202005 > Create Table as >
Create to > New Query Editor Window,
or
Select to > New Query Editor Window?

If I save this template in this folder,
C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE\SqlWorkbenchProjectItems\Sql\Earlier Versions\Create Table
or some folder,

How do I use the above template to append subsequent flat files with same data types in all columns?

Thanks

Thursday, October 28, 2021 - 8:40:51 AM - David Back To Top (89374)
Thanks for your prompt help Mr. Greg R.

It worked after I googled and fixed the error "Conversion failed when converting the nvarchar value 'TA1307000121' to data type int." It took awhile since I don't know which column of the 12 tables (13 columns each table) the value is in.

Then I found this script at https://stackoverflow.com/questions/15757263/find-a-string-by-searching-all-tables-in-sql-server,
and exported the results into Excel and found the tables. I then fixed the columns, start_station_name, start_station_id end_station_name, and end_station_id to nvarchar(100).


For future reference, when importing multiple flat files, is there a way for me to force a table design type/template if I already the my data types in each column? Or I can create a Design for the first table and import other tables afterward using the template?

Thanks again.

Tuesday, October 26, 2021 - 3:53:59 PM - Greg Robidoux Back To Top (89366)
Hi David,

you can just so something like this:

SELECT * INTO #temp FROM dbo.[202005]
UNION ALL
SELECT * FROM dbo.[202006]
UNION ALL
SELECT * FROM dbo.[202007]
UNION ALL
...

then just
SELECT * FROM #temp

Tuesday, October 26, 2021 - 3:22:05 PM - David Back To Top (89365)
I'm a new student of MS SQL Server and appreciate all mssqltips pro advice.

I have 12 tables of same data types and column names in same order. The ride_id is the primary key in all the tables. It's the divvy bike-sharing data here - https://www.kaggle.com/yingwurenjian/chicago-divvy-bicycle-sharing-data

I googled for hours and not sure exactly how to use UNION ALL properly to append the tables AND create and save a the results into a temp table so I can analyze (remove duplicates, DISTINCT COUNT, etc...from there?

This is what I have so far
SELECT *
FROM dbo.[202005]
UNION ALL
SELECT *
FROM dbo.[202006]
UNION ALL
SELECT *
FROM dbo.[202007]
UNION ALL
SELECT *
FROM dbo.[202008]
UNION ALL
SELECT *
FROM dbo.[202009]
UNION ALL
SELECT *
FROM dbo.[202010]
UNION ALL
SELECT *
FROM dbo.[202011]
UNION ALL
SELECT *
FROM dbo.[202012]
UNION ALL
SELECT *
FROM dbo.[202101]
UNION ALL
SELECT *
FROM dbo.[202102]
UNION ALL
SELECT *
FROM dbo.[202103]
UNION ALL
SELECT *
FROM dbo.[202104]

Any advice is appreciated.

Wednesday, December 16, 2020 - 12:55:25 PM - Joe Celko Back To Top (87919)
It is worth mentioning that according to ANSI/ISO standards, the result set has unnamed columns. Some SQL implementations get column names for the final result set rom the first query. Others will use the last query. However, the safest and more standard approach is to explicitly name the columns and the resulting set. Here is a skeleton:

( SELECT c1, c2, c3 FROM T1 WHERE ...
UNION [ALL]
SELECT d1, d2, d3 FROM T2 WHERE ...
UNION [ALL]
SELECT c1, a2, b3 FROM T3 WHERE ...
) AS X (x1, x2. x3)

The phrase we use for the requirement that the columns match by position and data type is called "union compatible" and there are some rules about converting data types to get all of the queries into a union compatible format. These same rules apply to INTERSECT [ALL] and EXCEPT [ALL] operators.

Thursday, December 5, 2019 - 9:56:54 AM - Thomas Franz Back To Top (83296)

Short remark:

You do not need to put a DISTINCT in any of the subqueries, if you use UNION (since the dublettes were removed "outside" by the UNION and removing dublettes twice or three times is slower than doing it only once).

If you know, that there can not be dublettes in your both result sets (e.g. because you used a hardcoded type as in SELECT 'this stuff' as type, col1, col2... or because you query different tables), you should always use UNION ALL. The perfomance gain compared to UNION can be very huge.


Friday, October 13, 2017 - 12:22:21 PM - gosho Back To Top (67291)

Very informative and helpful. Keep up the good work!


Tuesday, February 7, 2017 - 2:22:49 AM - Dipen Kenia Back To Top (46065)

This was very helpful. 

 















get free sql tips
agree to terms