CTE in SQL Server Examples


By:   |   Updated: 2021-05-13   |   Comments (1)   |   Related: More > Common Table Expressions


Problem

As any developer using SQL, writing long queries containing extensive logic is commonplace. This can include many joins, varying levels of aggregation, and filtering across any number of tables. In some cases, a result set cannot be achieved by a single isolated query (SELECT x FROM y WHERE z) and requires a series of queries to be strung together. Additionally, for more complex queries, the best way to write a query is often to break it down into logical steps and build it up piece by piece.

Traditionally, sub-queries are one method to isolate logic into a single SQL statement which can then be added back to the 'main' query that you are trying to build out incrementally. This can be effective, but it is also difficult to read and understand for others.

CTEs offer a logical and legible approach to writing queries that can break the complex queries down to a series of logical steps, help to improve the legibility of your queries, and achieve more complex result sets.

Solution

The below queries will utilize the AdventureWorks database from Microsoft, in addition to SQL Server Management Studio and Microsoft SQL Server Developer Edition.

You can download the latest version of this software for free at the following links:

What is a CTE?

By pure definition, a CTE is a 'temporary named result set'. In practice, a CTE is a result set that remains in memory for the scope of a single execution of a SELECT, INSERT, UPDATE, DELETE, or MERGE statement.

Let's break this down and take a look at the syntax to get a better understanding of what this means and why it is useful.

The basic syntax of a CTE is as follows:

WITH <common_table_expression> ([column names])
AS
(
   <cte_query_definition>
)
<operation>

Broken down – the WITH clause is telling SQL Server we are about to declare a CTE, and the <common_table_expression> is how we are naming the result set for reference later. Hence, a 'named result set'.

The [column names] are where you can alias the column names that will come out of the CTE. This is an optional portion of the syntax, and I often just alias the column within the <cte_query_definition> portion of the syntax.

The <cte_query_definition> is always a SELECT statement. This is where we are defining our result set. You can think of this as a temporary table that can be referenced in a FROM or JOIN clause like any other normal table. However, there are some key differences between a CTE and a temporary table which will be described later. You can also alias your columns in this section to be referenced later on.

The <operation> placeholder is where we will do our actual execution and reference the CTE. As mentioned earlier, this can be a SELECT, INSERT, UPDATE, DELETE, or MERGE T-SQL statement. We will look at some examples of these below.

Let's take a look at a CTE example to drive this home:

WITH Simple_CTE
AS (
   SELECT dd.CalendarYear
      ,fs.OrderDateKey
      ,fs.ProductKey
      ,fs.OrderQuantity * fs.UnitPrice AS TotalSale
      ,dc.FirstName
      ,dc.LastName
   FROM [dbo].[FactInternetSales] fs
   INNER JOIN [dbo].[DimCustomer] dc ON dc.CustomerKey = fs.CustomerKey
   INNER JOIN [dbo].[DimDate] dd ON dd.DateKey = fs.OrderDateKey
   )
SELECT *
FROM Simple_CTE
Table Description automatically generated

In the above example, we have created a CTE that contains a few joins, specifically selected column list, and a derived column 'TotalSale'. Our operation in this example is just another SELECT statement, just to show how the CTE can be used in the simplest form.

CTE with SELECT

Let's take the syntax a bit further. Often, you might find yourself needing to do a multi-tiered aggregation. That is, an aggregation of an aggregation. CTEs can be a great way to write this type of query in a readable way.

WITH Sum_OrderQuantity_CTE
AS (
SELECT ProductKey
,EnglishMonthName
,SUM(OrderQuantity) AS TotalOrdersByMonth
FROM [dbo].[FactInternetSales] fs
INNER JOIN [dbo].[DimDate] dd ON dd.DateKey = fs.OrderDateKey
GROUP BY ProductKey, EnglishMonthName

)
SELECT ProductKey, AVG(TotalOrdersByMonth) AS 'Average Total Orders By Month'
FROM Sum_OrderQuantity_CTE
GROUP BY ProductKey
ORDER BY ProductKey
Result set of multi-tiered aggregate CTE query. Shows two columns, ProductKey and AverageTotalOrdersByMonth

In this SQL code, we are taking the sum of OrderQuantity by product per month to see how much of each item was sold per month. Then, we are averaging this aggregate to see for each product, what is the monthly average quantity sold.

CTE with INSERT

To build on the previous example, rather than just selecting the aggregated data, we can pair an INSERT statement with the subquery to insert data into a new table.

First, build this table so that we have a table to insert into:

CREATE TABLE [dbo].[TestTable](
   [CalendarYear] [smallint] NOT NULL,
   [OrderDateKey] [int] NOT NULL,
   [ProductKey] [int] NOT NULL,
   [TotalSale] [money] NULL,
   [FirstName] [nvarchar](50) NULL,
   [LastName] [nvarchar](50) NULL
)

Then, in the following example we can use a CTE to Insert Into this table.

WITH Simple_CTE
AS (
   SELECT dd.CalendarYear
      ,fs.OrderDateKey
      ,fs.ProductKey
      ,fs.OrderQuantity * fs.UnitPrice AS TotalSale
      ,dc.FirstName
      ,dc.LastName
   FROM [dbo].[FactInternetSales] fs
   INNER JOIN [dbo].[DimCustomer] dc ON dc.CustomerKey = fs.CustomerKey
   INNER JOIN [dbo].[DimDate] dd ON dd.DateKey = fs.OrderDateKey
   )
 
INSERT INTO dbo.TestTable
SELECT * FROM Simple_CTE

Note: we have written this as a SELECT * for simplicity, but often you will want to name each column explicitly in case your underlying schema changes in the future.

CTE with UPDATE

Similar to a CTE with DELETE, we can update a table using the same method.

First, let's create a backup table so that we aren't updating the actual table:

SELECT * 
INTO dbo.DimCustomerBakUpdate 
FROM DimCustomer

Next, we can update this table. In the below example, we are updating all of the records where the AddressLine2 is NULL as specified in the WHERE clause to be a blank string instead.

WITH Update_CTE
AS (
   SELECT *
   FROM dbo.DimCustomerBakUpdate
   WHERE AddressLine2 IS NULL
   )
UPDATE Update_CTE
SET AddressLine2 = ''
Table Description automatically generated
Table Description automatically generated

You can see the result – we created a result set of all records where the AddressLine2 is null, and then applied an update to that specific result set. Again, this is a simplistic example meant to demonstrate the capabilities of the statement.

CTE with DELETE

In addition to selecting, inserting, and updating data, you can use a CTE to delete data from a table.

DELETEs requires that you have only used one table in your CTE, so you can't use any join statements, or you will get an error.

Just like we did the update, let's create another backup table to delete records from:

SELECT * 
INTO dbo.DimCustomerBak 
FROM DimCustomer

Now, we can delete from this table using a CTE. In this example, we will delete all records where the AddressLine2 is null.

WITH MissingAddress_CTE
AS (
   SELECT *
   FROM dbo.DimCustomerBak
   WHERE AddressLine2 IS NULL
   )
DELETE
FROM MissingAddress_CTE

When we select from the table again, you can see there are no longer any records with NULL in the AddressLine2 column.

Graphical user interface, table Description automatically generated with medium confidence

Multiple CTEs in a Single Query

Finally, you are not limited to just writing a single CTE in a query. You can write as many as you want and separate them with a comma:

WITH Sales_Cust_Join_CTE
AS (
   SELECT fs.OrderDateKey
      ,fs.ProductKey
      ,fs.OrderQuantity * fs.UnitPrice AS TotalSale
      ,dc.FirstName
      ,dc.LastName
   FROM [dbo].[FactInternetSales] fs
   INNER JOIN [dbo].[DimCustomer] dc ON dc.CustomerKey = fs.CustomerKey
   )
   ,Date_CTE
AS (
   SELECT DateKey
      ,CalendarYear
   FROM [dbo].[DimDate]
   )
SELECT CalendarYear
   ,ProductKey
   ,SUM(TotalSale) AS TotalSales
FROM Sales_Cust_Join_CTE
INNER JOIN Date_CTE ON Date_CTE.DateKey = Sales_Cust_Join_CTE.OrderDateKey
GROUP BY CalendarYear
   ,ProductKey
ORDER BY CalendarYear ASC
   ,TotalSales DESC

In this example, we create two CTEs with some joins, logic, and selecting only a few specific columns. In our final operation, we join the CTEs together and form an aggregate to show total sales for each product by calendar year.

Table Description automatically generated

A word of caution, if you find yourself with many CTEs, the code can become more challenging to read. So, while they can be a great tool, try not to go overboard with them.

When to use CTEs, and When not to

Though CTEs are a great way to write cleaner code, they should not be used in every scenario. For example, one major limitation is that a CTE is limited to the scope of a single execution. What that means in practice is that if you need to do more than one operation with this result set, you need to declare the CTE multiple times. This is not performant because we are doing the work of the CTE multiple times.

In this scenario it would probably be better to use a temporary table. Temporary tables can be used across multiple queries, and they can be indexed and store column statistics.

Generally speaking, if you need to create a named result set that will be used by multiple subsequent queries and the data is very large, you are better off using a temporary table. If your data is relatively small, and you only need the result set for the context of a single query, you are best off using a CTE as there is less overhead involved.

Read more about temp tables in this article Types of SQL Server Temporary Tables.

If you are familiar with subqueries, you might be curious what the difference is between a subquery and a CTE. CTEs can be recursive CTEs, whereas subqueries cannot. However, from a performance perspective, there is no real difference. The only real difference is readability, and I would always recommend using a CTE over a subquery over a CTE.

Let's take a look at what I mean:

WITH Simple_CTEAS (
   SELECT dd.CalendarYear
      ,fs.OrderDateKey
      ,fs.ProductKey
      ,fs.OrderQuantity * fs.UnitPrice AS TotalSale
      ,dc.FirstName
      ,dc.LastName
   FROM [dbo].[FactInternetSales] fs
   INNER JOIN [dbo].[DimCustomer] dc ON dc.CustomerKey = fs.CustomerKey
   INNER JOIN [dbo].[DimDate] dd ON dd.DateKey = fs.OrderDateKey
   )
SELECT *
FROM Simple_CTE
SELECT *
FROM (
   SELECT dd.CalendarYear
      ,fs.OrderDateKey
      ,fs.ProductKey
      ,fs.OrderQuantity * fs.UnitPrice AS TotalSale
      ,dc.FirstName
      ,dc.LastName
   FROM [dbo].[FactInternetSales] fs
   INNER JOIN [dbo].[DimCustomer] dc ON dc.CustomerKey = fs.CustomerKey
   INNER JOIN [dbo].[DimDate] dd ON dd.DateKey = fs.OrderDateKey
   ) Subquery
 

These two queries return the same data, but which is easier to read? The CTE query allows us to logically arrive at the steps we took to get to our result, whereas the subquery feels backwards and difficult to read.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Ryan Kennedy Ryan Kennedy is a Solutions Architect for Databricks, specializing in helping clients build modern data platforms in the cloud that drive business results.

View all my tips


Article Last Updated: 2021-05-13

Comments For This Article




Monday, May 17, 2021 - 12:26:12 PM - Hristo Hristov Back To Top (88693)
Great tip and refresher on CTEs!


download














get free sql tips
agree to terms