How to Expand a Range of Dates into Rows with the SQL Server Function GENERATE_SERIES

By:   |   Updated: 2022-09-01   |   Comments (3)   |   Related: More > SQL Server 2022


Problem

A common use case in business intelligence queries is to expand a range of dates from the source system into a full set of rows, with one row per date. The result is easier to query as it can be linked directly to a date dimension. How can we implement this solution?

Solution

There's already a tip explaining how you can implement this: How to Expand a Range of Dates into Rows using a SQL Server Numbers Table. In that article, a solution with a "numbers table" (sometimes referred to as a "tally table") is discussed to expand the rows. In this tip, we'll introduce you to an alternative solution using the GENERATE_SERIES function. This function is presented as part of the SQL Server 2022 preview.

SQL Server 2022 Preview

At the time of writing this tip, SQL Server 2022 is still in preview (currently, CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.

How to Use GENERATE_SERIES

The GENERATE_SERIES function takes three arguments:

  • Start: this is the first numerical value of the interval. It can be any expression that results in a numeric data type.
  • Stop: this is the last value of the interval.
  • Step: this optional parameter indicates the number of values to increment (or decrement). The default is 1 is start < stop, and -1 if start > stop.

The output of the function is a result set with one single column named value, which contains all the numeric values of the interval defined by the three parameters.

Some examples:

SELECT *
FROM GENERATE_SERIES(1,10);
GENERATE_SERIES(1,10)
SELECT *
FROM GENERATE_SERIES(10,1);
GENERATE_SERIES(10,1)
SELECT *
FROM GENERATE_SERIES(0,10,3);
GENERATE_SERIES(0,10,3)

As you can see in the last example, the stop value isn't necessarily included in the returned interval. If we want to use the function to generate dates, we can do this with DATEADD:

SELECT DATEADD(DAY,value,'2022-01-01')
FROM GENERATE_SERIES(0,364,1);
dateadd GENERATE_SERIES(0,364,1)

For more information, you can check out the tip, My Favorite T-SQL Enhancements in SQL Server 2022, by Aaron Bertrand. His tip mentions a performance issue because the function is not order-preserving, but this has been fixed in CTP 2.1. When you add an ORDER BY on value, there's no sort operator in the plan:

no sort operator in the plan anymore

Keep in mind the syntax has also changed a bit. Explicitly adding START and STOP before the parameter values will result in a syntax error:

syntax error

Expanding Data Ranges with GENERATE_SERIES

Now we know how to use the new GENERATE_SERIES function, we can use it to solve our use case. Let's use the same sample data as in the previous tip:

CREATE TABLE dbo.EmployeeHoliday
(  EmployeeID VARCHAR(10) NOT NULL
  ,StartDate DATE NOT NULL
  ,EndDate DATE NOT NULL);
 
INSERT INTO dbo.EmployeeHoliday
SELECT [EmployeeID] = 'A', [StartDate] = '2020-06-01', [EndDate] = '2020-06-05'
UNION ALL
SELECT 'B', '2020-12-15', '2020-12-31';
sample data

As a reminder, here's the desired result:

exploded date range

To expand the data range, we're generating an interval of 1,000 numbers and turning those into dates, as we've seen in the previous paragraph. Finally, we join this against our sample data. The T-SQL statement becomes:

WITH DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,value,'2020-01-01')
    FROM GENERATE_SERIES(0,999)
)
SELECT *
FROM dbo.EmployeeHoliday eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
                AND d.ExplodedDate <= eh.[EndDate];

It's clear to see the code has become much shorter. The execution plan is also clearer:

execution plan of GENERATE_SERIES

As a reference, here's the execution plan of the query using common table expressions to generate a numbers table on the fly:

execution plan of old method

This doesn't speak directly about performance, mind you, but using GENERATE_SERIES will make plans that use a numbers table easier to read.

Probably most important, we get our desired result set:

result set with generate_series

We can now join this data set with a date dimension, for example, using the ExplodedDate column. This would allow for more straightforward time-intelligence queries. Let's illustrate with an example. Suppose we want to find all holiday periods that include a specific week. If your data dimension has a column for weeks, you can quickly filter on that column after we've joined the dimension against our expanded result set.

The tip Creating a date dimension or calendar table in SQL Server explains how you can quickly create a date dimension or calendar table. Once we have it, we can join it against the query we used earlier:

WITH DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,value,'2020-01-01')
    FROM GENERATE_SERIES(0,999)
)
SELECT DISTINCT eh.EmployeeID, eh.StartDate, eh.EndDate
FROM dbo.EmployeeHoliday eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
                AND d.ExplodedDate <= eh.[EndDate]
JOIN dbo.DateDimension dd ON d.ExplodedDate = dd.TheDate
WHERE dd.TheISOweek = 23;

Here we're looking for holiday periods with at least one day of ISO week #23.

The result:

result when searching for a specific week

A DISTINCT clause is used to get only one single row: the holiday period we wished to find.

Conclusion

The new function GENERATE_SERIES will simplify any query that needs an interval of numbers. The function is easy to use and simplifies execution plans that use such a numbers table. There were some performance issues in the initial version, but these seem to be solved in the CTP 2.1 release.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips


Article Last Updated: 2022-09-01

Comments For This Article




Tuesday, October 4, 2022 - 11:02:57 PM - Jeff Moden Back To Top (90562)
Yes, I believe it'll matter a lot. Even with just two entries, it looks it'll generate only 22 rows instead of 1000*2 or 2000 rows. Try the following code (again, I don't have 2022 available in any form so I'm winging it a bit).

SELECT eh.*,ExplodedDate = DATEADD(dd,t.N,eh.StartDate)
FROM #EmployeeHoliday eh
CROSS APPLY GENERATE_SERIES(0,DATEDIFF(dd,eh.StartDate,eh.EndDate))t

The code is a bit simpler, to boot.l


Thursday, September 8, 2022 - 4:18:47 AM - Koen Verbeeck Back To Top (90452)
Hi Jeff,

thanks for reading!
Sure, you can limit the number of rows generated using DATEDIFF for example. Will it be much better? For this really really small data set it probably doesn't matter, but for large data sets it most likely will. Calculating the start and end of the range are extra queries that need to run though, but then again, working with hardcoded values typically isn't a good idea :)

Regarding SQL Server 2022, I haven't installed the evaluation version on my machine. I use a VM image that is pre-built in Azure. If you have some Azure credits it's an easy way to play with SQL Server 2022.

Regards,
Koen

Wednesday, September 7, 2022 - 10:32:46 AM - Jeff Moden Back To Top (90449)
Thanks for your time in writing this article up, Koen.

I don't have 2022 to play with (I don't load "Evaluation Versions") but wouldn't it be much better to calculate the number of dates needed and only generate those that generating 999 of them and then selecting from them? In other words, the new function can take an expression like a DATEDIFF() for its second operand, correct?














get free sql tips
agree to terms