How to Expand a Range of Dates into Rows with the SQL Server Function GENERATE_SERIES
A common use case in business intelligence queries is to expand a range of dates from the source system into a full set of rows, with one row per date. The result is easier to query as it can be linked directly to a date dimension. How can we implement this solution?
There's already a tip explaining how you can implement this: How to Expand a Range of Dates into Rows using a SQL Server Numbers Table. In that article, a solution with a "numbers table" (sometimes referred to as a "tally table") is discussed to expand the rows. In this tip, we'll introduce you to an alternative solution using the GENERATE_SERIES function. This function is presented as part of the SQL Server 2022 preview.
SQL Server 2022 Preview
At the time of writing this tip, SQL Server 2022 is still in preview (currently, CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.
How to Use GENERATE_SERIES
The GENERATE_SERIES function takes three arguments:
- Start: this is the first numerical value of the interval. It can be any expression that results in a numeric data type.
- Stop: this is the last value of the interval.
- Step: this optional parameter indicates the number of values to increment (or decrement). The default is 1 is start < stop, and -1 if start > stop.
The output of the function is a result set with one single column named value, which contains all the numeric values of the interval defined by the three parameters.
SELECT * FROM GENERATE_SERIES(1,10);
SELECT * FROM GENERATE_SERIES(10,1);
SELECT * FROM GENERATE_SERIES(0,10,3);
As you can see in the last example, the stop value isn't necessarily included in the returned interval. If we want to use the function to generate dates, we can do this with DATEADD:
SELECT DATEADD(DAY,value,'2022-01-01') FROM GENERATE_SERIES(0,364,1);
For more information, you can check out the tip, My Favorite T-SQL Enhancements in SQL Server 2022, by Aaron Bertrand. His tip mentions a performance issue because the function is not order-preserving, but this has been fixed in CTP 2.1. When you add an ORDER BY on value, there's no sort operator in the plan:
Keep in mind the syntax has also changed a bit. Explicitly adding START and STOP before the parameter values will result in a syntax error:
Expanding Data Ranges with GENERATE_SERIES
Now we know how to use the new GENERATE_SERIES function, we can use it to solve our use case. Let's use the same sample data as in the previous tip:
CREATE TABLE dbo.EmployeeHoliday ( EmployeeID VARCHAR(10) NOT NULL ,StartDate DATE NOT NULL ,EndDate DATE NOT NULL); INSERT INTO dbo.EmployeeHoliday SELECT [EmployeeID] = 'A', [StartDate] = '2020-06-01', [EndDate] = '2020-06-05' UNION ALL SELECT 'B', '2020-12-15', '2020-12-31';
As a reminder, here's the desired result:
To expand the data range, we're generating an interval of 1,000 numbers and turning those into dates, as we've seen in the previous paragraph. Finally, we join this against our sample data. The T-SQL statement becomes:
WITH DateRange AS ( SELECT ExplodedDate = DATEADD(DAY,value,'2020-01-01') FROM GENERATE_SERIES(0,999) ) SELECT * FROM dbo.EmployeeHoliday eh JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate] AND d.ExplodedDate <= eh.[EndDate];
It's clear to see the code has become much shorter. The execution plan is also clearer:
As a reference, here's the execution plan of the query using common table expressions to generate a numbers table on the fly:
This doesn't speak directly about performance, mind you, but using GENERATE_SERIES will make plans that use a numbers table easier to read.
Probably most important, we get our desired result set:
We can now join this data set with a date dimension, for example, using the ExplodedDate column. This would allow for more straightforward time-intelligence queries. Let's illustrate with an example. Suppose we want to find all holiday periods that include a specific week. If your data dimension has a column for weeks, you can quickly filter on that column after we've joined the dimension against our expanded result set.
The tip Creating a date dimension or calendar table in SQL Server explains how you can quickly create a date dimension or calendar table. Once we have it, we can join it against the query we used earlier:
WITH DateRange AS ( SELECT ExplodedDate = DATEADD(DAY,value,'2020-01-01') FROM GENERATE_SERIES(0,999) ) SELECT DISTINCT eh.EmployeeID, eh.StartDate, eh.EndDate FROM dbo.EmployeeHoliday eh JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate] AND d.ExplodedDate <= eh.[EndDate] JOIN dbo.DateDimension dd ON d.ExplodedDate = dd.TheDate WHERE dd.TheISOweek = 23;
Here we're looking for holiday periods with at least one day of ISO week #23.
A DISTINCT clause is used to get only one single row: the holiday period we wished to find.
The new function GENERATE_SERIES will simplify any query that needs an interval of numbers. The function is easy to use and simplifies execution plans that use such a numbers table. There were some performance issues in the initial version, but these seem to be solved in the CTP 2.1 release.
- Stay tuned for more SQL Server 2022 tips on the MSSQLTips website!
- If you want to try it out, you can create a pre-configured virtual machine in Azure. You can start a free trial here. More info about the SQL Server 2022 release can be found here.
- You can find more SQL Server 2022 tips in this overview.
- Erik Darling has blogged about some problems of GENERATE_SERIES here and here. Some of the issues are resolved in the latest preview.
About the author
View all my tips
Article Last Updated: 2022-09-01