How to Expand a Range of Dates into Rows using a SQL Server Numbers Table


By:   |   Updated: 2020-07-15   |   Comments (2)   |   Related: More > Dates

Problem

I have a common use case where I need to convert a date range into a set of rows in a SQL Server table. For example, I have a record saying an employee took a holiday from 2020-08-01 till 2020-08-20. This single row needs to be transformed into 20 rows, one for each day of vacation. How can I do this in a fast and scalable manner, since my tables contains thousands of records which might result in millions of rows in output?

Solution

Converting date ranges (or any other type of range) into a set of rows is a common requirement. For example, the source system supplies rows with a start and end date for a specific interval, but in your database, you need a transactional table with a row for each day, since this simplifies calculations. You could for example filter out weekends and holidays much easier than when you are using just the start and end date of the interval.

Often, the term "exploding the table" is used, since a small set of ranges can result in a huge output of rows. If you have one rows with 2020-01-01 as the start date and 2020-12-31 as end date, this would already result in 366 rows. Imagine you need to perform a similar calculation for millions of customers. Since the output can be large, it's important the solution is fast and scalable. This rules out loops and cursors in T-SQL, since they are row-based and are not suited for large volumes of rows. Other terminology includes "starbursting" or "unpacking a relation on a time interval".

In this tip, a solution is presented using a "numbers table", sometimes also called a "tally table". For more background information about this type of table, check out these excellent tips by Aaron Bertrand:

The tip SQL Server Function to return a range of dates does something similar, but uses a recursive CTE which is not scalable and also has a limit on the maximum number of recursions.

Sample Data

With the following SQL statement, we can create a simple table for holding the sample data:

CREATE TABLE dbo.EmployeeHoliday
(EmployeeID VARCHAR(10) NOT NULL
,StartDate DATE NOT NULL
,EndDate DATE NOT NULL);

Let's insert 2 rows into that table. Two employees taking a holiday, both with start and end date.

INSERT INTO dbo.EmployeeHoliday
SELECT [EmployeeID] = 'A', [StartDate] = '2020-06-01', [EndDate] = '2020-06-05'
UNION ALL
SELECT 'B','2020-12-15','2020-12-31';
sample data

This is the desired end result, one row for each day in the range of the start and end date:

exploded date range

Creating the SQL Server Numbers Table

There are different methods of creating a numbers table, such as cross joining two large sets together (take a look at Aaron's tips if you're interested in examples), but you can also create a "virtual" table by using common table expressions (CTE). The following syntax is coined by Itzik Ben-Gan, a T-SQL guru:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
SELECT N
FROM cteTally
WHERE N <= 10000;

In the first CTE E00, two rows are joined together. In the next CTE E02, this first CTE is cross joined to itself. This gives us 4 rows. This process is repeated a couple of times. The last CTE E32 will return 2^32 rows which is also the highest number an integer can hold in SQL Server. But this CTE only returns rows with the value 1.

only 1 is returned

Using the window function ROW_NUMBER, we can assign numbers to each row. For the ORDER BY in the OVER clause, we use the subquery (SELECT NULL). This tricks SQL Server into not sorting the dataset. If it would, it could be a serious performance issue. Running the entire SQL statement returns a unique sequential number for each row, starting with the number 1:

numbers table

Generating a one million row table takes just a couple of seconds on my machine. But in this use case we need dates, not numbers. We can transform the numbers into dates by using the DATEADD function. The SQL statement now becomes:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
SELECT ExplodedDate = DATEADD(DAY,N - 1,'2020-01-01')
FROM cteTally
WHERE N <= 366

This will generate all of the dates of the year 2020 (keep in mind it's a leap year).

2020 dates

Exploding the Range in the SQL Server Numbers Table

Using the numbers table (or dates table more accurately), we can finally "explode" our sample data. We just need to join the tally table to the sample table using a range join.

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
    ,DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,N - 1,'2020-01-01')
    FROM cteTally
    WHERE N <= 366
)
SELECT *
FROM dbo.EmployeeHoliday eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
 AND d.ExplodedDate <= eh.[EndDate];

This gives us the desired result set:

exploded sample table

Starting with 2020-01-01 and just taking 366 days is quite limited for a generated date table. You can expand these limits though and this shouldn't affect performance too badly. Let's take 2000-01-01 as starting point and generate 100 years of dates (roughly 365,000 rows):

one century of dates

The query now finishes just under 3 seconds.

Test with More Data

Let's do the same use case, but now using the Fact Internet Sales table of the AdventureWorks 2017 data warehouse. We'll explode the dates between the order date and the due date. The fact table contains 60,398 rows. The exploded result set will contain 785,174 rows. The adapted SQL query:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
    ,DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,N - 1,'2010-01-01')
    FROM cteTally
    WHERE N <= 5000
)
SELECT f.[SalesOrderNumber], f.[SalesOrderLineNumber], [f].[OrderDate], [f].[DueDate], d.[ExplodedDate]
INTO #Test
FROM [AdventureWorksDW2017].[dbo].[FactInternetSales] f
JOIN DateRange d ON d.ExplodedDate >= f.[OrderDate]
 AND d.ExplodedDate <= f.[DueDate];
 
SELECT COUNT(1) FROM #test;

In this test I'm not returning the rows to SSMS, because displaying all the rows in the grid would take too much time. Rather I'm inserting the data into a temp table and then returning the row count from this temp table. Exploding the Internet Sales Fact table took 20 seconds on my machine:

bigger test

Tip Update - Alternative Approach

In the comments, Jeff Moden suggested an alternate way of solving the problem. Instead of hard-coding the limit to 5,000 rows, the limit is dynamically calculated using the DATEDIFF of the order and due date. This value is applied to a TOP clause, selecting only the number of rows of the tally table that are actually needed. The resulting rows are then joined to the fact table using CROSS APPLY, effectively exploding the data range.

Hereís the adapted code:

WITH
     H1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) H0(N))
    ,cteTALLY(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM H1 a, H1 b, H1 c, H1 d, H1 e, H1 f, H1 g, H1 h)
SELECT
     f.SalesOrderNumber
    ,f.SalesOrderLineNumber
    ,f.OrderDate
    ,f.DueDate
    ,ExplodedDate = DATEADD(dd,t.N-1,f.OrderDate)
INTO #Test
FROM AdventureWorksDW2017.dbo.FactInternetSales f
CROSS APPLY (
    SELECT TOP (DATEDIFF(dd,f.OrderDate,f.DueDate) + 1)
        N
    FROM cteTally
    ORDER BY N
    ) t;

Letís say for example that one particular row of the Internet Sales Fact table, the number of days between is 5 (which means we need 6 exploded rows, since we need to include the first date). This would result in the following subquery:

SELECT TOP 6
 N
FROM cteTally
ORDER BY N

This would return 6 rows of the tally table (1,2,3,4,5,6). Using CROSS APPLY, these 6 rows are joined to the fact table, repeating the original row of the fact table 6 times, but for each row adding (N-1) days to the order date.

The solution is more efficient, because thereís no work table created in the tempdb database and the tally table isnít materialized in memory. For more information, check out the comments.

Conclusion

In this tip we saw how you can explode a date range using a "numbers" table. It's a common scenario for time series analysis for example. The code can be optimized by using a persisted date table instead of generating one on the fly. The purpose of this tip however was to show you the techniques to quickly generate a large numbers table to improve your T-SQL code by using set-based logic.

Next Steps


Last Updated: 2020-07-15


get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips





Comments For This Article




Wednesday, July 29, 2020 - 3:30:08 PM - Koen Verbeeck Back To Top

Hi Jeff,

long time no see :)
Thank you for your valuable input! I'm glad you took the time to read my article and to write such comprehensive feedback.
To be honest, I hadn't seen the CROSS APPLY trick before, so I'm happy to learn something new. I grabbed my copy of Itzik's book on Window Functions and he does use the TOP clause in the section about the tally table, but not in combination with the CROSS APPLY.

Anyway, your query does run faster. It's weird though that when you run my query and yours in the same batch, the actual execution plan will say my query is about 7% of cost of the total batch, while yours is 93%. Strange.

I've sent an update to the editors to include your query in the article with a bit of an explanation on how it works. For the nitty gritty detail on performance, I reference your comment here. The tip will be updated soon.

Again, thanks for your input and take care!
Koen


Wednesday, July 15, 2020 - 5:43:24 PM - Jeff Moden Back To Top

I couldn't tell if this reply submitted previously and so I'm resubmitting it.  If it's a dupe to my previous submital, please feel free to ignore it.

First, I have to say that I greatly appreciate anyone that takes the time to put an article together to share knowledge and so thank you very much for that, Koen.

Your suggestion of using Itzik Ben-Gans cCTEs (Cascading CTEs not to be confused with rCTEs or Recursive CTEs) is spot on but there are a couple of implemenation problems in your final example.  Let's first see what the problems are...

If we wrap your code in SET STATISTICS but also include IO, we get a hint of the first problem...

--===== If the test table already exists, drop it to make reruns in SSMS easier.
   
DROP TABLE IF EXISTS #Test
;
--===== Run the test code measuring time AND IO
    
SET STATISTICS TIME,IO ON --Added this.
;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    
,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    
,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    
,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    
,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    
,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    
,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
    
,DateRange AS
(
    
SELECT ExplodedDate = DATEADD(DAY,- 1,'2010-01-01')
    
FROM cteTally
    
WHERE N <= 5000
) --LOOK at the execution plan and see the "Eager Spool" that internally creates 226,159,403 rows!
SELECT f.SalesOrderNumber, f.SalesOrderLineNumber, f.OrderDate, f.DueDate, d.ExplodedDate
INTO #Test
FROM AdventureWorksDW2017.dbo.FactInternetSales f
JOIN DateRange d ON d.ExplodedDate >= f.OrderDate
 
AND d.ExplodedDate <= f.DueDate
;
    
SET STATISTICS TIME,IO OFF --Added this
;
Here are the results from that run...

Table 'Worktable'. Scan count 5000, logical reads 1543429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactInternetSales'. Scan count 1, logical reads 1249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 34359 ms,  elapsed time = 34368 ms.

(785174 rows affected)

If you look at the execution plan, you find out that the "Worktable" is from an "Eager Spool", which is in TempDB.  It creates an "actual" row count of over 226 Million rows, which you predicted, but we don't actually need to materialize those rows. Also, if we divide the 1543429 logical reads by 128 to convert to MegaBytes read, we find that over 12,058 MegaBytes (more than 12 GigaBytes) had to be read from memory.  As fast as memory is, that's an awful lot of unnecessary memory IO.

The other problems include the facts that your method requires prior esoteric knowledge of what the lowest date in the table might be and the fact that you've included a limit.  5,000 days is only about 13.68 years and there are plenty of date spans that will easily outstripe that.  The big problem with both is that you don't necessarily know what the date spans will be nor what the lowest starting date is.

We can solve all of that.  The key is to not materialize the 5000 rows for cteTally.  In the following, I also use a "Base 16" version of Itzik's fine "Base 2" code just to make the code a whole lot shorter and easier to remember.  That's NOT where the speed comes from, though.  The speed comes from the use of TOP in a CROSS APPLY, which is the preferred implimenation method for Itzik's cCTEs (as you'll see as soon as you run the following code)...

--===== If the test table already exists, drop it to make reruns in SSMS easier.
  
DROP TABLE IF EXISTS #Test
;
--===== Run the test code measuring time AND IO
    
SET STATISTICS TIME,IO ON
;
       
WITH
       H1
(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))H0(N))
,cteTALLY(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM H1 a, H1 b, H1 c, H1 d, H1 e, H1 f, H1 g, H1 h)
 
SELECT f.SalesOrderNumber, f.SalesOrderLineNumber, f.OrderDate, f.DueDate
        
,ExplodedDate = DATEADD(dd,t.N-1,f.OrderDate)
   
INTO #Test
   
FROM AdventureWorksDW2017.dbo.FactInternetSales f
  
CROSS APPLY (SELECT TOP (DATEDIFF(dd,f.OrderDate,f.DueDate)+1) N FROM cteTally ORDER BY N) t
;
    
SET STATISTICS TIME,IO OFF;

Here are the results from that run... yes... it actually did run. :D
Table 'FactInternetSales'. Scan count 9, logical reads 1315, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 452 ms,  elapsed time = 82 ms.

(785174 rows affected)

If you want to take it one step further, create an iTVF (Inline Table Valued Function) and your code ends up looking as simple as the following...

--===== If the test table already exists, drop it to make reruns in SSMS easier.
   
DROP TABLE IF EXISTS #Test
;
--===== Run the test code measuring time AND IO
    
SET STATISTICS TIME,IO ON 
;
 
SELECT f.SalesOrderNumber, f.SalesOrderLineNumber, f.OrderDate, f.DueDate
        
,ExplodedDate = DATEADD(dd,t.N,f.SalesOrderLineNumber)
   
INTO #Test
   
FROM AdventureWorksDW2017.dbo.FactInternetSales f
  
CROSS APPLY dbo.fnTally(0,DATEDIFF(dd,f.OrderDate,f.DueDate)) t
;
    
SET STATISTICS TIME,IO OFF 
;

You can get the function from the following link...

https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally



download


Recommended Reading

Date and Time Conversions Using SQL Server

Format SQL Server Dates with FORMAT Function

Add and Subtract Dates using DATEADD in SQL Server

Creating a date dimension or calendar table in SQL Server

SQL Server Date and Time Functions with Examples





get free sql tips
agree to terms


Learn more about SQL Server tools