Calendar Table in SQL Server to Identify Date Gaps and Islands


By:   |   Updated: 2021-04-28   |   Comments   |   Related: 1 | 2 | 3 | 4 | 5 | More > Dates


Problem

In my previous article I revisited the concept of a calendar table, and explained some ways to use this data for business date calculations. This time, I wanted to explore how you can use the calendar table to simplify generating date ranges, and some query challenges this can help you simplify.

Solution

Recall that our calendar table has a key column called TheDate:

CREATE TABLE dbo.Calendar
(
TheDate date NOT NULL,

CONSTRAINT PK_Calendar PRIMARY KEY (TheDate),

To generate a series of dates from that table, we can just use a closed-ended range, as follows:

DECLARE @Start date = '20200101', @End date = '20200105';
SELECT TheDate 
FROM dbo.Calendar
WHERE TheDate >= @Start
AND TheDate <= @End;

Results:

TheDate
----------
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05

And the plan for that query is about as simple as you could imagine:

Clustered index seek for date range query against calendar table

For convenience, though, we can wrap this in an inline table-valued function:

CREATE FUNCTION dbo.GenerateDateSeries 
(
@StartDate date,
@EndDate date
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT TheDate
FROM dbo.Calendar
WHERE TheDate >= @StartDate
AND TheDate <= @EndDate
);
GO

Now the same query is easier to generate (on its own, or to combine with other queries):

DECLARE @Start date = '20200101', @End date = '20200105';
SELECT TheDate 
FROM dbo.GenerateDateSeries (@Start, @End);

Same results, same plan, same performance.

But how else can we use this function?

Filling Date Gaps

One of the more popular reasons for generating a date range is to fill gaps in reports. Let’s create a transactional table in a database called SalesStuff and populate it with fictitious data that contains intentional gaps:

USE SalesStuff;
GO DROP TABLE IF EXISTS dbo.Transactions;
GO CREATE TABLE dbo.Transactions
(
KeyColumn int NOT NULL,
EventTime datetime2(3),
CONSTRAINT PK_Transactions PRIMARY KEY(KeyColumn),
INDEX IX_Transactions_EventTime(EventTime)
);
GO ;WITH x(n) AS
(
SELECT TOP (40000) ROW_NUMBER() OVER (ORDER BY o.object_id)
FROM master.sys.all_objects AS o
CROSS JOIN master.sys.all_objects AS o2
)
INSERT dbo.Transactions(KeyColumn, EventTime)
SELECT TOP (10000) n, TheDate = DATEADD(HOUR, n*2, '20200101')
FROM x
WHERE ((n+1)/10) % 3 = 0
ORDER BY n;

If we just manually look at the transaction table for the first 8 days, we see that there are some days with rows missing:

SELECT KeyColumn, EventTime
FROM dbo.Transactions
WHERE EventTime < '20200109';

Abridged results:

KeyColumn   EventTime
--------- ----------------
1 2020-01-01 02:00
… 8 2020-01-01 16:00
29 2020-01-03 10:00
… 35 2020-01-03 22:00
36 2020-01-04 00:00
37 2020-01-04 02:00
38 2020-01-04 04:00
59 2020-01-05 22:00
60 2020-01-06 00:00
… 67 2020-01-06 14:00
68 2020-01-06 16:00
89 2020-01-08 10:00

You can see that there is no data for January 2nd or January 7th. But if someone were to run a simple report to get daily sales for those 8 days:

DECLARE @Start date = '20200101', @End date = '20200108';
SELECT TheDate = CONVERT(date, EventDateTime), TransactionCount = COUNT(*)
FROM SalesStuff.dbo.Transactions
WHERE EventDateTime >= @Start
AND EventDateTime < DATEADD(DAY, 1, @End)
GROUP BY CONVERT(date, EventDateTime)
ORDER BY TheDate;

The results will only have 6 rows, because days cannot be represented by that query if they are absent:

TheDate      TransactionCount
---------- ----------------
2020-01-01 8
2020-01-03 7
2020-01-04 3
2020-01-05 1
2020-01-06 9
2020-01-08 7

Here is the plan for that query, just for reference. Note the sort operator; even though the index seek returns the EventTime column in order, the convert makes a subsequent sort necessary:

Query plan for simple aggregate query

If we want all the days represented even if there were no sales, we need to find some other way to fill the gaps. This is where a date series generator can come in handy – we can pull all the dates from the date series function, then perform a left join with the transactional data.

DECLARE @Start date = '20200101', @End date = '20200108';
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn) 
FROM dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN dbo.Transactions AS t
ON t.EventTime >= f.TheDate
AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
ORDER BY f.TheDate;

Results show all 8 rows, where absent dates are represented by 0:

TheDate      TransactionCount
---------- ----------------
2020-01-01 8
2020-01-02 0
2020-01-03 7
2020-01-04 3
2020-01-05 1
2020-01-06 9
2020-01-07 0
2020-01-08 7

The plan here is understandably more complex and has a higher estimated subtree cost, but this is still going to be fairly efficient – and note there is no sort operator anymore:

Query plan for function joined to transactional data

The costs change very little for narrow or wide date ranges, but overall costs will likely vary based on size, statistics, and distribution of the transactional table data.

You can also generate more granular series using a single day or range of days, without any additional objects, making it easy to generate hourly reporting with no gaps. Let’s say, for the first day in January, we want a report that has a row for every half hour throughout the day. We can first generate 48 row numbers from the calendar table, then multiply each row number by 30 to get the next 30-minute interval.

DECLARE @minutes int = 30;
DECLARE @Start date = '20200101', @End date = '20200103';
;WITH intervals AS
(
SELECT TOP (24*60/@minutes) rn = ROW_NUMBER() OVER (ORDER BY TheDate)
FROM Calendar.dbo.Calendar
),
ranges AS
(
SELECT TheWindow = DATEADD(MINUTE, (i.rn-1)*@minutes, CONVERT(datetime2(3), f.TheDate))
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
CROSS JOIN intervals AS i
)
SELECT r.TheWindow, COUNT(t.KeyColumn)
FROM ranges AS r
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
ON t.EventTime >= r.TheWindow
AND t.EventTime < DATEADD(MINUTE, @minutes, r.TheWindow)
GROUP BY r.TheWindow
ORDER BY r.TheWindow;

Abridged results:

TheWindow          TransactionCount
---------------- ----------------
2020-01-01 00:00 0
2020-01-01 00:30 0
2020-01-01 01:00 0
2020-01-01 01:30 0
2020-01-01 02:00 1

2020-01-03 21:30 0
2020-01-03 22:00 1
2020-01-03 22:30 0
2020-01-03 23:00 0
2020-01-03 23:30 0

The plan for this query looks like this (with MAXDOP 1):

query plan

While the calendar table is tiny, the scan to get the 48 rows per day is a little wasteful, so feel free to allow for parallelism or to use any technique you may already like for efficient number series generation (this set of posts from Itzik Ben-Gan is excellent).

Identifying Date Gaps and Islands

The above queries help eliminate gaps, but what if we want to highlight just the gaps or islands? Queries get a little more complex but I have used a DENSE_RANK() pattern with great success, and having a calendar table (and a function that draws from that table) helps keep the query logic manageable.

In the transaction data above, we have already identified that there are gaps, but let’s make the data a bit more sparse by deleting data from two of the days:

DELETE SalesStuff.dbo.Transactions 
WHERE EventTime >= '20200104'
AND EventTime < '20200106';

Now we should have grouped data that looks like this (and I’m assigning a label for how I describe each set of rows as they flip between being present and not):

TheDate      TransactionCount   
---------- ----------------
2020-01-01 8 -- island #1
2020-01-02 0 -- gap #1
2020-01-03 0 -- gap #1
2020-01-04 0 -- gap #1
2020-01-05 1 -- island #2
2020-01-06 9 -- island #2
2020-01-07 0 -- gap #2
2020-01-08 7 -- island #3

Now, a query I often use to return both gaps and islands together (and capable of generating exactly the previous result set) looks like the following:

DECLARE @Start date = '20200101',
@End date = '20200108'; ;WITH src AS
(
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
ON t.EventTime >= f.TheDate
AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
),
gaps AS
(
SELECT TheDate,
TransactionCount,
gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src
WHERE TransactionCount = 0
),
islands AS
(
SELECT TheDate,
TransactionCount,
island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
SELECT TheDate,
TransactionCount,
Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
FROM gaps
UNION ALL
SELECT TheDate,
TransactionCount,
Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
FROM islands
)
SELECT TheDate,
Label,
TransactionCount
FROM rawdata
ORDER BY TheDate;

This returns the following result:

TheDate      Label      TransactionCount
---------- -------- ----------------
2020-01-01 island 1 8
2020-01-02 gap 1 0
2020-01-03 gap 1 0
2020-01-04 gap 1 0
2020-01-05 island 2 1
2020-01-06 island 2 9
2020-01-07 gap 2 0
2020-01-08 island 3 7

If I want to just identify the gaps, I can take my grouped query from before, and then apply DENSE_RANK() to the result:

DECLARE @Start date = '20200101',
@End date = '20200108'; ;WITH src AS
(
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
ON t.EventTime >= f.TheDate
AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
),
gaps AS
(
SELECT TheDate,
gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src
WHERE TransactionCount = 0
)
SELECT TheDate,
[Gap #] = DENSE_RANK() OVER (ORDER BY gap)
FROM gaps
ORDER BY TheDate;

Results:

TheDate      Gap #
---------- -----
2020-01-02 1
2020-01-03 1
2020-01-04 1
2020-01-07 2

Similarly, if I want to identify the islands, I can just flip the where clause in the second CTE:

DECLARE @Start date = '20200101',
@End date = '20200108'; ;WITH src AS
(
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
ON t.EventTime >= f.TheDate
AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
),
islands AS
(
SELECT TheDate,
island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src
WHERE TransactionCount > 0
)
SELECT TheDate,
[Island #] = DENSE_RANK() OVER (ORDER BY island)
FROM islands
ORDER BY TheDate;

Results:

TheDate      Island #
---------- --------
2020-01-01 1
2020-01-05 2
2020-01-06 2
2020-01-08 3

If I want to group the gaps and islands together, showing each set on a row, and even highlight streaks and transaction counts across each streak, I can further group as follows:

DECLARE @Start date = '20200101',
@End date = '20200108'; ;WITH src AS
(
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
ON t.EventTime >= f.TheDate
AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
),
gaps AS
(
SELECT TheDate,
TransactionCount,
gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src
WHERE TransactionCount = 0
),
islands AS
(
SELECT TheDate,
TransactionCount,
island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM src WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
SELECT TheDate,
TransactionCount,
Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
FROM gaps
UNION ALL
SELECT TheDate,
TransactionCount,
Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
FROM islands
)
SELECT WindowStart = MIN(TheDate),
WindowEnd = MAX(TheDate),
TransactionCount = SUM(TransactionCount),
DayStreak = COUNT(*)
FROM rawdata
GROUP BY label
ORDER BY WindowStart;

Results:

WindowStart   WindowEnd    TransactionCount   DayStreak
----------- ---------- ---------------- ---------
2020-01-01 2020-01-01 8 1
2020-01-02 2020-01-04 0 3
2020-01-05 2020-01-06 10 2
2020-01-07 2020-01-07 0 1
2020-01-08 2020-01-08 7 1

You can include the label in the output if it provides a better visual cue than TransactionCount alone.

Of course there are likely many more exotic gaps and islands problems you may need to solve, and there are a lot of great treatments of the subject, including right here on this site.

Conclusion

In the first two parts in this series, I have shown several examples where a calendar table can help solve, or at least simplify, common business problems surrounding dates. In the next part I will demonstrate how to use a calendar table to configure and optimize schedules, forecast conflicts, and deal with other complex scheduling requirements.

Next Steps

For more info on calendar tables and general date handling in SQL Server, see these tips and videos:






get scripts

next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

View all my tips


Article Last Updated: 2021-04-28

Comments For This Article





download














get free sql tips
agree to terms