My Favorite T-SQL Enhancements in SQL Server 2022

By:   |   Updated: 2022-05-25   |   Comments (8)   |   Related: More > SQL Server 2022


Problem

Every SQL Server release has new capabilities that are exciting to some group of customers – sometimes a change is introduced to please sysadmins, sometimes it's for finance, sometimes it's for customers of other platforms, and sometimes it's for developers. Now that the first public preview of SQL Server 2022 is available, I'll talk about some of the other new features in another post, but today I wanted to share a few of my favorite new features that will excite anyone who writes Transact-SQL.

Solution

I like to talk early about T-SQL enhancements because, unlike major features, they are largely baked in by the time the first public beta versions hit the shelves. They also aren't prone to further changes or renames by other Microsoft business units (like marketing).

A few of the most useful changes I've been able to play with in SQL Server 2022 so far:

  • GREATEST / LEAST
  • STRING_SPLIT
  • DATE_BUCKET
  • GENERATE_SERIES

In this tip, I'll explain each one, and show some practical use cases.

GREATEST / LEAST

Having covered them before, these functions are basically MAX and MIN, but across columns instead of across rows. A quick demonstration:

SELECT GREATEST(1, 5), -- returns 5
GREATEST(6, 2), -- returns 6
LEAST (1, 5), -- returns 1
LEAST (6, 2); -- returns 2

In this simple example, the logic is a lot like a CASE expression. Taking just the first one:

SELECT CASE WHEN 1 > 5 THEN 1 ELSE 5 END;

When evaluating two expressions, it really is that simple: is the first one bigger than the second, or not? (However, keep NULL handling in mind; like MIN and MAX, both new functions also ignore NULLs.)

When you introduce a third value, though, it becomes much more complex. While the new syntax will offer:

SELECT GREATEST(1, 5, 3); -- returns 5

How we would write this as a CASE expression in current and older versions is tedious:

SELECT CASE 
WHEN 1 > 5 THEN
CASE WHEN 1 > 3 THEN 1 ELSE 3 END
ELSE
CASE WHEN 5 > 3 THEN 5 ELSE 3 END
END;

And it gets much worse from there, as you can imagine. I'm not even going to try typing out what that spiderweb of CASE expressions would look like when comparing 4 or more values.

But let's look at a real problem, because it's hard to feign complexity when we're talking about things we can easily do in our head. Picture a table like this, that holds a row for each year, and monthly sales figure columns (we'll just have three months to keep it simple):

CREATE TABLE dbo.SummarizedSales
(
Year int,
Jan int,
Feb int,
Mar int --,...
); INSERT dbo.SummarizedSales(Year, Jan, Feb, Mar)
VALUES
(2021, 55000, 81000, 74000),
(2022, 60000, 92000, 86000);

If we want to return the lowest and highest sales figure for each year, we could write nasty CASE expressions (again, just imagine that if we had all the months):

SELECT Year, 
  BestMonth = CASE 
WHEN Jan > Feb THEN
CASE WHEN Jan > Mar THEN Jan ELSE Mar END
ELSE
CASE WHEN Mar > Feb THEN Mar ELSE Feb END
END, WorstMonth = CASE
WHEN Jan < Feb THEN
CASE WHEN Jan < Mar THEN Jan ELSE Mar END
ELSE
CASE WHEN Mar < Feb THEN Mar ELSE Feb END
END FROM dbo.SummarizedSales;

Output:

Year   BestMonth   WorstMonth
---- --------- ----------
2021 81000 55000
2022 92000 60000

There are a couple of other ways to solve this problem, that at least scale better without complicating the code exponentially. Here's one way using UNPIVOT:

SELECT Year, 
BestMonth = MAX(Months.MonthlyTotal),
WorstMonth = MIN(Months.MonthlyTotal)
FROM dbo.SummarizedSales AS s
UNPIVOT
(
MonthlyTotal FOR [Month] IN ([Jan],[Feb],[Mar])
) AS Months
GROUP BY Year;

And here's one using CROSS APPLY:

SELECT Year,
BestMonth = MAX(MonthlyTotal),
WorstMonth = MIN(MonthlyTotal)
FROM
(
SELECT s.Year, Months.MonthlyTotal
FROM dbo.SummarizedSales AS s
CROSS APPLY (VALUES([Jan]),([Feb]),([Mar])) AS [Months](MonthlyTotal)
) AS Sales
GROUP BY Year;

Those are easier to expand to cover more columns, but they're still tedious, and I don't like that both use transpose and grouping operations. Now we can perform this kind of task with ease:

SELECT Year,
BestMonth = GREATEST([Jan],[Feb],[Mar]),
WorstMonth = LEAST ([Jan],[Feb],[Mar])
FROM dbo.SummarizedSales;

STRING_SPLIT

I wrote about the enable_ordinal enhancement to this function in this tip, but I wanted to mention it again because, at the time, I could not confirm the change would make it into SQL Server 2022. Now I can, and I wanted to mention a few use cases where having the ordinal is beneficial:

Determining the nth item in a comma-separated list

I've seen many requests to return the 2nd or 3rd item in a list, which was cumbersome to do with STRING_SPLIT before because the output order was not guaranteed. Instead, you'd see more verbose solutions with OPENJSON or tricks with PARSENAME. With this new parameter, I can simply say:

DECLARE @list nvarchar(max) = N'35, Bugatti, 89, Astley';
SELECT value FROM STRING_SPLIT(@list, N',', 1) WHERE ordinal = 2;
-- output is now guaranteed to be Bugatti

Joining to ordered data based on position in list

Let's say you want to assign new listings to salespeople based on past performance. You have this table:

CREATE TABLE dbo.SalesLeaderBoard
(
SalesPersonID int,
SalesSoFar int
); INSERT dbo.SalesLeaderBoard(SalesPersonID, SalesSoFar)
VALUES(1,2),(2,7),(3,8),(4,5),(5,1),(6,12);

And now you have a set of new listings that have come in, ranked by preference:

DECLARE @NewListings varchar(max) = '81,76, 80';

In this case, we'd want to assign the most preferential listing (81) to salesperson 6, the second listing (76) to salesperson 3, and the third (80) to salesperson 2. Having a meaningful and reliable ordinal output makes this easy:

SELECT Leaders.SalesPersonID, Listing = Listings.value
FROM STRING_SPLIT(@NewListings, ',', 1) AS Listings
INNER JOIN
(
SELECT TOP (3) SalesPersonID,
Ranking = ROW_NUMBER() OVER
(ORDER BY SalesSoFar DESC, SalesPersonID)
-- tie-breaker: seniority ---^^^^^^^^^^^^^
FROM dbo.SalesLeaderBoard
ORDER BY SalesSoFar DESC
) AS Leaders
ON Listings.ordinal = Leaders.Ranking;

Output:

SalesPersonID  Listing
------------- -------
6 81
3 76
2 80

Reconstructing strings and preserving order

Another scenario I've dealt with in a tedious way is reconstructing a string to remove duplicates. Let's say we have a string like this:

Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta

We want to remove duplicates from the list, but also maintain the original order, making the desired output:

Bravo/Alpha/Tango/Delta

With this new functionality we can accurately rebuild the string in a very direct way by taking the first instance of any string in the list, and then using its overall ordinal position to define the ordering used by STRING_AGG:

DECLARE @List nvarchar(max), @Delim nchar(1) = N'/';
SET @List = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta';
SELECT STRING_AGG(value, N'/') WITHIN GROUP (ORDER BY ordinal)
FROM
(
SELECT value, ordinal = MIN(ordinal)
FROM STRING_SPLIT(@List, @Delim, 1)
GROUP BY value
) AS src;

Output:

Bravo/Alpha/Tango/Delta

That is a much simpler approach than any of the awkward solutions I've used in the past.

That all said…

STRING_SPLIT is unfortunately still limited by a single-character delimiter, which I have addressed in this previous tip. But the new enable_ordinal argument does simplify some of the more frequent use cases that have traditionally required tiresome workarounds. It also adds a performance benefit compared to current methods, because the optimizer recognizes that the data is returned sorted. Meaning it won't always need to explicitly add a sort operator in the plan if the data needs to be ordered by ordinal. While the complex example above does require sorting, the following example does not:

DECLARE @List varchar(max) = N'32,27,6,54';
SELECT value FROM STRING_SPLIT(@List, ',', 1) 
ORDER BY ordinal;

Here is the plan:

Plan for STRING_SPLIT with ORDER BY ordinal

DATE_BUCKET

This function collapses a date/time to a fixed interval, eliminating the need to round datetime values, extract date parts, perform wild conversions to and from other types like float, or make elaborate and unintuitive dateadd/datediff calculations (sometimes using magic dates from the past).

The arguments are:

DATE_BUCKET(<datepart>, <bucket_width>, <input date/time> [, <origin>])

The output is a date/time type (based on the input), but at an interval governed by the datepart and bucket_width. For example, if I wanted to simplify the output of a particular column so it just gave me the month boundaries, I might have done this in the past:

SELECT name, modify_date,
MonthModified = DATEADD(MONTH, DATEDIFF(MONTH, '19000101', modify_date), '19000101')
FROM sys.all_objects;

Or this, on SQL Server 2012 or better:

SELECT name, modify_date,
MonthModified = DATEFROMPARTS(YEAR(modify_date), MONTH(modify_date), 1)
FROM sys.all_objects;

Now, in SQL Server 2022, I can do this:

SELECT name, modify_date,
MonthModified = DATE_BUCKET(MONTH, 1, modify_date)
FROM sys.all_objects;

All three of the above queries give me identical results:

name                          modify_date               MonthModified
--------------------------- ----------------------- -----------------------
sp_MSalreadyhavegeneration 2022-04-05 17:46:02.420 2022-04-01 00:00:00.000
sp_MSwritemergeperfcounter 2022-04-05 17:46:15.410 2022-04-01 00:00:00.000
sp_drop_trusted_assembly 2022-04-05 17:45:32.097 2022-04-01 00:00:00.000
sp_replsetsyncstatus 2022-04-05 17:45:41.850 2022-04-01 00:00:00.000
sp_replshowcmds 2022-04-05 17:45:48.197 2022-04-01 00:00:00.000

Additionally, this might have better performance in some cases; since the function is order-preserving, there are cases where a sort can be avoided. Getting out of the system objects business, let's create a simpler table, and compare the plans generated by grouping:

DECLARE @t table(TheDate date PRIMARY KEY);
INSERT @t(TheDate) VALUES('20220701'),('20220702'),('20220703');
SELECT TheMonth = DATEFROMPARTS(YEAR(TheDate), MONTH(TheDate), 1),
TheCount = COUNT(*)
FROM @t GROUP BY DATEFROMPARTS(YEAR(TheDate), MONTH(TheDate), 1); SELECT TheMonth = DATE_BUCKET(MONTH, 1, TheDate),
TheCount = COUNT(*)
FROM @t GROUP BY DATE_BUCKET(MONTH, 1, TheDate);

Here are the plans:

Comparing grouping plans for DATEFROMPARTS and DATE_BUCKET

More importantly, the function allows me to do much more elaborate things, like segmenting data into 5-minute intervals:

DECLARE @Orders table(OrderID int, OrderDate datetime);
INSERT @Orders(OrderID, OrderDate) VALUES   (1,'20220501 00:03'),
(1,'20220501 00:04'), (1,'20220501 00:05'), (1,'20220501 00:06'),
(1,'20220501 00:07'), (1,'20220501 00:10'), (1,'20220501 00:11'); SELECT Interval = DATE_BUCKET(MINUTE, 5, OrderDate),
OrderCount = COUNT(*)
FROM @Orders
GROUP BY DATE_BUCKET(MINUTE, 5, OrderDate);

Output:

Interval                     OrderCount
----------------------- ----------
2022-05-01 00:00:00.000 2
2022-05-01 00:05:00.000 3
2022-05-01 00:10:00.000 2

Want 10-minute intervals? No problem. We can even pass a parameter or variable so we can adjust on the fly:

DECLARE @MinuteWindow tinyint = 10;
SELECT Interval = DATE_BUCKET(MINUTE, @MinuteWindow, OrderDate), 
OrderCount = COUNT(*)
FROM @Orders
GROUP BY DATE_BUCKET(MINUTE, @MinuteWindow, OrderDate);

Output:

Interval                     OrderCount
----------------------- ----------
2022-05-01 00:00:00.000 5
2022-05-01 00:10:00.000 2

Another thing I can do is vastly simplify week boundary calculations. Here is a completely unintuitive and cryptic way to get the previous Saturday (regardless of a user's SET DATEFIRST or SET LANGUAGE settings):

DECLARE @d date = GETDATE(), @PrevSat date;
SET @PrevSat = DATEADD(DAY, -(DATEPART(WEEKDAY, @d) + @@DATEFIRST) % 7, @d);
SELECT @PrevSat;

If we know any Saturday in the past, like January 1st, 2000, we can simplify this as follows, by passing that date into the origin parameter:

DECLARE @KnownSat date = '20000101';
SET @PrevSat = DATE_BUCKET(WEEK, 1, @d, @KnownSat);
SELECT @PrevSat;

This gives the same answer (at the time of writing – Tuesday, May 24th, 2022 – this returned Saturday, May 21st, 2022). And, like above, if we have a bunch of data, we can use this same technique to filter or group based on any known weekday.

DECLARE @LawnServices table(CustomerID int, ServiceDate date);
INSERT @LawnServices(CustomerID, ServiceDate) VALUES (1, '20220501'),
(1, '20220508'), (1, '20220516'), (1, '20220526'), (1, '20220603'),
(2, '20220501'), (2, '20220517'), (2, '20220527'), (1, '20220602'); DECLARE @KnownSat date = '20000101'; SELECT [Week] = DATE_BUCKET(WEEK, 1, ServiceDate, @KnownSat), Services = COUNT(*)
FROM @LawnServices
GROUP BY DATE_BUCKET(WEEK, 1, ServiceDate, @KnownSat);

Output:

Week          Services
---------- ---------
2022-04-30 2
2022-05-07 1
2022-05-14 2
2022-05-21 2
2022-05-28 2

This is a much simpler way to segment data based on a non-standard work week. As another example, our team's on-call schedule at Stack Overflow cycles on Wednesdays, and I've already used this function to map out our future schedule.

GENERATE_SERIES

This function produces a set-based sequence of numeric values. It supplants cumbersome numbers tables, recursive CTEs, and other on-the-fly sequence generation techniques we've all used at one point or another.

The arguments are:

GENERATE_SERIES(START = <start>, STOP = <stop> [, STEP = <step>])

A couple of simple examples:

SELECT value FROM GENERATE_SERIES(START = 1, STOP = 5);
SELECT value FROM GENERATE_SERIES(START = 1, STOP = 32, STEP = 7);

Output:

value
-----
1
2
3
4
5 value
-----
1
8
15
22
29

(Note that it won't include the STOP value, or anything near it, if the next STEP pushes past it.)

In previous versions, to generate a sequence of numbers like this, you would probably use a numbers table, or a recursive CTE like this:

WITH cte(n) AS 
(
SELECT 1 UNION ALL
SELECT n + 1 FROM n WHERE n < 5
)
SELECT value = n /* or ((n-1)*7)+1 */ FROM cte;

GENERATE_SERIES has a clear advantage here in terms of simplicity.

One downside is that this is an operator, not a table-valued function, so there is an unintuitive outcome if you don't name the parameters explicitly:

SELECT * FROM GENERATE_SERIES(1, 5);

The error message you'll see is a bit misleading:

Msg 208, Level 16, State 1, Line 1
Invalid object name 'GENERATE_SERIES'.

This can imply a variety of issues, including that the function isn't there, that they need a dbo. prefix, that they connected to the wrong server, that the upgrade failed, or that it's actually a scalar function.

Another downside is that – at least in current builds – the function is not order-preserving. This means that if you try to sort by value, there is a sort in the plan, whereas that has been guarded against in many cases with the other functions mentioned above. For this example:

SELECT value FROM GENERATE_SERIES(START = 1, STOP = 5) ORDER BY value;

Here is the plan:

Expensive sort in GENERATE_SERIES when ordering by value

Still, this is a versatile function that will simplify code and – even with the sort – performs no worse than existing methods that typically require sorts as well. Microsoft is aware of the issue so, when they fix that, even better!

BONUS

We can combine DATE_BUCKET and GENERATE_SERIES to build a contiguous series of date/time values. I often see people struggle to build a full data set when they are reporting on intervals where not all intervals are populated. For example, I want hourly sales figures across a day but, if we're selling something like cars, not every hour will always contain a sale. Let's say we have this data:

CREATE TABLE dbo.Sales
(
OrderDateTime datetime,
Total decimal(12,2)
); INSERT dbo.Sales(OrderDateTime, Total) VALUES
('20220501 09:35', 21000), ('20220501 09:47', 30000),
('20220501 11:35', 23000), ('20220501 12:55', 32500),
('20220501 12:57', 16000), ('20220501 13:42', 17900),
('20220501 15:05', 20950), ('20220501 15:45', 24700),
('20220501 15:49', 18750), ('20220501 15:51', 21800);

If I want to find the hourly sales for business hours on May 1st, I might write this query:

DECLARE @Start datetime = '20220501 09:00',
@End datetime = '20220501 17:00'; SELECT OrderHour, HourlySales = SUM(Total)
FROM
(
SELECT Total,
OrderHour = DATEADD(HOUR, DATEDIFF(HOUR, @Start, OrderDateTime), @Start)
FROM dbo.Sales
WHERE OrderDateTime >= @Start
AND OrderDateTime < @End
) AS sq
GROUP BY OrderHour;

What I get:

OrderHour           HourlySales
---------------- -----------
2022-05-01 09:00 51000.00
2022-05-01 11:00 23000.00
2022-05-01 12:00 48500.00
2022-05-01 13:00 17900.00
2022-05-01 15:00 86200.00

What I actually want is a row for each hour, even if there were no sales:

OrderHour           HourlySales
---------------- -----------
2022-05-01 09:00 51000.00
2022-05-01 10:00 0.00
2022-05-01 11:00 23000.00
2022-05-01 12:00 48500.00
2022-05-01 13:00 17900.00
2022-05-01 14:00 0.00
2022-05-01 15:00 86200.00
2022-05-01 16:00 0.00

The typical way we'd start is with a simple recursive CTE that builds out all the possible rows in the range, and then performs a left join against the populated data.

DECLARE @Start datetime = '20220501 09:00',
@End datetime = '20220501 17:00'; ;WITH Hours(OrderHour) AS
(
SELECT @Start
UNION ALL
SELECT DATEADD(HOUR, 1, OrderHour)
FROM Hours WHERE OrderHour < @End
),
SalesData AS
(
SELECT OrderHour, HourlySales = SUM(Total)
FROM
(
SELECT Total,
OrderHour = DATEADD(HOUR, DATEDIFF(HOUR, @Start, OrderDateTime), @Start)
FROM dbo.Sales
WHERE OrderDateTime >= @Start
AND OrderDateTime < @End
) AS sq
GROUP BY OrderHour
)
SELECT OrderHour = h.OrderHour,
HourlySales = COALESCE(sd.HourlySales, 0)
FROM Hours AS h
LEFT OUTER JOIN SalesData AS sd
ON h.OrderHour = sd.OrderHour
WHERE h.OrderHour < @End;

The thing I like least about this solution is the awkward dateadd/datediff expression to normalize date/time data to the top of the hour. Functions like SMALLDATETIMEFROMPARTS are clearer in their intent, but even more hassle to construct. Instead, I wanted to use DATE_BUCKET and GENERATE_SERIES to turn this whole query pattern on its head:

DECLARE @Start datetime = '20220501 09:00',
@End datetime = '20220501 17:00'; ;WITH Hours(OrderHour) AS
(
SELECT DATE_BUCKET(HOUR, 1, DATEADD(HOUR, gs.value, @Start))
FROM GENERATE_SERIES
(
START = 0,
STOP = DATEDIFF(HOUR, @Start, @End) – 1
) AS gs
)
SELECT h.OrderHour, HourlySales = COALESCE(SUM(Total),0)
FROM Hours AS h
LEFT OUTER JOIN dbo.Sales AS s
ON h.OrderHour = DATE_BUCKET(HOUR, 1, s.OrderDateTime)
/* -- alternatively:
ON s.OrderDateTime >= h.OrderHour
AND s.OrderDateTime < DATEADD(HOUR, 1, h.OrderHour) */
GROUP BY h.OrderHour;

I see great potential in both functions to help simplify logic and reduce dependencies on helper objects.

Honorable Mentions

There are a few other T-SQL enhancements coming in SQL Server 2022, but I'm going to leave them for Itzik Ben-Gan to tell you about in this article:

  • WINDOW clause
  • NULL treatment clause (IGNORE NULLS | RESPECT NULLS)
Next Steps

Note that many of the features that eventually make it into a major release of SQL Server first appear in Azure SQL Database and/or Azure SQL Edge. In fact, most of the functions above were available in those flavors months ago. You don't necessarily have to wait for a public preview to kick the tires on new syntax. But for SQL Server 2022 specifically, the first builds available for CTP 2.0 are only for Windows.

Now that it's here, though, go grab the CTP, see the "What's New" documentation, and grab Bob Ward's demos.

In the meantime, see these tips and other resources:




Get Started Now - Click here to get your free 14 day trial of SolarWinds Database Insights




get scripts

next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

View all my tips


Article Last Updated: 2022-05-25

Comments For This Article




Thursday, May 26, 2022 - 2:44:05 PM - Martin Smith Back To Top (90117)
@Aaron. That's great news. With all the fixes in place hopefully this will just be the unambiguously best choice for this type of requirement going forward. The actual generation of numbers seems fast enough.

Thursday, May 26, 2022 - 1:59:22 PM - Aaron Bertrand Back To Top (90116)
@Martin they're aware of the spool issue and it should be part of the same set of fixes.

Thursday, May 26, 2022 - 8:13:32 AM - Aaron Bertrand Back To Top (90115)
Hey @Martin, I'm not sure, that didn't come up during the private CTPs.

I'd say post a repro at https://aka.ms/sqlfeedback but that doesn't seem to have worked too well in the past. :-)

https://feedback.azure.com/d365community/idea/cf240013-7c25-ec11-b6e6-000d3a4f0da0

I'll check if it's on anyone's radar.

Wednesday, May 25, 2022 - 6:45:49 PM - Jeff Moden Back To Top (90114)
The stuff in your article is a whole lot better than what's in the official documentation. For example, they don't even mention the ability to generate date sequences using the GENERATE_SERIES() operator never mind how to replace the tricks we've had to do over the last couple of decades. Very nicely done, Aaron.

Wednesday, May 25, 2022 - 4:59:36 PM - Martin Smith Back To Top (90113)
RE: "Microsoft is aware of the issue" with the unnecessary sort in GENERATE_SERIES do you happen to know if they are also aware of unnecessary spools when using to insert to a heap? (presumably for Halloween Protection)?

Wednesday, May 25, 2022 - 11:22:05 AM - Aaron M Bertrand Back To Top (90112)
Thanks mmcdonald, it's true, the output is slightly different, but if you're returning a day boundary in all three cases (midnight / no time component), it's easy enough to explicitly convert 1/3 to date or (2) to datetime, if you _need_ the output to be one or the other.

Wednesday, May 25, 2022 - 10:22:54 AM - mmcdonald Back To Top (90111)
Good Stuff!

DATE_BUCKET:

The first and third (DATE_BUCKET) examples return a datetime whereas the second (DATEFROMPARTS) only returns a date so the example provided works but DATEFROMPARTS is limited compared to the first and the third options. I don't have the pre-release of SQL2022 installed yet, but DATE_BUCKET looks, to be me at least, to be cleaner and easier to work with.

Cheers

Wednesday, May 25, 2022 - 7:06:16 AM - Junior Galvao MVP Back To Top (90109)
Hi,

Great news.

Thanks.














get free sql tips
agree to terms