Overcome Challenges with GENERATE_SERIES in SQL Server

By:   |   Updated: 2022-08-09   |   Comments (2)   |   Related: More > SQL Server 2022


Problem

In an earlier tip, My Favorite T-SQL Enhancements in SQL Server 2022, I talked about several new T-SQL features in the upcoming version of SQL Server 2022. One of the most anticipated additions has been GENERATE_SERIES. We have all resorted to cumbersome workarounds over the years to efficiently generate a sequence of numbers. In the first public CTP, however, the syntax was strict: when you go out of bounds (e.g., forget the START and STOP argument names), the error message was not helpful. In this tip, I describe how they've resolved that issue in CTP 2.1, and what I would have done instead.

Solution

In the first CTP, the syntax for GENERATE_SERIES was:

SELECT value FROM GENERATE_SERIES(START = 1, STOP = 5, STEP = 1);

This produced the numbers 1, 2, 3, 4, 5. Nothing too magical there.

Naming Arguments

The first two arguments are required, but the third argument was (and is still) optional. But, more importantly, the arguments you passed had to be named. If you tried to pass unnamed arguments:

SELECT value FROM GENERATE_SERIES(1, 5, 1);

An error message appeared that could be misleading:

Msg 208, Level 16, State 1
Invalid object name 'GENERATE_SERIES'.

As I mentioned in the previous tip, this might make you think something very different is going wrong, like you spelled the function wrong, or connected to the wrong server. Maybe it's case-sensitive. There is no clue that the arguments had to be named. This is also different from other implementations in recent versions – STRING_SPLIT and STRING_AGG, for example, do not require named parameters. But you may recall that STRING_SPLIT does require a compatibility level update that makes the function name a reserved keyword. The named parameters for GENERATE_SERIES were put there to avoid that conflict and not require an updated compatibility level. The core issue this solves is that the parser could potentially confuse it with your own function called dbo.GENERATE_SERIES.

In any case, I'm not sure about you, but I was annoyed every time I used this operator because I think naming the arguments shouldn't be necessary. I and many others involved in the private previews voiced our concerns about this syntax, and now they have fixed it!

Well, "fixed" is perhaps generous.

They have removed the named parameters, so now the syntax is simply:

SELECT value FROM GENERATE_SERIES(1, 5);     -- 1, 2, 3, 4, 5
SELECT value FROM GENERATE_SERIES(2, 10, 2); -- 2, 4, 6, 8, 10

Named arguments are no longer supported (I thought they may become optional). If you try using them, you get this error message:

Msg 102, Level 15, State 38
Incorrect syntax near '='.

And if you use the correct syntax in a database with compatibility level set lower than 160, you get an error message (just like with STRING_SPLIT in compatibility levels lower than 130):

Msg 208, Level 16, State 1
Invalid object name 'GENERATE_SERIES'.

If you see code samples with the original syntax, hopefully, this explains why they don't work in newer builds.

What I Would Have Done

If I were Microsoft and I were facing this difficult choice of requiring cumbersome syntax or forcing compatibility levels, I would have approached this in a different way. I would have created a system table-valued function (TVF) to act as a wrapper around the new operator. So, leave the named arguments in place, but create a TVF like this:

CREATE FUNCTION sys.generate_series
(
@start int,
@stop int,
@step int
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
SELECT value
FROM GENERATE_SERIES
(START = @start, STOP = @stop, STEP = @step)
);

As a user, if I wanted to avoid the cumbersome syntax, I could just say:

SELECT value FROM sys.generate_series(1, 5, 1);

Or I could learn to live with the named argument syntax. This would give users a choice and could have been implemented without changing existing code investments or pinning syntax to a compatibility level.

Sadly, I provided this feedback, but it was deemed too late to implement. To be fair, it also introduces a challenge with the optional third argument, since TVFs don't support optional parameters.

In my own codebase – since I can't imagine ever needing a step value other than 1, I was prepared to work around this with my own wrapper function anyway:

CREATE FUNCTION dbo.GenerateContiguousRange
(
@start int,
@stop int
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
SELECT value
FROM GENERATE_SERIES
(START = @start, STOP = @stop, STEP = 1)
);

Now I could pass one fewer argument:

SELECT value FROM dbo.GenerateRange(1, 5);

This again returns 1, 2, 3, 4, 5.

And since I will almost always want a sequence that starts at 1, I can create an additional function that hardcodes both the START and STEP values:

CREATE FUNCTION dbo.GenerateContiguousSequence
(
@stop int
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
SELECT value
FROM GENERATE_SERIES
(1, @stop, 1)
);

This would simplify my code even further:

SELECT value FROM dbo.GenerateSequence(5);

Once more, this produces 1, 2, 3, 4, 5.

You could use any of these wrapper TVFs to work around the compatibility level restriction, too. Just put them in a database that is in compatibility level 160 and call those wrapper functions from your databases that aren't. I wrote about this technique in a previous tip, SQL Server Split String Replacement Code with STRING_SPLIT.

What Else I Would Have Done

Since this probably won't be the last time SQL Server introduces a new relational operator with syntax that conflicts with older compatibility levels, an additional thing I would have done in this release is to formally deprecate calling any user-defined functions without the schema. That this is an operator is an important but subtle distinction: not all operators operate like table-value functions, and not all functions are operators, but in this case they are interchangeable.

If I have code that calls GENERATE_SERIES and I expect that to invoke my own function (dbo.generate_series) and not the new built-in one, I have to either explicitly add dbo., or accept what SQL Server might do with the ambiguity (guess and call the "wrong" one, or return an error). Maybe by putting that into compat level restrictions, instead of facing a dilemma with every new T-SQL operator, we could short circuit a lot of these future issues. In this case, a built-in wrapper (sys.generate_series) could simplify things because that would also prevent ambiguity.

Matching Argument Types

Using a wrapper function can also help with another minor limitation in the operator that remains: the types of all three arguments must be compatible. To get values that increment by 0.5, I can say:

SELECT value FROM GENERATE_SERIES(5.0, 7.0, 0.5);

This produces 5.0, 5.5, 6.0, 6.5, 7.0.

However, if I try to use integers for the first two arguments:

SELECT value FROM GENERATE_SERIES(5, 7, 0.5);

I get four error messages in CTP 2.1 (which may be fixed in future builds):

Msg 8116, Level 16, State 3, Line 1
Argument data type numeric is invalid for argument 3 of generate_series function.
Msg 206, Level 16, State 2, Line 1
Operand type clash: int is incompatible with void type
Msg 206, Level 16, State 2, Line 1
Operand type clash: int is incompatible with void type
Msg 206, Level 16, State 2, Line 1
Operand type clash: numeric is incompatible with void type

This is not very logical to me and is once again misleading. The third argument can absolutely be a numeric type, but only if the first two are compatible. The first two arguments drive the return type, so if the operator infers integers, it fails because you can't add 0.5 to an integer. Well, you can, but not correctly:

DECLARE @i int = 5;
SET @i += 0.5;
SELECT @i;

This yields 5, not 5.5, due to SQL Server's strong typing and adherence to integer math.

To work around this, I can create a wrapper function that will accept integers, but still pass numeric values with matching types to the built-in function:

CREATE FUNCTION dbo.GenerateDecimalSeries
(
@start decimal(10,1),
@stop decimal(10,1),
@step decimal(10,1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
SELECT value
FROM GENERATE_SERIES
(@start, @stop, @step)
);

Which I can call this way, relying on implicit conversion of the first two arguments:

SELECT value FROM dbo.GenerateDecimalSeries(5, 7, 0.5);

Now I get the same results (5.0, 5.5, 6.0, 6.5, 7.0) without having to format all the inputs as decimals.

Next Steps

If you have a use for the functionality GENERATE_SERIES provides and want to insulate yourself from some of the syntax challenges, you can use wrapper functions as I've described here.

In the meantime, go grab the CTP, have a play with the new features, and see these tips and other resources:






get scripts

next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

View all my tips


Article Last Updated: 2022-08-09

Comments For This Article




Tuesday, August 23, 2022 - 4:59:35 PM - DWalker Back To Top (90405)
I think this example is not quite valid:

DECLARE @i int = 5;
SET @i += 0.5;
SELECT @i;

Because this works as expected:
DECLARE @i int = 5;
SELECT @i + 0.5;

Your example implicitly casts 5.5 to an integer in the SET statement, so it's not really showing that you can't add a numeric to an integer and get the expected result...

Tuesday, August 23, 2022 - 4:55:51 PM - DWalker Back To Top (90404)
For the 0.5 case, I predict that you could use

Select * from Generate_Series(5.0, 7,0, 0.5)

Or something equivalent.














get free sql tips
agree to terms