Trusting STRING_SPLIT() order in Azure SQL Database
I recently wrote about splitting strings with OPENJSON() to support returning an ordinal position, and it seems like every time I write about this topic, something in our ecosystem changes. Well, something has changed in Azure SQL Database that is going to make a lot of people happy.
This is always a hot topic because folks need to break apart delimited strings, and often they need to rely on the output to be in the right order - or at least have some way to reliably determine the original sequence. As a result, some come up with all kinds of (usually costlier) alternatives to using STRING_SPLIT. Others believe STRING_SPLIT is already going to give the strings back in the same order as they appear in the original string; this likely stems from the documentation originally stating:
- The sort order of the output rows matches the order of the substrings in the input string.
In November 2018, that string was replaced with the following:
- The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string.
Several people have argued with me that they can’t imagine why it would work any other way, and that applying ROW_NUMBER() over some constant (like NULL, or @@SPID, or @@TRANCOUNT) should guarantee that the row numbers are applied in input order. I have been reassured by SQL Server database engine experts that this is not a guarantee we should assume and is, in fact, why they changed the documentation in the first place: to warn us against making that assumption.
In a previous tip, A way to improve STRING_SPLIT in SQL Server - and you can help, I asked for your assistance in voting up a suggestion that would solve the problem:
The request is simple: add a column to the output of STRING_SPLIT so we can deterministically order by exactly the input order, or reverse order, or pluck the 3rd element, or match the elements to another list by ordinal position instead of by crossing our fingers.
There have been over 700 votes on that feedback item, but the only indication Microsoft has even seen it is that the status changed to "Under Review."
Then something strange happened. I recently downloaded the latest version of SQL Server Management Studio, and was testing the OPENJSON() function I showed in the previous tip on this topic, and I saw this:
Well, that’s silly, I thought, because of course STRING_SPLIT() takes exactly two arguments. When I hovered over the function name, the help tooltip showed me this third argument (emphasis mine):
enable_ordinal, huh? I thought for sure I was getting punked, and that someone messed with IntelliSense on me.
I tried this code:
SELECT * FROM STRING_SPLIT(N'foo,bar', N',', 1);
Predictably, it yielded:
Msg 8144, Level 16, State 3
Procedure or function STRING_SPLIT has too many arguments specified.
What was happening? Well, the next time I connected to an Azure SQL Database instance, I found out. I tried the query there and, lo and behold, the syntax was accepted by the engine (even if Azure Data Studio’s IntelliSense hasn’t been updated yet):
The estimates are still the same, and it still only supports single-character delimiters, but hey, I’ll take minor improvements anytime.
This is only currently available in Azure SQL Database, and I have not done any performance testing on this (either between STRING_SPLIT with or without the optional enable_ordinal argument, or against other functions that provide the ordinal output in other ways). But if performance is comparable in large-scale tests, this is going to simplify a lot of environments, especially when it makes it to the on-premises editions of SQL Server (and it will).
See these tips and other resources relating to splitting strings in SQL Server:
- Ordered string splitting in SQL Server with OPENJSON
- Handling an unknown number of parameters in SQL Server
- A way to improve STRING_SPLIT in SQL Server - and you can help
- SQL Server Split String Replacement Code with STRING_SPLIT
- SQL Server STRING SPLIT Limitations
- Solve old problems with SQL Server’s new STRING_AGG and STRING_SPLIT functions
- Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function
- Use Table-Valued Parameters Instead of Splitting Strings
- Review the Microsoft documentation which was recently updated with enable_ordinal
About the author
View all my tips
Article Last Updated: 2021-11-11