A way to improve STRING_SPLIT in SQL Server - and you can help

By:   |   Comments (6)   |   Related: > TSQL


Problem

STRING_SPLIT has two glaring limitations that I've written about here before (SQL Server STRING SPLIT Limitations): there is no indication of input order, and the separator is limited to a single character. I think the SQL Server team can fix this not by changing or overloading the existing function but, rather, by adding a new function that addresses these shortcomings.

Solution

Every time I write anything about splitting strings in SQL Server, I promise myself that it will be the last one I need to write. I'm going to stop trying to fool myself with that promise because I will probably continue into retirement.

The inspiration behind this specific tip came to me as I was trying to work around another issue where order mattered (a list of price changes) and where existing workarounds don't solve it easily (since they rely on pattern matching, but prices can repeat).

Imagine needing to pivot this sequence of values in order:

257.55, 324.12, 326.55, 324.12, 337.45

To get this output, and guaranteed to be in the right order, is a lot more cumbersome and inefficient than it might seem:

257.55
324.12
326.55
324.12 -- repeat
337.45

I'm not going to solve that here, as that is an article of its own. My point here is that I once again found myself wishing that STRING_SPLIT could return a key so we could explicitly order the results when we care. I acknowledge that overloading a function is not trivial,  and the risk to backward compatibility is high, so adding a parameter that has to be optional is not a path the product team is likely to take.

However, there is precedent for introducing new functions instead of changing or overloading existing ones: CONCAT. This function was first introduced in SQL Server 2012 as syntactic sugar for messy string concatenation operations. In SQL Server 2017, they added CONCAT_WS (WS = “with separator”) to address the need to define how to concat strings together. They didn't change the original function by adding a parameter, they just created a whole new function.

With this case in mind, why not add a function STRING_SPLIT_WK (WK = “with key”)? The output would simply add an additional column (“key”) that would indicate the sequence of the value in the original string.

So this call:

SELECT * FROM STRING_SPLIT_WK('257.55, 324.12, 326.55, 324.12, 337.45', ',');

Would yield this output:

key    value
---    ------
1      257.55
2      324.12
3      326.55
4      324.12
5      337.45

It doesn't even have to guarantee the values would be returned in that order, because I can explicitly say ORDER BY [key] (or add other functionality, like return only the first, last, or nth price).

As a bonus, they could also address the delimiter length limitation here, supporting 5, or 50, or 255 characters. Users could benefit from the longer delimiter and ignore the key, just like they can use CONCAT_WS for consistency but pass an empty string for the separator when they just want CONCAT behavior.

Summary

There's nothing you can do differently with this information on current versions of SQL Server. But if you'd like to see this functionality in the future (whether by overload or by a new function), please go vote, and state your business case in a comment, on this UserVoice item:

Next Steps

Read on for related tips and other resources involving string splitting:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, August 13, 2020 - 5:41:13 AM - Domnits Alen Back To Top (86301)
Hi all,
I would add another issue with this function:
Let's say, my values are strings that contain commas inside.
Why can't we add a new function to support quoted values?
SELECT * FROM SPLIT_ARR_Q('"1", "2 - two", "3,4"', ',')

Wednesday, August 12, 2020 - 8:12:03 PM - Aaron Bertrand Back To Top (86296)
Grzegorz, "It is a problem of Azure SQL Database, not mine." If only you were the only SQL Server customer! In the meantime, there are plenty of people who don't have the luxury of just ignoring the cloud.

And Solomon, yes, I agree that in a perfect world everyone would manage and deploy CLR perfectly. Let me know when you find the perfect world, because I have no idea where it is.

Wednesday, August 12, 2020 - 1:35:51 PM - Solomon Rutzky Back To Top (86294)
Hello Aaron and Grzegorz:

1. Perhaps the "WK" actually stands for "Works"? ;-) Or, perhaps STRING_SPLIT_WTF ? But seriously, creating a separate function to get around backwards compatibility is a good idea.

2. Regarding the SQLCLR idea:
a) True, Azure SQL DB not supporting it at all is the real major issue. Of course, for those who have no plans to move to Azure, it's also an irrelevant issue. Sadly, I suspect that it was disabled on Azure due to more of a misunderstanding than it being a proven security issue.

b) It doesn't need to add much complexity to CI / automated environments. There are tools to reduce complexity, such as my open source project "BinaryFormatter' on GitHub (under SqlQuantumLeap). Also, this is one of the benefits of using an existing library such as my SQL# ( https://SQLsharp.com/ ) project. It's a single, self-contained T-SQL install script. Super easy to deploy. It even has a few string splitters to handle various scenarios, such as [String_SplitInts] and [String_SplitIntoIntegers] that convert the split value into BIGINT in the .NET code so it can pass back a table of BIGINT values, which is faster than passing back a table of strings to then convert to INT, etc in T-SQL. It even allows for optionally passing back a single row of NULL if the input is NULL (which I think is silly, but some people are really into that sort of thing). There's no way that an MS-provided additional built-in function would support such customizations, especially not being able to tailor how to handle scenarios such as an empty element (1,,3 -- ignore/skip it, error, return 0) or a non-INT value (1,e,3 -- ignore/skip it, or error), both of which [String_SplitIntoIntegers] (paid version only) does allow you to customize.

c) Only being allowed to deploy an assembly via a VARBINARY literal (i.e. hex bytes) is not only _not_ a deficiency of Azure SQL DB Managed Instance, it is actually the only way that assemblies should ever be deployed. Assemblies should _never_ be deployed via DLL as that's an unnecessary external dependency that greatly complicates automation / CI, and for no actual benefit. SSDT stopped using DLLs many years ago, and anyone not using SSDT can easily convert the compiled DLL into hex bytes using my "BinaryFormatter" utility available on GitHub.

Take care,
Solomon...

Wednesday, August 12, 2020 - 11:53:01 AM - Grzegorz Łyp Back To Top (86292)
This solution is as simple as it sounds.

(1) It is a problem of Azure SQL Database, not mine. In comparison to full SQL Server, Azure SQL Database is just useless toy.
(2) How huge? It is simple to manage with hundreds of customers with different versions of SQL and DBs. We do it without any problems.
(3) If you manage CLR with huge environments, using binaries with Managed Instance is again trivial problem.

CLR was introduced to allow developers to create extended functionality, and especially splitting strings is so trivial that it should be introduced a long time ago, but somehow it is difficult to see the developers needs by MS.

Tuesday, August 11, 2020 - 8:57:46 AM - Aaron Bertrand Back To Top (86276)
But Grzegorz, the solution isn't that simple. Not everyone has the luxury of deploying CLR wherever they want. Also:

(1) CLR is a dead end for anyone who will ever move any of their infrastructure to Azure SQL Database.

(2) It introduces complexity to deployment pipelines (think huge environments where changes to modules - including CLR - are automated, often using replication).

(3) It is a problem even for Managed Instance - yes, it's supported, but you have to deploy from binary, which is not impossible, but another deployment pipeline complication.

I don't see why we can't improve the native offering - this doesn't prevent you from continuing to paint yourself into a corner with CLR. :-)

Tuesday, August 11, 2020 - 6:11:00 AM - Grzegorz Łyp Back To Top (86274)
Solution is simple. Write CLR. We have such since SQL 2005 and it works at the same speed as STRING_SPLIT.














get free sql tips
agree to terms