By: Aaron Bertrand | Comments | Related: 1 | 2 | 3 | 4 | 5 | 6 | > Functions System
Problem
SQL Server 2016 introduced a new built-in function, STRING_SPLIT
, far more efficient
than the functions we've been implementing ourselves all these years. As they
upgrade to newer versions of SQL Server, one project a lot of shops will be taking
on is replacing all those inefficient functions with direct (or indirect) calls
to STRING_SPLIT
. Unfortunately, there are some known limitations that may prevent
this from being an easy, one-for-one swap. I wanted to share some ways you can work
around these issues in order to have a successful deployment.
Solution
Moving to the new, more efficient function certainly sounds straightforward:
replace all calls to dbo.YourTableValuedSplitFunction
with calls to
STRING_SPLIT
.
But there can be some significant barriers preventing you from changing this directly,
including:
Query Access
- You may not be able to change the queries directly due to source control policies, or because they are in external application code which can't be deployed right away (or locked in by a vendor).
- In these cases, your only option is to leave the existing function in place,
but change its body directly.
Compatibility Level
- This is one that catches people by surprise: while compatibility level is irrelevant to most new syntax in SQL Server, this specific function has a minimum requirement of 130. Your database may be in a lower compatibility level for a variety of reasons.
- In this
case, the only way to use
STRING_SPLIT
is to have it run in a database with compat level 130.
Different Behavior
- There are several ways the native function may differ from the function(s) you created:
- The new function has a single output column,
value
. Your existing queries may use a different column name for the individual items coming out, and it may rely on other output columns too, like an index or ordinal position. - The
STRING_SPLIT
function does not eliminate duplicates, return a data type other than strings, or guarantee that the values will be returned in the same order they appear in the list. - Your function may handle leading or trailing delimiters, NULL values, or white space differently, and it may accept multi-character delimiters, which the native function does not.
- In these cases, you will need to create a wrapper around
STRING_SPLIT
to get these behaviors.
Solving the Query Access Problem
As long as you can change the function itself, it isn't such a big deal if you are prevented from changing individual queries that call it. Let's say you have this multi-statement table-valued function:
CREATE FUNCTION dbo.SimpleSplitFunction
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS @t table (Item nvarchar(max))
AS
BEGIN
SET @List += @Delimiter; ;WITH a(f,t) AS
(
SELECT CAST(1 AS bigint), CHARINDEX(@Delimiter, @List)
UNION ALL
SELECT t + 1, CHARINDEX(@Delimiter, @List, t + 1)
FROM a WHERE CHARINDEX(@Delimiter, @List, t + 1) > 0
)
INSERT @t SELECT SUBSTRING(@List, f, t - f) FROM a OPTION (MAXRECURSION 0); RETURN;
END
GO
And there is a query like this, somewhere, that you can't change:
SELECT Item FROM dbo.SimpleSplitFunction(N'foo,bar', N',');
You can implement STRING_SPLIT
by altering the function as follows:
ALTER FUNCTION dbo.SimpleSplitFunction
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS @t table (Item nvarchar(max))
AS
BEGIN
INSERT @t(Item) SELECT value FROM STRING_SPLIT(@List, @Delimiter);
RETURN;
END
GO
Even better, though, would be changing it to an inline table-valued function (which requires drop/create):
DROP FUNCTION dbo.SimpleSplitFunction;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS TABLE
AS
RETURN (SELECT Item = value FROM STRING_SPLIT(@List, @Delimiter));
GO
In either case, code that currently calls this function doesn't have to change at all.
Solving the Compatibility Level Problem
We have several databases in older compatibility levels, so changing queries
running in the context of those databases to use STRING_SPLIT
is not possible. If
you try to create or alter a function in a lower compatibility level database, you'll
immediately encounter a compile time error:
CREATE DATABASE CompatLevel_120;
GO ALTER DATABASE CompatLevel_120 SET COMPATIBILITY_LEVEL = 120;
GO USE CompatLevel_120;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS TABLE
AS
RETURN (SELECT Item=value FROM STRING_SPLIT(@List, @Delimiter));
GO
Error message:
Msg 208, Level 16, State 1, Procedure SimpleSplitFunction, Line 8
Invalid object name 'STRING_SPLIT'.
Assuming there is a database on the instance that is in compat level 130, or
that you can create such a database, there are a couple of ways you can still use
STRING_SPLIT
in that database. First, create the above function (or your variation)
in, say, the Utility
database. Then, either change the function in the original
database as follows, so that it acts as a relay:
USE CompatLevel_120;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS TABLE
AS
RETURN (SELECT Item FROM Utility.dbo.SimpleSplitFunction(@List, @Delimiter));
GO
Or drop the original function and add a synonym, which is a slightly simpler way to accomplish the same thing:
USE CompatLevel_120;
GO CREATE SYNONYM dbo.SimpleSplitFunction FOR Utility.dbo.SimpleSplitFunction;
In either case, the calling code doesn't have to change at all, until you
are able to increase the compatibility level and replace these function references
in the queries with simpler STRING_SPLIT
calls.
Solving Behavior Problems
I listed several behavior changes above that you may have coded into your own
functions, but aren't possible with the native STRING_SPLIT. A good compromise
is changing your functions to use STRING_SPLIT
while still supporting those additional
features, some of which are easier to implement than others.
Output a different column name
This one is kind of a softball, and already demonstrated above. If value
isn't
the right name for the output column, your function can just say:
RETURN (SELECT YourColumnName = value FROM STRING_SPLIT(@List, @Delimiter));
Eliminate duplicates
Eliminating duplicates is another common feature of splitting functions, and
this can be accomplished with STRING_SPLIT
simply by using GROUP BY:
RETURN (SELECT YourColumnName = value FROM STRING_SPLIT(@List, @Delimiter) GROUP BY value);
I look at this in more detail in an earlier tip (written before
STRING_SPLIT
), "Removing
Duplicates from Strings in SQL Server."
Output an ordering column
A pretty common enhancement is to indicate the location of an individual string
value from the original list (they may be listed by priority). If you don't
expect duplicates (or only care about the location of the first duplicate),
you can simply use CHARINDEX
:
RETURN
(
SELECT value, ListPos = ROW_NUMBER() OVER (ORDER BY pointer)
FROM
(
SELECT value,
pointer = CHARINDEX(@Delimiter + value + @Delimiter,
@Delimiter + @List + @Delimiter)
FROM STRING_SPLIT(@List, @Delimiter)
) AS s
);
The outer query can then ORDER BY ListPos
or, if you're trying to find
the nth element, WHERE ListPos = n
. You could also add
pointer
to the
output to determine how far into the string a given entity appears.
If you might have duplicates (and you aren't going to eliminate them first), I investigate some other approaches in a previous tip, "Solve old problems with SQL Server's new STRING_AGG and STRING_SPLIT functions."
Output a different data type
If you need to output a specific type, you can simply apply CONVERT(value)
or
TRY_CONVERT(value)
before returning, just make sure the calling code is prepared
to handle any errors (or deal with NULL
).
RETURN (SELECT value = TRY_CONVERT(int, value) FROM STRING_SPLIT(@List, @Delimiter));
Ignore leading/trailing delimiters
Some strings will inevitably come in with empty elements at the beginning or end. The following returns 5 rows, with the first and last being empty strings:
SELECT value FROM STRING_SPLIT(N',foo,bar,blat,',N',');
If you want to make sure only values that aren't empty strings are returned:
RETURN (SELECT value FROM STRING_SPLIT(@List, @Delimiter) WHERE RTRIM (value) > N'');
However, if you're okay with empty strings anywhere except the beginning and end of the list, it gets a little more complicated. In SQL Server 2017 and above, you can use TRIM to first remove any leading/trailing delimiters. This returns 3 rows:
SELECT * FROM STRING_SPLIT(TRIM(N',' FROM N',foo,bar,blat,'),N',');
In your function, this is simply:
RETURN (SELECT value FROM STRING_SPLIT(TRIM(@Delimiter FROM @List), @Delimiter));
In older versions, it's much messier (and I'll confess, I didn't spend a lot of time making this more concise – there are probably shorter paths):
RETURN
(
WITH l(l,d)AS (SELECT LTRIM(RTRIM(@List)), @Delimiter),
x(l,d) AS (SELECT STUFF(l,1, CASE WHEN LEFT(l,1)= d THEN 1 ELSE 0 END, N''), d FROM l),
y(l,d) AS (SELECT LEFT(l,LEN(l) - CASE WHEN RIGHT(l,1)= d THEN 1 ELSE 0 END), d FROM x)
SELECT s.value FROM y CROSS APPLY STRING_SPLIT(y.l, y.d) AS s
);
Ignore white space
All kinds of white space characters can appear before or, more commonly, after a string. STRING_SPLIT will treat a space or tab as a valid element, so the following returns 4 rows, even though these are arguably invalid elements:
SELECT * FROM STRING_SPLIT(CHAR(9) + ',' + CHAR(13) + ',' + CHAR(10) + ',' + CHAR(32),',');
If you want those to be eliminated from the output, in 2017 and above, you can
again use TRIM
to remove carriage returns, line feeds, spaces, and tabs from the
beginning and end of either the list or any element.
RETURN
(
SELECT value FROM
(
SELECT TRIM(CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32) FROM value)
FROM STRING_SPLIT
(
TRIM(CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32) FROM @List),
@Delimiter
)
)AS y (value) WHERE value > N''
);
On older versions, again, this will involve a messy solution, much messier than a single leading or trailing delimiter (because what happens if a string starts with 12 tabs and ends with 47 carriage returns?). And I can't possibly predict all the use cases out there – you may want to treat a tab or line feed as a valid element, or only ignore or accept them when they're the first or last element. So I'll leave this as an exercise for the reader (for now).
Handle multi-character delimiters
Sadly, STRING_SPLIT
can only handle delimiters with one character, and
sometimes that is not enough. I show some ways you can deal with this limitation
in a previous tip, "Dealing
with the single-character delimiter in SQL Server's STRING_SPLIT function."
Conclusion
STRING_SPLIT
is a fantastic addition to modern versions of SQL Server, but in
some ways it is inferior to the solutions you implemented while you were waiting
for a native solution. With some of the suggestions I've listed here, I hope
you're still able to take advantage, even if it involves temporary smoke and
mirrors.
Next Steps
Read on for related tips and other resources involving string splitting:
- SQL Server 2016 STRING_SPLIT Function
- Parsing string data with the new SQL Server 2016 STRING_SPLIT function
- SQL Server CLR and SQL split functions to parse a delimited string
- Split strings the right way – or the next best way
- Performance Surprises and Assumptions : STRING_SPLIT()
- Splitting Delimited Strings Using XML in SQL Server
- Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function
- Removing Duplicates from Strings in SQL Server
- Comparing string splitting / concatenation methods
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips