SQL Server Split String Replacement Code with STRING_SPLIT


By:   |   Updated: 2020-06-01   |   Comments   |   Related: More > Functions - System

Problem

SQL Server 2016 introduced a new built-in function, STRING_SPLIT, far more efficient than the functions we've been implementing ourselves all these years. As they upgrade to newer versions of SQL Server, one project a lot of shops will be taking on is replacing all those inefficient functions with direct (or indirect) calls to STRING_SPLIT. Unfortunately, there are some known limitations that may prevent this from being an easy, one-for-one swap. I wanted to share some ways you can work around these issues in order to have a successful deployment.

Solution

Moving to the new, more efficient function certainly sounds straightforward: replace all calls to dbo.YourTableValuedSplitFunction with calls to STRING_SPLIT. But there can be some significant barriers preventing you from changing this directly, including:

Query Access

  • You may not be able to change the queries directly due to source control policies, or because they are in external application code which can't be deployed right away (or locked in by a vendor).
  • In these cases, your only option is to leave the existing function in place, but change its body directly.

Compatibility Level

  • This is one that catches people by surprise: while compatibility level is irrelevant to most new syntax in SQL Server, this specific function has a minimum requirement of 130. Your database may be in a lower compatibility level for a variety of reasons.
  • In this case, the only way to use STRING_SPLIT is to have it run in a database with compat level 130.

Different Behavior

  • There are several ways the native function may differ from the function(s) you created:
  1. The new function has a single output column, value. Your existing queries may use a different column name for the individual items coming out, and it may rely on other output columns too, like an index or ordinal position.
  2. The STRING_SPLIT function does not eliminate duplicates, return a data type other than strings, or guarantee that the values will be returned in the same order they appear in the list.
  3. Your function may handle leading or trailing delimiters, NULL values, or white space differently, and it may accept multi-character delimiters, which the native function does not.
  • In these cases, you will need to create a wrapper around STRING_SPLIT to get these behaviors.

Solving the Query Access Problem

As long as you can change the function itself, it isn't such a big deal if you are prevented from changing individual queries that call it. Let's say you have this multi-statement table-valued function:

CREATE FUNCTION dbo.SimpleSplitFunction
(
  @List      nvarchar(max),
  @Delimiter nchar(1)
)
RETURNS @t table (Item nvarchar(max))
AS
BEGIN
  SET @List += @Delimiter;   ;WITH a(f,t) AS 
  (
    SELECT CAST(1 AS bigint), CHARINDEX(@Delimiter, @List)
    UNION ALL
    SELECT t + 1, CHARINDEX(@Delimiter, @List, t + 1)
    FROM a WHERE CHARINDEX(@Delimiter, @List, t + 1) > 0
  ) 
  INSERT @t SELECT SUBSTRING(@List, f, t - f) FROM a OPTION (MAXRECURSION 0);   RETURN
END
GO

And there is a query like this, somewhere, that you can't change:

SELECT Item FROM dbo.SimpleSplitFunction(N'foo,bar', N',');			

You can implement STRING_SPLIT by altering the function as follows:

ALTER FUNCTION dbo.SimpleSplitFunction
(
  @List      nvarchar(max),
  @Delimiter nchar(1)
)
RETURNS @t table (Item nvarchar(max))
AS
BEGIN
  INSERT @t(Item) SELECT value FROM STRING_SPLIT(@List, @Delimiter);
  RETURN;
END
GO

Even better, though, would be changing it to an inline table-valued function (which requires drop/create):

DROP FUNCTION dbo.SimpleSplitFunction;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
  @List      nvarchar(max),
  @Delimiter nchar(1)
)
RETURNS TABLE
AS
  RETURN (SELECT Item = value FROM STRING_SPLIT(@List, @Delimiter));
GO

In either case, code that currently calls this function doesn't have to change at all.

Solving the Compatibility Level Problem

We have several databases in older compatibility levels, so changing queries running in the context of those databases to use STRING_SPLIT is not possible. If you try to create or alter a function in a lower compatibility level database, you'll immediately encounter a compile time error:

CREATE DATABASE CompatLevel_120;
GO ALTER DATABASE CompatLevel_120 SET COMPATIBILITY_LEVEL = 120;
GO USE CompatLevel_120;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
  @List      nvarchar(max),
  @Delimiter nchar(1)
)
RETURNS TABLE
AS
  RETURN (SELECT Item=value FROM STRING_SPLIT(@List, @Delimiter));
GO

Error message:

Msg 208, Level 16, State 1, Procedure SimpleSplitFunction, Line 8
Invalid object name 'STRING_SPLIT'.

Assuming there is a database on the instance that is in compat level 130, or that you can create such a database, there are a couple of ways you can still use STRING_SPLIT in that database. First, create the above function (or your variation) in, say, the Utility database. Then, either change the function in the original database as follows, so that it acts as a relay:

USE CompatLevel_120;
GO CREATE FUNCTION dbo.SimpleSplitFunction
(
  @List      nvarchar(max),
  @Delimiter nchar(1)
)
RETURNS TABLE
AS
  RETURN (SELECT Item FROM Utility.dbo.SimpleSplitFunction(@List, @Delimiter));
GO

Or drop the original function and add a synonym, which is a slightly simpler way to accomplish the same thing:

USE CompatLevel_120;
GO CREATE SYNONYM dbo.SimpleSplitFunction FOR Utility.dbo.SimpleSplitFunction;

In either case, the calling code doesn't have to change at all, until you are able to increase the compatibility level and replace these function references in the queries with simpler STRING_SPLIT calls.

Solving Behavior Problems

I listed several behavior changes above that you may have coded into your own functions, but aren't possible with the native STRING_SPLIT. A good compromise is changing your functions to use STRING_SPLIT while still supporting those additional features, some of which are easier to implement than others.

Output a different column name

This one is kind of a softball, and already demonstrated above. If value isn't the right name for the output column, your function can just say:

RETURN (SELECT YourColumnName = value FROM STRING_SPLIT(@List, @Delimiter));			

Eliminate duplicates

Eliminating duplicates is another common feature of splitting functions, and this can be accomplished with STRING_SPLIT simply by using GROUP BY:

RETURN (SELECT YourColumnName = value FROM STRING_SPLIT(@List, @Delimiter) GROUP BY value);			

I look at this in more detail in an earlier tip (written before STRING_SPLIT), "Removing Duplicates from Strings in SQL Server."

Output an ordering column

A pretty common enhancement is to indicate the location of an individual string value from the original list (they may be listed by priority). If you don't expect duplicates (or only care about the location of the first duplicate), you can simply use CHARINDEX:

RETURN
  (
    SELECT value, ListPos = ROW_NUMBER() OVER (ORDER BY pointer)
    FROM
    (
      SELECT value,
             pointer = CHARINDEX(@Delimiter + value + @Delimiter,
                                 @Delimiter + @List + @Delimiter)
      FROM STRING_SPLIT(@List, @Delimiter)
    ) AS s
  );

The outer query can then ORDER BY ListPos or, if you're trying to find the nth element, WHERE ListPos = n. You could also add pointer to the output to determine how far into the string a given entity appears.

If you might have duplicates (and you aren't going to eliminate them first), I investigate some other approaches in a previous tip, "Solve old problems with SQL Server's new STRING_AGG and STRING_SPLIT functions."

Output a different data type

If you need to output a specific type, you can simply apply CONVERT(value) or TRY_CONVERT(value) before returning, just make sure the calling code is prepared to handle any errors (or deal with NULL).

RETURN (SELECT value = TRY_CONVERT(int, value) FROM STRING_SPLIT(@List, @Delimiter));			

Ignore leading/trailing delimiters

Some strings will inevitably come in with empty elements at the beginning or end. The following returns 5 rows, with the first and last being empty strings:

SELECT value FROM STRING_SPLIT(N',foo,bar,blat,',N',');			

If you want to make sure only values that aren't empty strings are returned:

RETURN (SELECT value FROM STRING_SPLIT(@List, @Delimiter) WHERE RTRIM (value) > N'');			

However, if you're okay with empty strings anywhere except the beginning and end of the list, it gets a little more complicated. In SQL Server 2017 and above, you can use TRIM to first remove any leading/trailing delimiters. This returns 3 rows:

SELECT * FROM STRING_SPLIT(TRIM(N',' FROM N',foo,bar,blat,'),N',');			

In your function, this is simply:

RETURN (SELECT value FROM STRING_SPLIT(TRIM(@Delimiter FROM @List), @Delimiter));			

In older versions, it's much messier (and I'll confess, I didn't spend a lot of time making this more concise – there are probably shorter paths):

RETURN
  (
   WITH l(l,d)AS (SELECT LTRIM(RTRIM(@List)), @Delimiter),
    x(l,d) AS (SELECT STUFF(l,1, CASE WHEN LEFT(l,1)= d THEN 1 ELSE 0 END, N''), d FROM l),
    y(l,d) AS (SELECT LEFT(l,LEN(l) - CASE WHEN RIGHT(l,1)= d THEN 1 ELSE 0 END), d FROM x)
   SELECT s.value FROM y CROSS APPLY STRING_SPLIT(y.l, y.d) AS s
  );

Ignore white space

All kinds of white space characters can appear before or, more commonly, after a string. STRING_SPLIT will treat a space or tab as a valid element, so the following returns 4 rows, even though these are arguably invalid elements:

SELECT * FROM STRING_SPLIT(CHAR(9) + ',' + CHAR(13) + ',' + CHAR(10) + ',' + CHAR(32),',');			

If you want those to be eliminated from the output, in 2017 and above, you can again use TRIM to remove carriage returns, line feeds, spaces, and tabs from the beginning and end of either the list or any element.

RETURN
  (
   SELECT value FROM
    (
     SELECT TRIM(CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32) FROM value)
     FROM STRING_SPLIT
      (
       TRIM(CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32) FROM @List),
        @Delimiter
      )
    )AS y (value) WHERE value > N''
  );

On older versions, again, this will involve a messy solution, much messier than a single leading or trailing delimiter (because what happens if a string starts with 12 tabs and ends with 47 carriage returns?). And I can't possibly predict all the use cases out there – you may want to treat a tab or line feed as a valid element, or only ignore or accept them when they're the first or last element. So I'll leave this as an exercise for the reader (for now).

Handle multi-character delimiters

Sadly, STRING_SPLIT can only handle delimiters with one character, and sometimes that is not enough. I show some ways you can deal with this limitation in a previous tip, "Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function."

Conclusion

STRING_SPLIT is a fantastic addition to modern versions of SQL Server, but in some ways it is inferior to the solutions you implemented while you were waiting for a native solution. With some of the suggestions I've listed here, I hope you're still able to take advantage, even if it involves temporary smoke and mirrors.

Next Steps

Read on for related tips and other resources involving string splitting:



Last Updated: 2020-06-01


get scripts

next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

View all my tips





Comments For This Article





download


Recommended Reading

Concatenate SQL Server Columns into a String with CONCAT()

SQL Server Rounding Functions - Round, Ceiling and Floor

SQL Server 2016 STRING_SPLIT Function

Using FOR XML PATH and STRING_AGG() to denormalize SQL Server data

Different ways to get random data for SQL Server data sampling





get free sql tips
agree to terms


Learn more about SQL Server tools