Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 
The Trade-off Between SQL Server Security and Performance - Free Webinar
 

Solve old problems with SQL Serverís new STRING_AGG and STRING_SPLIT functions


By:   |   Last Updated: 2018-01-19   |   Comments   |   Related Tips: More > Functions - System

Problem

Throughout my career working with SQL Server, I have solved many problems using custom functions or CLR to ultimately assemble a string from parts or break a string into parts. An example from a previous tip, ďRemoving Duplicates from Strings in SQL Server,Ē showed how to create two T-SQL functions to disassemble and reassemble a multi-token string so that each token was listed just once. Common problems like this can be solved in much less code using new built-in functions that have been added since SQL Server 2016 was released.

Solution

To revisit, the earlier tip had a string like this:

Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta

And the object was to produce a string like this (each unique item ordered by original appearance):

Bravo/Alpha/Tango/Delta

Or this one (each unique item ordered alphabetically):

Alpha/Bravo/Delta/Tango

To get to this result, I created a numbers table, and two user-defined functions, one using FOR XML PATH. These are certainly not overly efficient methods, and you can see how verbose the code is by looking at the previous tip. In SQL Server 2017, we have a much more concise solution (and it can be slightly better even in SQL Server 2016).

SQL Server 2016 added a new STRING_SPLIT() function and, as I have blogged about, this is both more concise and more efficient than elaborate T-SQL functions (and even CLR). Now the code to see the list of items in the original list is this simple:

DECLARE @List nvarchar(MAX), @Delim nchar(1);
SELECT @List  = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta',
       @Delim = N'/';
SELECT value FROM STRING_SPLIT(@List, @Delim);

Producing this output:

bravo

A couple of limitations exist. One limitation is that the delimiter can only be a single character. This means that if your original string uses three pipes or any other complex sequence of characters to delimit the individual strings, youíll have to manipulate that string first (by replacing your character sequence with a single character that canít appear in the data). Iíve shown a few ways to deal with this scenario in another tip, ďDealing with the single-character delimiter in SQL Server's STRING_SPLIT function

The other limitation is that value is the only output column, so there isnít a straightforward way to indicate the order of the output column in the original string. Generally, you will find that they come out in the same order they went in, but this cannot be relied upon (especially if you are then passing that set into another function, which weíll get to in a minute). If maintaining the order in the original string is important, we can solve this by adding our own output column, though we should only do this if we have to (since these things are not free). With this code:

SELECT value, 
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
FROM STRING_SPLIT(@List, @Delim)
ORDER BY pointer;

Now the output looks like this (and you can see how applying DISTINCT or GROUP BY will eliminate the duplicates easily):

value

(The fact the pointer always reflects the first instance of the value in the original string is irrelevant, as it doesnít really matter which unique value we get.)

Now, SQL Server 2017 adds an aggregate function called STRING_AGG(), which can take this set directly and reassemble these parts back into a single, slash-separated string, without the duplicates. The simplest approach would look like this:

SELECT Reassembled = STRING_AGG(value, @Delim)
  FROM STRING_SPLIT(@List, @Delim);

Results:

reassembled

Of course, this doesnít remove duplicates, and by nothing more than coincidence, reassembles the string in the original order. In order to remove duplicates, weíll need to use a subquery, and we can use WITHIN GROUP to specify alphabetical order:

SELECT Reassembled_Alpha = STRING_AGG(value, @Delim)
  WITHIN GROUP (ORDER BY value)
  FROM (SELECT value FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

reassembled alpha

To return the string in original order, it gets a bit more complex:

SELECT Reassembled_Original = STRING_AGG(value, @Delim)
  WITHIN GROUP (ORDER BY pointer)
  FROM (SELECT value,
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
    FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

bravo

Something else that might be interesting is to include the number of values that appeared in the original string Ė there were three Bravos, for example, and two Deltas. This looks very much like the code above, except we add some formatting to STRING_AGG() and an additional COUNT(*) column from the subquery:

SELECT Reassembled_Orig_With_Counts
    = STRING_AGG(value + N' (' + RTRIM(c) + N')', @Delim) WITHIN GROUP (ORDER BY pointer)
  FROM (SELECT value, c = COUNT(*),
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
    FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

alpha

Summary

This is just one example where string splitting and concatenating can come in handy. Depending on your requirements, it might be tough to call the resulting code ďsimple,Ē but these new functions are certainly less verbose and far less tedious than earlier methods. All of my testing suggests that, in most cases, the performance will be better, too, but Iíll leave that as an exercise to the reader (in his or her specific scenario).

Next Steps

Check out these tips and other resources involving splitting and concatenating strings:



Last Updated: 2018-01-19


next webcast button


next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a Product Manager at SentryOne, with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and serves as a community moderator for the Database Administrators Stack Exchange.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools