Solve old problems with SQL Server’s new STRING_AGG and STRING_SPLIT functions

By:   |   Comments (1)   |   Related: 1 | 2 | 3 | 4 | 5 | 6 | > Functions System


Problem

Throughout my career working with SQL Server, I have solved many problems using custom functions or CLR to ultimately assemble a string from parts or break a string into parts. An example from a previous tip, “Removing Duplicates from Strings in SQL Server,” showed how to create two T-SQL functions to disassemble and reassemble a multi-token string so that each token was listed just once. Common problems like this can be solved in much less code using new built-in functions that have been added since SQL Server 2016 was released.

Solution

To revisit, the earlier tip had a string like this:

Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta

And the object was to produce a string like this (each unique item ordered by original appearance):

Bravo/Alpha/Tango/Delta

Or this one (each unique item ordered alphabetically):

Alpha/Bravo/Delta/Tango

To get to this result, I created a numbers table, and two user-defined functions, one using FOR XML PATH. These are certainly not overly efficient methods, and you can see how verbose the code is by looking at the previous tip. In SQL Server 2017, we have a much more concise solution (and it can be slightly better even in SQL Server 2016).

SQL Server 2016 added a new STRING_SPLIT() function and, as I have blogged about, this is both more concise and more efficient than elaborate T-SQL functions (and even CLR). Now the code to see the list of items in the original list is this simple:

DECLARE @List nvarchar(MAX), @Delim nchar(1);
SELECT @List  = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta',
       @Delim = N'/';
SELECT value FROM STRING_SPLIT(@List, @Delim);

Producing this output:

bravo

A couple of limitations exist. One limitation is that the delimiter can only be a single character. This means that if your original string uses three pipes or any other complex sequence of characters to delimit the individual strings, you’ll have to manipulate that string first (by replacing your character sequence with a single character that can’t appear in the data). I’ve shown a few ways to deal with this scenario in another tip, “Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function.”

The other limitation is that value is the only output column, so there isn’t a straightforward way to indicate the order of the output column in the original string. Generally, you will find that they come out in the same order they went in, but this cannot be relied upon (especially if you are then passing that set into another function, which we’ll get to in a minute). If maintaining the order in the original string is important, we can solve this by adding our own output column, though we should only do this if we have to (since these things are not free). With this code:

SELECT value, 
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
FROM STRING_SPLIT(@List, @Delim)
ORDER BY pointer;

Now the output looks like this (and you can see how applying DISTINCT or GROUP BY will eliminate the duplicates easily):

value

(The fact the pointer always reflects the first instance of the value in the original string is irrelevant, as it doesn’t really matter which unique value we get.)

Now, SQL Server 2017 adds an aggregate function called STRING_AGG(), which can take this set directly and reassemble these parts back into a single, slash-separated string, without the duplicates. The simplest approach would look like this:

SELECT Reassembled = STRING_AGG(value, @Delim)
  FROM STRING_SPLIT(@List, @Delim);

Results:

reassembled

Of course, this doesn’t remove duplicates, and by nothing more than coincidence, reassembles the string in the original order. In order to remove duplicates, we’ll need to use a subquery, and we can use WITHIN GROUP to specify alphabetical order:

SELECT Reassembled_Alpha = STRING_AGG(value, @Delim)
  WITHIN GROUP (ORDER BY value)
  FROM (SELECT value FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

reassembled alpha

To return the string in original order, it gets a bit more complex:

SELECT Reassembled_Original = STRING_AGG(value, @Delim)
  WITHIN GROUP (ORDER BY pointer)
  FROM (SELECT value,
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
    FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

bravo

Something else that might be interesting is to include the number of values that appeared in the original string – there were three Bravos, for example, and two Deltas. This looks very much like the code above, except we add some formatting to STRING_AGG() and an additional COUNT(*) column from the subquery:

SELECT Reassembled_Orig_With_Counts
    = STRING_AGG(value + N' (' + RTRIM(c) + N')', @Delim) WITHIN GROUP (ORDER BY pointer)
  FROM (SELECT value, c = COUNT(*),
    pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
    FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;

Results:

alpha

Here is another example to rollup columns for indexes.

-- Purpose: This query will list all indexes in the database and show index columns and included columns using STRING_AGG (SQL 2017 and later). 
--          Run this in a user database
-- Revision: 2019-05-14
--
SELECT 
   SCHEMA_NAME(ss.SCHEMA_id) AS SCHEMANAME,
   ss.name as TableName, 
   ss2.name as IndexName, 
   ss2.index_id,
   (SELECT STRING_AGG(name,', ') 
    from sys.index_columns a inner join sys.all_columns b on a.object_id = b.object_id and a.column_id = b.column_id and a.object_id = ss.object_id and a.index_id = ss2.index_id and is_included_column = 0
	) as IndexColumns,
   (SELECT STRING_AGG(name,', ') 
    from sys.index_columns a inner join sys.all_columns b on a.object_id = b.object_id and a.column_id = b.column_id and a.object_id = ss.object_id and a.index_id = ss2.index_id and is_included_column = 1
    ) as IncludedColumns
FROM sys.objects SS INNER JOIN SYS.INDEXES ss2 ON ss.OBJECT_ID = ss2.OBJECT_ID 
WHERE ss.type = 'U'
ORDER BY 1, 2, 3   

Summary

This is just one example where string splitting and concatenating can come in handy. Depending on your requirements, it might be tough to call the resulting code “simple,” but these new functions are certainly less verbose and far less tedious than earlier methods. All of my testing suggests that, in most cases, the performance will be better, too, but I’ll leave that as an exercise to the reader (in his or her specific scenario).

Next Steps

Check out these tips and other resources involving splitting and concatenating strings:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Saturday, April 18, 2020 - 9:17:28 PM - Ohad Back To Top (85408)

Wow that's a great tip! Thanks for sharing, I'm sure it will come handy some time in future.















get free sql tips
agree to terms