By: Aaron Bertrand | Comments (1) | Related: 1 | 2 | 3 | 4 | 5 | 6 | > Functions System
Problem
Throughout my career working with SQL Server, I have solved many problems using custom functions or CLR to ultimately assemble a string from parts or break a string into parts. An example from a previous tip, “Removing Duplicates from Strings in SQL Server,” showed how to create two T-SQL functions to disassemble and reassemble a multi-token string so that each token was listed just once. Common problems like this can be solved in much less code using new built-in functions that have been added since SQL Server 2016 was released.
Solution
To revisit, the earlier tip had a string like this:
Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta
And the object was to produce a string like this (each unique item ordered by original appearance):
Bravo/Alpha/Tango/Delta
Or this one (each unique item ordered alphabetically):
Alpha/Bravo/Delta/Tango
To get to this result, I created a numbers table, and two user-defined functions, one using FOR XML PATH. These are certainly not overly efficient methods, and you can see how verbose the code is by looking at the previous tip. In SQL Server 2017, we have a much more concise solution (and it can be slightly better even in SQL Server 2016).
SQL Server 2016 added a new STRING_SPLIT() function and, as I have blogged about, this is both more concise and more efficient than elaborate T-SQL functions (and even CLR). Now the code to see the list of items in the original list is this simple:
DECLARE @List nvarchar(MAX), @Delim nchar(1); SELECT @List = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta', @Delim = N'/'; SELECT value FROM STRING_SPLIT(@List, @Delim);
Producing this output:
A couple of limitations exist. One limitation is that the delimiter can only be a single character. This means that if your original string uses three pipes or any other complex sequence of characters to delimit the individual strings, you’ll have to manipulate that string first (by replacing your character sequence with a single character that can’t appear in the data). I’ve shown a few ways to deal with this scenario in another tip, “Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function.”
The other limitation is that value is the only output column, so there isn’t a straightforward way to indicate the order of the output column in the original string. Generally, you will find that they come out in the same order they went in, but this cannot be relied upon (especially if you are then passing that set into another function, which we’ll get to in a minute). If maintaining the order in the original string is important, we can solve this by adding our own output column, though we should only do this if we have to (since these things are not free). With this code:
SELECT value, pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim) FROM STRING_SPLIT(@List, @Delim) ORDER BY pointer;
Now the output looks like this (and you can see how applying DISTINCT or GROUP BY will eliminate the duplicates easily):
(The fact the pointer always reflects the first instance of the value in the original string is irrelevant, as it doesn’t really matter which unique value we get.)
Now, SQL Server 2017 adds an aggregate function called STRING_AGG(), which can take this set directly and reassemble these parts back into a single, slash-separated string, without the duplicates. The simplest approach would look like this:
SELECT Reassembled = STRING_AGG(value, @Delim) FROM STRING_SPLIT(@List, @Delim);
Results:
Of course, this doesn’t remove duplicates, and by nothing more than coincidence, reassembles the string in the original order. In order to remove duplicates, we’ll need to use a subquery, and we can use WITHIN GROUP to specify alphabetical order:
SELECT Reassembled_Alpha = STRING_AGG(value, @Delim)
WITHIN GROUP (ORDER BY value)
FROM (SELECT value FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;
Results:
To return the string in original order, it gets a bit more complex:
SELECT Reassembled_Original = STRING_AGG(value, @Delim)
WITHIN GROUP (ORDER BY pointer)
FROM (SELECT value,
pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;
Results:
Something else that might be interesting is to include the number of values that appeared in the original string – there were three Bravos, for example, and two Deltas. This looks very much like the code above, except we add some formatting to STRING_AGG() and an additional COUNT(*) column from the subquery:
SELECT Reassembled_Orig_With_Counts
= STRING_AGG(value + N' (' + RTRIM(c) + N')', @Delim) WITHIN GROUP (ORDER BY pointer)
FROM (SELECT value, c = COUNT(*),
pointer = CHARINDEX(@Delim + value + @Delim, @Delim + @List + @Delim)
FROM STRING_SPLIT(@List, @Delim) GROUP BY value) AS x;
Results:
Here is another example to rollup columns for indexes.
-- Purpose: This query will list all indexes in the database and show index columns and included columns using STRING_AGG (SQL 2017 and later). -- Run this in a user database -- Revision: 2019-05-14 -- SELECT SCHEMA_NAME(ss.SCHEMA_id) AS SCHEMANAME, ss.name as TableName, ss2.name as IndexName, ss2.index_id, (SELECT STRING_AGG(name,', ') from sys.index_columns a inner join sys.all_columns b on a.object_id = b.object_id and a.column_id = b.column_id and a.object_id = ss.object_id and a.index_id = ss2.index_id and is_included_column = 0 ) as IndexColumns, (SELECT STRING_AGG(name,', ') from sys.index_columns a inner join sys.all_columns b on a.object_id = b.object_id and a.column_id = b.column_id and a.object_id = ss.object_id and a.index_id = ss2.index_id and is_included_column = 1 ) as IncludedColumns FROM sys.objects SS INNER JOIN SYS.INDEXES ss2 ON ss.OBJECT_ID = ss2.OBJECT_ID WHERE ss.type = 'U' ORDER BY 1, 2, 3
Summary
This is just one example where string splitting and concatenating can come in handy. Depending on your requirements, it might be tough to call the resulting code “simple,” but these new functions are certainly less verbose and far less tedious than earlier methods. All of my testing suggests that, in most cases, the performance will be better, too, but I’ll leave that as an exercise to the reader (in his or her specific scenario).
Next Steps
Check out these tips and other resources involving splitting and concatenating strings:
- Rolling up multiple rows into a single row and column for SQL Server data
- Splitting Delimited Strings Using XML in SQL Server
- Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function
- Removing Duplicates from Strings in SQL Server
- Performance Surprises and Assumptions : STRING_SPLIT()
- SQL Server v.Next : STRING_AGG() performance
- Comparing string splitting / concatenation methods
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips