Problem
SQL Server users have been asking for native regular expression support for over two decades. There are third-party Common Language Runtime (CLR) modules that offer this functionality, but these can be complicated to install and simply aren’t possible in some environments. I want to split a string using a regular expression instead of a static string. Will that be possible in SQL Server 2025, without CLR?
Solution
Several of the new Regex features have been described already:
- Basic Regex Emulator for SQL Server
- New SQL Regex Functions in SQL Server 2025 and SSMS
- Introducing Regular Expression (Regex) Support in Azure SQL DB
But, one function that hasn’t been overly flaunted publicly is REGEXP_SPLIT_TO_TABLE. So I thought I would take a closer look.
Let’s say you have a string that could have multiple delimiters, like this:
Gretzky|Orr,Lemieux;Pastrnak,Bucyk|Esposito
Or a unioned set from different sources that each has its own delimiter, like this:
Gretzky|Orr
Lemieux,Pastrnak
Bucyk;Esposito
Using STRING_SPLIT
You can’t pass a LIKE
or other wildcard expression or pattern to STRING_SPLIT
– it requires a single, fixed string. You could do something like this:
SELECT value
FROM STRING_SPLIT(TRANSLATE(input, '|;', ',,'), ',');
Yuck. This falls apart quickly when you have more possible delimiters or if you want to do anything more complex.
Using REGEXP_SPLIT_TO_TABLE
The new function lets you pass in a simpler expression with no translations or counting required:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(input, '[|;,]');
And what if you want two or three or 17 consecutive delimiters to be ignored and treated as a single delimiter? This requires one additional character:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(input, '[|;,]+');
String Split on Spaces, Tabs, Carriage Returns and Line Feeds
How about splitting on any white space – spaces, tabs, carriage returns, line feeds, etc.? Sure, you could construct a string that contains all the relevant characters, and transform them all to a single delimiter using TRANSLATE
or REPLACE
. But this is so ugly, I’m not even going to show the code.
Look how easy this is with the new function:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(input, '\s');
And if you want consecutive white space characters to be lumped together:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(input, '\s+');
So, for example:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(N'space: fourspaces: tab: crlf:
end', '\s+');
Yields:
value
----------
space:
fourspaces:
tab:
crlf:
end
Split String on Non-alphanumeric Characters
What about splitting on any non-alphanumeric character? All the white space already mentioned, or any punctuation, or any emojis? You can do this by saying any pattern that does not match the property of being a letter or a number:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE(input, '[^\p{L}\p{N}]+');
So, for example:
SELECT value
FROM REGEXP_SPLIT_TO_TABLE (N'hello👋world🌍 🌏tab: end', '[^\p{L}\p{N}]+');
Yields:
value
----------
hello
world
tab
end
It’s powerful to be able to pre-emptively exclude any character outside of a defined set without even having to list out that defined set. And most of these examples – and so many others that are possible – are things I just don’t want to piece together how to accomplish with STRING_SPLIT
.
A Wrapper Function
In my case, I found it useful to create a multi-purpose function that could perform either type of split. This can be useful if you have specific rules you want to apply to the output and don’t want to duplicate that business logic in multiple functions:
CREATE OR ALTER FUNCTION dbo.split
(
@string nvarchar(4000),
@pattern nvarchar(4000),
@type char(6) /* regexp, legacy */
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT value, ordinal
FROM STRING_SPLIT(@string, @pattern, 1)
WHERE @type = 'legacy'
UNION ALL
SELECT value, ordinal
FROM REGEXP_SPLIT_TO_TABLE(@string, @pattern)
WHERE @type = 'regexp'
);
The union all maybe suggests that SQL Server would waste time trying to evaluate both functions before the predicate is applied, but the execution plan shows that short-circuiting is in full effect:

Caveats
The new function doesn’t support lookaheads, for example, and that’s not Microsoft’s fault. The implementation is based on the ICU (International Components for Unicode) library, which doesn’t support advanced functionality like lookaheads, lookbehinds, or conditionals.
Here is what I wanted to be able to do – ignore delimiters that also happen to exist inside quoted strings, a common problem when dealing with data from external sources:
DECLARE @str nvarchar(4000), @regex nvarchar(4000);
SET @str = N'Aaron,B,"33 Front St, Suite 4B",Whitehorse,YT,Y1A 3BH';
SET @regex = N',(?=(?:[^"]*"[^"]*")*[^"]*$)';
SELECT value FROM REGEXP_SPLIT_TO_TABLE(@str, @regex);
However, when I try that, I receive the following error:
Msg 19300, Level 16, State 2
An invalid Pattern ',(?=(?:[^"]*"[^"]*")*[^"]*$)' was provided.
Error 'invalid perl operator: (?=' occurred during evaluation of the Pattern.
Another issue I came across is that, in spite of the preview documentation implying (though not officially stating directly) that LOB types are supported, if you try, you will get this error:
Msg 19304, Level 16, State 5
Currently, 'REGEXP_SPLIT_TO_TABLE' function does not support NVARCHAR(max)/VARCHAR(max) inputs.
Based on the “Currently,”
bit, I can only assume they’ll fix this one before release, but I don’t have any inside information on that.
Conclusion
This is a welcome addition to SQL Server, and while not perfect, I can already envision how it will help solve some interesting parsing problems I’ve encountered as recently as this year.
Next Steps
Review the following tips and other resources: