String Split and New Regex Functions in SQL

Problem

I have text stored in my SQL Server database, and I want to split it into all its different words. T-SQL has supported the STRING_SPLIT function since SQL Server 2016, but it only allows one single delimiter. There are multiple possible delimiters, such as commas, spaces, carriage returns and so on. It seems quite cumbersome to nest multiple STRING_SPLIT using APPLY. Is there a better option out there that doesn’t involve custom coding with CLR?

Solution

In the new SQL Server 2025 release, support for regular expressions (regex) has been added to the T-SQL query language. There’s already an overview of all these functions available: New SQL Regex Functions in SQL Server 2025 and SSMS. In this tip, we’ll take a closer look at the new function REGEXP_SPLIT_TO_TABLE and how it can help us split text using one or more delimiters. The idea is that the delimiters we need comply with a certain regex pattern.

If you don’t have SQL Server 2025 available (you can download the evaluation edition here and the developer edition here), you can also use Azure SQL DB, Fabric SQL DB or Azure SQL Server Managed Instance to try out the new regex functions.

REGEXP_SPLIT_TO_TABLE Function Syntax

The following syntax is used for the new function:

--MSSQLTips.com
REGEXP_SPLIT_TO_TABLE(string expression, regular expression, [flags])

The first two parameters are mandatory:

string expression, which is the text (this can also be a column) in which you want to search
regular expression, which is the pattern you want to match

The one optional parameter – flags – and its options are described in the tip REGEXP_LIKE Function in SQL Server 2025 (in the section “The Optional Flag Parameter”). Like the REGEXP_MATCHES function, REGEXP_SPLIT_TO_TABLE is a table-valued function, which means it’s called inside the FROM clause and the result of the function – as the name already implies – is a table. For example:

--MSSQLTips.com
SELECT *
FROM REGEXP_SPLIT_TO_TABLE('Hello MSSQLTips readers!',' ');

The result:

result set of simple example with REGEXP_SPLIT_TO_TABLE

The returned table contains two columns:

value: these are the actual split substrings from the original text. If none of the characters inside the text match the regex pattern, it means there’s no delimiter found and the original text is returned.
ordinal: the 1-based index value of each split substring

Splitting strings with REGEXP_SPLIT_TO_TABLE

If you only have one delimiter to split your text, you can use STRING_SPLIT function which was introduced in SQL Server 2016. Keep in mind that if you want an ordinal returned (like in the example above), the optional parameter enable_ordinal was introduced in SQL Server 2022. If you’re on an older version of SQL Server, you can use a tally table to build your own splitter. If you have one single fixed delimiter, but it contains multiple characters – $|$ for example – you can use the work around as detailed in the tip Dealing with the single-character delimiter in SQL Server’s STRING_SPLIT function.

But if you need to split text using multiple possible delimiters, the new REGEXP_SPLIT_TO_TABLE might be useful.

REGEXP_SPLIT_TO_TABLE Example

Let’s take the following sample data which was generated using AI (I asked to generate some text about the usefulness of regex functions in the new SQL Server 2025 release):

--MSSQLTips.com
SELECT paragraph = 'The newly introduced regex functions in SQL Server 2025 vastly expand the expressiveness of T-SQL; developers can now perform complex pattern matching, validation, and data cleansing directly within their queries. Instead of relying on CLR assemblies or pushing logic into application code, these built-in functions allow for concise filters, flexible string parsing, and sophisticated text extraction, all while maintaining excellent performance. This improvement not only reduces friction in ETL pipelines but also simplifies long-standing workarounds that previously required multiple LIKE clauses—or even entire procedural blocks—to achieve comparable results.'
INTO #testregex
UNION ALL
SELECT paragraph = 'With native regex support, SQL Server 2025 empowers analysts to detect anomalies, normalize inconsistent values, and enforce data quality rules with precision and speed. Operations such as extracting key tokens, validating formats like emails or IDs, and splitting or transforming structured text can now be executed inline; this makes queries cleaner, more maintainable, and far easier to extend. Combined with enhanced performance optimizations in the engine, these functions turn the database into a more capable text-processing platform, reducing dependency on external tools and promoting more efficient, centralized data workflows.';
SELECT * FROM #testregex;

Suppose we want to split this text into its individual words. Maybe we want to do some analysis on word count or prepare the data for other kinds of text analysis. The regex pattern [^A-Za-z0-9-] can be used to match any character that is NOT of the following:

A-Z – uppercase characters from the alphabet
a-z – lowercase characters from the alphabet
0-9 – numerical characters
And finally the hyphen, because we don’t want to split words like T-SQL.

This is a simplified regex expression, as it might break apart when special symbols (Unicode) from languages like Chinese or Russian are used. Using the REGEXP_SPLIT_TO_TABLE function and this regex pattern, we can split the text into its individual words:

-- MSSQLTips.com
SELECT r.*
FROM #testregex t
CROSS APPLY REGEXP_SPLIT_TO_TABLE(t.paragraph,'[^A-Za-z0-9-]') r;

Because we need to deal with multiple lines of text, we use CROSS APPLY to call the REGEXP_SPLIT_TO_TABLE function for each row of the #testregex table.

With a bit of pre-processing (removing blank lines for example), we can do a word count of our sample text:

-- MSSQLTips.com
WITH cte_splits AS
(
    SELECT word = r.[value]
    FROM #testregex t
    CROSS APPLY REGEXP_SPLIT_TO_TABLE(t.paragraph,'[^A-Za-z0-9-]') r
)
SELECT
     word
    ,wordcount = COUNT(1)
FROM cte_splits
WHERE TRIM(word) <> ''
GROUP BY word
ORDER BY wordcount DESC;

Using a Stopword List

Using a stopword list might be useful to get a better word count analysis. A stopword list is a list of words that don’t have meaning on their own, like “a”, “is”, “and”, and “the”. A similar concept is used in SQL Server Full-Text Search.

Next Steps

There are other tips about the new T-SQL regex functions in SQL Server 2025:
You can find all SQL Server 2025 related tips in this overview.
For an introduction to Full-text Indexes in SQL Server, check out the tip SQL Server Full Text Indexes.
Aaron Bertrand talks about the shortcomings of STRING_SPLIT here, and lists a bunch of useful resources to learn more.

Koen Verbeeck

Koen Verbeeck is a seasoned business intelligence consultant with over a decade of experience with the Microsoft Data Platform. He holds several certifications, including Azure Data Engineer. He’s a prolific writer, with over 375 articles on technologies such as Microsoft Fabric, SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at various events such as PASS, SQLBits, dataMinds Connect and many others. He frequently delivers educational webinars on MSSQLTips.com. For his efforts, Koen has been awarded the Microsoft MVP data platform award for many years.

MSSQLTips Awards:
- Leadership Award (200+ Tips) – 2021
- Author of the Year – 2014/2020/2022
- Author Contender – 2024/2025