Replace Text using RegEx with SQL Server 2025 REGEXP_REPLACE

Problem

We have a table with a string column that contains free text data, input by our customer service. This text might contain sensitive data, such as zip codes or credit card numbers, that we need to mask due to privacy regulations. The problem is that there are several possible formats making it hard to detect the patterns using regular SQL. Is there a way to solve this in SQL Server?

Solution

SQL Server 2025 will introduce support for regular expressions (also known as regex) in the T-SQL language. There are already several tips published on regex and their functionality in SQL Server 2025:

We’ll add another regex function to our arsenal: REGEXP_REPLACE.

With this function, we can replace a piece of text matching a regex expression with another piece of text. REGEXP_REPLACE can be used for removing or masking of data, but also to standardize formatting.

If you want to try the regex functionality out yourself, you can install the latest SQL Server 2025 preview or you can use an Azure SQL DB or Fabric SQL DB. As with all preview features, functionality and/or behavior might change when the final product is released.

The REGEXP_REPLACE Function

Syntax

This new function allows to search text for a specific regex pattern and replace it with another piece of text:

--MSSQLTips.com
REGEXP_REPLACE(string_expression, regular expression [, replacement string [, start [, occurrence [, flags]]]])

The following two parameters are mandatory:

  • string expression – which is the text in which you want to search. You can use (N)VARCHAR(MAX) as data type, but the input is limited to 2MB.
  • regular expression – which is the pattern you want to match

All the other parameters are optional:

  • replacement string – if a match is found, it’s replaced with this expression. It is possible to reuse groups of the regex expression (subexpressions), which will be explained in a later section.
  • start – which is the starting point of the search. This allows you to skip several characters in the text. It starts counting from 1. If start is bigger than the length of the string, the original string expression is returned. The default is 1.
  • occurrence – this specifies which occurrence needs to be replaced if there are multiple matches. The default is 1.
  • flags – these are the same options as explained in the tip REGEXP_LIKE Function in SQL Server 2025 (in the section “Optional Flag Parameter”).

The function is similar to the well-known REPLACE function, with the biggest exception (aside from the regex support of course) is that REPLACE will replace ALL occurrences, and REGEXP_REPLACE only one specific occurrence.

Replace Text using REGEXP_REPLACE

Let’s illustrate the usage of this new function with an example.

Using the same restaurant review sample data as in the tip SQL Server 2025 REGEXP_COUNT Function to Count Occurrences in Text (download the SQL script here), we can use the following regex expression to match zip codes: \d{5}([-]|\s*)?(\d{4})?.

Combined with the REGEXP_REPLACE function, we can mask all zip codes in the review texts by replacing them with “XXX”:

--MSSQLTips.com
SELECT
     ReviewText
    ,ReviewText_Masked = REGEXP_REPLACE(ReviewText,'\d{5}([-]|\s*)?(\d{4})?','XXX')
FROM dbo.Reviews;

The results:

regexp_replace with the original text on the left, and the same text but with masked zip codes on the right.

As you can see in the screenshot, zip codes have been masked even if they were surrounded by brackets or if they were in a different format.

Keep in mind that only the first occurrence will be replaced. If you need to replace multiple occurrences in a single piece of text, you’ll need to use a work around:

  • you could determine the maximum number of occurrences with REGEXP_COUNT and then use a WHILE loop or dynamic SQL to apply REGEXP_REPLACE on the data
  • another method would be to use REGEXP_INSTR to find all the occurrences of a zip code and split the text into separate strings, where each string contains only one zip code. Then you can use REGEXP_REPLACE to mask the zip code and then concatenate everything back together.

Partial Masking by using Subexpressions

The replacement string can contain numbers in the format of \n (e.g. \1, \2, etc.), where the number n references the n’th parenthesized group (which is a subexpression of the regex pattern).

Let’s illustrate with an example to make this concept more clear. Suppose we have credit cards numbers in the following format: 1234 5678 1234 5678 (4 groups of each 4 numbers, separated with spaces). We want to mask this data and show only the last 4 digits: XXXX XXXX XXXX 5678.

We can do this with REGEXP_REPLACE and subexpressions:

--MSSQLTips.com
SELECT REGEXP_REPLACE('1234 5678 1234 5678' --> string to be searched
                     ,'(\d{4}) (\d{4}) (\d{4}) (\d{4})' --> regex pattern contain 4 subexpressions
                     , 'XXXX XXXX XXXX \4'); --> replacement referencing the 4th matched substring

The result:

credit card masking by showing only the last 4 digits

The difference with the dynamic data masking feature of SQL Server is that REGEXP_REPLACE can be used to mask data that appears somewhere in text. Dynamic data masking on the other hand works on a column containing only the data that needs to be masked. In our credit card example, we would need a column that contains only the credit card number. On the other hand, dynamic data masking only masks data for people that don’t have the security clearance to view the data.

Using a View to Mask Data

If you want to implement this with REGEXP_REPLACE, you would need to do this through a view since replacing the actual values is an all-or-nothing approach. The view could be something like this (in pseudo-code):

--MSSQLTips.com
SELECT
    maskedData = CASE WHEN check_user_permissions
                      THEN REGEXP_REPLACE(myData,...) --> see previous example
                      ELSE myData END
FROM myTable;

Since using regex expressions can be computationally expensive and since REGEXP_REPLACE can only replace one single occurrence at a time, it might be a better idea to protect the data before it even enters the database.

Next Steps

Leave a Reply

Your email address will not be published. Required fields are marked *