REPLACE versus TRANSLATE for SQL Server Text Data Manipulation

By:   |   Updated: 2023-04-19   |   Comments (6)   |   Related: More > TSQL


Problem

When reviewing or demoing your old code, do you ever get embarrassed? Most times, for me, it's caused by a lack of comments. However, I usually apologize when showing queries with a nested group of REPLACE() functions. Even if everyone uses REPLACE() this way, it doesn't look good. If you need to replace several occurrences of text in a column, is there a better way than using several nested REPLACE() functions?

Solution

Could the SQL TRANSLATE() function replace nesting REPLACE() functions? In this article, I'll look at the TRANSLATE() function in detail, including how it's different from REPLACE(). Can TRANSLATE() act as a complete substitute for nested REPLACE() functions? What are the differences between the two? These are just a few questions I'll answer as we go through this tutorial. By the end, you'll know which function to choose moving forward, depending on your query requirements.

Exploring SQL REPLACE()

As the name implies, the REPLACE() function replaces one or a series of characters with another. It's easily one of the simplest yet most powerful functions in SQL. You can also use it as part of an UPDATE statement.

SELECT REPLACE('Luke I am your friend', 'friend', 'father') AS [Name Replacement];

Results:

Name Replacement
---------------------
Luke I am your father

For as long as I've used SQL Server, REPLACE() has existed. It's a favorite among developers when trying to clean up data. We've all been there, especially when users modify data without controls. They could fill out a form indicating someone's name and include a # sign. Their name may have a # sign, but it was likely a mistake. Ideally, controls exist on the front end to stop this from happening, but that's a different conversation.

Something you commonly see is nesting several REPLACE() functions to account for multiple strings to replace on the same column. Often, you want to replace the value with a blank or empty string. For example, let's say we want to replace any non-numeric characters in a phone number. The code below should do the trick.

-- Original Phone Number 1-(555)-555-5555
DECLARE @PhoneNumber NVARCHAR(50) = '1-(555)-555-5555'

SELECT REPLACE(REPLACE(REPLACE(@PhoneNumber,'-',''),'(',''),')','') AS [New Phone Number];

Results:

New Phone Number
-----------------
15555555555

In the example above, there is absolutely nothing wrong with nesting REPLACE(). I've likely used up to 8 or 10 REPLACE() functions when the situation called for it. However, I'll admit it does make your T-SQL hard to read the deeper you go. This pattern reminds me of a nested CASE expression. Is there a better way of doing this to remove the clutter? That's where our next function comes into play.

Exploring SQL TRANSLATE()

Microsoft first introduced the TRANSLATE() function in SQL Server 2017. We could use TRANSLATE() in Oracle long before then. You must ensure your database compatibility level is at 140 or higher to take advantage of it. Microsoft defines TRANSLATE() as a function that returns the string provided as a first argument after some characters specified in the second argument are translated into a destination set of characters specified in the third argument.

When first looking at TRANSLATE(), I had no idea you could swap it for REPLACE() based on the name. I assumed it had something to do with language translation. However, after seeing an example, the confusion cleared up.

What does the basic syntax of TRANSLATE() look like? One crucial point is that the second and third arguments are a list of characters, list being the keyword.

TRANSLATE ( Input String, List of Characters, List of Characters to Replace With )

Let's try and remove all the non-numeric numbers from our phone number.

-- Original Phone Number 1-(555)-555-5555
DECLARE @PhoneNumber NVARCHAR(50) = '1-(555)-555-5555';
SELECT TRANSLATE(@PhoneNumber, '()-', '') AS [New Phone Number];

Results:

The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters.

You can see above that our attempt to replace the list of characters resulted in an error message indicating that the second and third arguments must be the same length. What if we try and use three blanks since the second argument has three characters?

-- Original Phone Number 1-(555)-555-5555
DECLARE @PhoneNumber NVARCHAR(50) = '1-(555)-555-5555';
SELECT TRANSLATE(@PhoneNumber, '()-', '   ') AS [New Phone Number];

Results:

New Phone Number
-------------------
1  555  555 5555

Well, it's doing what I'm telling it to, essentially replacing the characters with spaces. We still need to use a REPLACE() function to remove the blank spaces, granted only one of them. With the TRANSLATE() portion, I will use a less-used replacement than a space.

-- Original Phone Number 1-(555)-555-5555
DECLARE @PhoneNumber NVARCHAR(50) = '1-(555)-555-5555';
SELECT REPLACE(TRANSLATE(@PhoneNumber, '()-', '***'),'*','') AS [New Phone Number];

Remember to accomplish the task with REPLACE(). I needed three, one for the open, one for the closed, and one for the hyphen. Situations like this are where TRANSLATE() shines.

Handling NULLs

How are NULLs handled for each of the functions? Let's start with REPLACE(). If either the value being replaced or the replacement value is NULL, SQL returns a NULL. This also applies if you are nesting REPLACE() functions. You can see the results of the example below.

WITH PhoneNumbers
AS (
   SELECT '1-(555)-555-5555' AS Number
   UNION ALL
   SELECT '1-(555)-555-1234'
   UNION ALL
   SELECT NULL
   )
SELECT REPLACE(REPLACE(REPLACE(Number,'-',''),'(',''),')','') AS [New Phone Number]
FROM PhoneNumbers;
GO

Results:

New Phone Number
------------------
15555555555
15555551234
NULL

My recommendation is to handle the potential for a NULL with either COALESCE() or ISNULL().

Now let's turn our attention to TRANSLATE(). Does translate handle NULLs any differently than REPLACE()? Not really. If the input or replace value is NULL, the result will be a NULL.

WITH cte_ImportantNumbers
AS (
   SELECT '999-99-9999' AS Number
   UNION ALL
   SELECT '000-00-0000'
   UNION ALL
   SELECT NULL
   )
SELECT TRANSLATE(Number, '0123456789',REPLICATE('*',10)) AS [New Important Number]
FROM cte_ImportantNumbers;
GO

Results:

New Important Number
---------------------
***-**-****
***-**-****
NULL

As mentioned previously, remember to create a remediation plan if you have a potential for NULLs. Unless your column doesn't allow NULLs, they are always there waiting to spring into action.

Which One to Choose?

If you want to replace a string with another, stick with REPLACE(), for example, replacing the word "Home" with "Office." However, if you're trying to clean up a list of characters and need to nest multiple REPLACE() functions together, try TRANSLATE(). Since TRANSLATE() didn't exist before SQL Server 2017, you can't use it if you're on an older version. Keep using REPLACE() until you can upgrade. To answer the question from the beginning, can TRANSLATE() replace the REPLACE() function? The answer is kind of, but not entirely.

Key Takeaways

  • REPLACE() and TRANSLATE() are both string functions and substitute characters for some others.
  • TRANSLATE() only works on SQL Server 2017. You need to ensure your database compatibility is set to at least 140.
  • With either REPLACE() or TRANSLATE(), if the string is NULL, the value returned is NULL. There is no difference in how either handle NULLs.
  • Pick whichever function you are the most comfortable working with and get coding.
Next Steps
  • Daniel Calbimonte assembled several articles about SQL Server functions. Here is the link to the one about TRANSLATE().
  • The article said TRANSLATE() was introduced in SQL Server 2017. For a listing of other T-SQL enhancements, please check out this article by Koen Verbeeck.
  • In this article, Aubrey Love compares three additional string functions: STUFF(), REPLACE(), and WRITE(). I had never heard of WRITE() before reading this.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-04-19

Comments For This Article




Friday, October 13, 2023 - 11:12:58 AM - Tracy Norris Back To Top (91666)
Excellent explanation for the previously-unheard of TRANSLATE function. Like you, I really dislike nested REPLACE statements. Now looking into the WRITE() function - which I've never heard of either!

Sunday, May 21, 2023 - 11:32:54 PM - Guy Back To Top (91218)
Both the well-explained article and the follow-up performance test are much appreciated. Thank you for taking the time!

Sunday, May 21, 2023 - 10:06:40 AM - Thomas Franz Back To Top (91216)
PS: the mentioned times are the pure CPU times, not the real execution times which were just about 1-2 minutes per run (because of paralellism the CPU time is much higher).

The exeucution plans and plan costs were equal for all 4 tests, except the INSERT everything run in batch mode.

Saturday, May 20, 2023 - 6:30:09 AM - Jared Westover Back To Top (91213)
@Thomas Franz

Thank you for taking the time to put these results together and sharing them with everyone.

Friday, May 19, 2023 - 10:24:56 AM - Thomas Franz Back To Top (91210)
I made a perfomance test on a table with ~85 mio phone numbers (many duplicates, columnstore index, warm cache, idle server, SQL 2022 Developer, multiple runs, inserting into a #temp table, because returning it to SSMS would take forever).

* REPLACE(TRANSLATE(mc.msisdn, '02468', '*****'), '*', '') as trans -- about 11:45 min
* REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(mc.msisdn, '0', ''), '2', ''), '4', ''), '6', ''), '8', '') AS trans -- about 9:00 min
* TRANSLATE(mc.msisdn, '02468', '*****') AS trans -- about 8:30 min
* REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(mc.msisdn, '0', '*'), '2', '*'), '4', '*'), '6', '*'), '8', '*') AS trans -- about 10:30 min

Conclutions:
- a simple translate is the fastest (8:30)
- when you add a replace it becomes the slowest (11:45), there seems to be some sort of problems, when SQL Server has to pass the TRANSLATE result to REPLACE
- REPLACING every even number by an asterisk is on the third place (10:30)
- REPLACING every even number by an empty string is on the second place (9:00), I guess, because the outer replaces works faster, because the input string becomes shorter and shorter after every call

Sunday, April 23, 2023 - 5:28:37 AM - Brent Leslie Back To Top (91127)
Might be good to do a performance comparison between the two as well?