Standardize SQL Server data with text lookup and replace function

By:   |   Comments (7)   |   Related: More > Scripts


Problem

Have you ever had the need to replace multiple words in a data column with a new word or a new phrase for an entire table? A good example would be to update a product catalog with new words either for standardization or just to reflect a new name for the product. A simple technique would be to use the REPLACE command as outlined in this prior tip. This is great if you only need to update a few words, but what if you need to scan the entire product catalog and make several changes across the board, sometimes multiple changes for one data value?

Solution

By using a combination of the text manipulation commands from this prior tip and also using a cursor we can cursor through the data one time and update all words in the data column based on values that exist in a synonyms table to reflect the new way we want the data to be stored. The downside is that we will need to use a cursor, but the upside is that we can scan through all of our data one time instead of having to do multiple table scans to find the word we are looking for and then doing the update.

Setup Test Scenario

The first thing we will do is setup some test tables and a control table for the word changes.

Here is the table that we want to update along with some sample records.

CREATE TABLE dbo.Products 
   (
   ProductID INT IDENTITY(1,1),
   ProductName VARCHAR(100)
   )
GO

INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Marmalade') 
INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Scones') 
INSERT INTO Products (ProductName) VALUES ('Jack''s New England Clam Chowder') 
INSERT INTO Products (ProductName) VALUES ('Louisiana Fiery Hot Pepper Sauce')
INSERT INTO Products (ProductName) VALUES ('Louisiana Hot Spiced Okra ')

Here is what are sample data looks like.

ProductName
Sir Rodney's Marmalade
Sir Rodney's Scones
Jack's New England Clam Chowder
Louisiana Fiery Hot Pepper Sauce
Louisiana Hot Spiced Okra

Then we will create the control table for the word changes. So to show how this can be used we need to create a new table called synonyms. We will also create a unique index on the word column. This will allow us to use the index for the data lookup and also make sure that only one row is returned.

CREATE TABLE [dbo].[Synonyms] ( 
    [synonym] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , 
    [word] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL  
) ON [PRIMARY] 
GO 

CREATE UNIQUE INDEX [IX_word] ON [dbo].[Synonyms]([word]) ON [PRIMARY] 
GO 

INSERT INTO synonyms VALUES ('Jam','Marmalade') 
INSERT INTO synonyms VALUES ('Chowda','Chowder') 
INSERT INTO synonyms VALUES ('Wicked Hot','Hot') 
INSERT INTO synonyms VALUES ('King','Sir') 

Here is what the control table data looks like. In this script below, we will look up the Word and replace with the Synonym value.

Synonym Word
Jam Marmalade
Chowda Chowder
Wicked Hot
King Sir

Script to Make Data Changes

Here is the script that will scan through an entire table using a cursor, for each row of data the text is broken down word by word and then looked up in a synonyms table to see if there is a standardized way of storing this word. If found the new value is brought in to create an updated version of our data based on the lookup.

Here is the code that does the conversion.

SET NOCOUNT ON 

DECLARE @word VARCHAR(50),  
    @position INT,  
    @newProductName VARCHAR(500),  
    @oldProductName VARCHAR(500),  
    @newWord VARCHAR(50), 
    @ProductName VARCHAR(500),
    @ProductID INT 

DECLARE load_cursor CURSOR FOR 
    SELECT ProductID, ProductName 
    FROM dbo.Products 

OPEN load_cursor 
FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName 

WHILE @@FETCH_STATUS = 0 
BEGIN 
    SET @oldProductName = @ProductName 
    SET @ProductName = LTRIM(RTRIM(@ProductName)) 
     
    SET @newProductName = @ProductName 

    SET @position = CHARINDEX(' ', @ProductName, 1) 

    BEGIN 
         WHILE @position > 0 
         BEGIN 
              SET @word = LTRIM(RTRIM(LEFT(@ProductName, @position - 1))) 
              IF @word <> '' 
              BEGIN 
                 
                SELECT @newWord = NULL 
                SELECT @newWord = synonym FROM Synonyms WHERE word = @word  
                IF @newWord IS NOT NULL 
                BEGIN 
                     SET @newProductName = REPLACE(@newProductName, @word, @newWord) 
                END 
              END 
              SET @ProductName = RIGHT(@ProductName, LEN(@ProductName) - @position) 
              SET @position = CHARINDEX(' ', @ProductName, 1) 
         END 
          
         SET @word = @ProductName 
         SELECT @newWord = NULL 
         SELECT @newWord = synonym FROM Synonyms WHERE word = @word 
         IF @newWord IS NOT NULL 
              SET @newProductName = REPLACE(@newProductName, @ProductName, @newWord) 
    END 

    IF @oldProductName <> @newProductName 
    BEGIN 
         SELECT @oldProductName AS OldProductName, @newProductName AS NewProductName
         --UPDATE dbo.Products SET ProductName = @newProductName WHERE ProductID = @ProductID  
    END 

    FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName 
END 

CLOSE load_cursor 
DEALLOCATE load_cursor 
GO 

So after creating the table, loading the data and running the script above these are the results we get when using the sample data. As you can see we have updated the data based on the our Word replacements. Also, you will notice that we have updated "Sir Rodney's Marmalade" to "King Rodney's Jam". The process did the replacement for both "Sir" and for "Marmalade".

OldProductName NewProductName
Sir Rodney's Marmalade King Rodney's Jam
Sir Rodney's Scones King Rodney's Scones
Jack's New England Clam Chowder Jack's New England Clam Chowda
Louisiana Fiery Hot Pepper Sauce Louisiana Fiery Wicked Pepper Sauce
Louisiana Hot Spiced Okra Louisiana Wicked Spiced Okra
Next Steps
  • Modify this code to work with your data
  • Make some changes so the final result updates the data instead of just displaying the before and after versions
  • Although the text manipulation commands are limited, get creative with what is available to solve your data problem


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, June 12, 2020 - 5:39:11 PM - Greg Robidoux Back To Top (86001)

Hi Lena,

can you provide an example of data that you have and what you want to change. 

I tried this and made a change to the first example:

INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Marmalade') 

INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Marmalade Sir') 

and it updated both words Sir to King.

-Greg


Friday, June 12, 2020 - 10:07:06 AM - Lena Back To Top (85990)

How do I change this so that it continues to update the rest of the words in the cell before moving to the next row? If there are for instance 5 words in the cell text that should be replaced, this script stops after it encounters the first word.


Wednesday, February 20, 2013 - 2:27:20 PM - Jewel Back To Top (22314)

Thank you so much!  I used this in a slightly different manner.  I used it to replace bad values in a column with nothing.


Friday, March 12, 2010 - 6:28:42 AM - admin Back To Top (5047)

Here is an update to this.  This is updating the table PRODUCTS that has a primary key of PRODUCTID.

 SET NOCOUNT ON

DECLARE 
@word VARCHAR(50
), 
       
@position INT

       
@newProductName VARCHAR(500
), 
       
@oldProductName VARCHAR(500
), 
       
@newWord VARCHAR(50
),
       
@ProductName VARCHAR(500
),
       @ProductID INT
 

DECLARE load_cursor 
CURSOR FOR
       SELECT 
ProductName, ProductID 
       
FROM 
Products

OPEN 
load_cursor
FETCH NEXT FROM load_cursor INTO 
@ProductName, @ProductID 

WHILE @@FETCH_STATUS 
0
BEGIN
       SET 
@oldProductName 
@ProductName
       
SET @ProductName LTRIM(RTRIM(@ProductName
))
       
       
SET @newProductName 
@ProductName

       
SET @position = CHARINDEX(' '@ProductName1
)

       
BEGIN
               WHILE 
@position 
0
               
BEGIN
                       SET 
@word LTRIM(RTRIM(LEFT(@ProductName@position 1
)))
                       
IF @word <> 
''
                       
BEGIN
                               
                               SELECT 
@newWord 
NULL
                               
SELECT @newWord synonym FROM Synonyms WHERE word 
@word 
                               
IF @newWord IS 
NOT NULL
                               
BEGIN
                                       SET 
@newProductName REPLACE(@newProductName@word@newWord
)
                               
END
                       END
                       SET 
@ProductName RIGHT(@ProductNameLEN(@ProductName) - @position
)
                       
SET @position = CHARINDEX(' '@ProductName1
)
               
END
               
               SET 
@word 
@ProductName
               
SELECT @newWord 
NULL
               
SELECT @newWord synonym FROM Synonyms WHERE word 
@word
               
IF @newWord IS 
NOT NULL
                       
SET @newProductName REPLACE(@newProductName@ProductName@newWord
)
       
END

       IF 
@oldProductName <> 
@newProductName
       
BEGIN 
               UPDATE Products SET
ProductName = @newProductName WHERE P
roductID = @ProductID 
       
END

       FETCH 
NEXT FROM load_cursor INTO 
@ProductName, @ProductID 
END

CLOSE 
load_cursor
DEALLOCATE 
load_cursor
GO


Thursday, March 11, 2010 - 8:45:31 PM - jdevm Back To Top (5045)
Unfortunately you have now lost me regarding the 2nd part which mentions updating the cursor

Thursday, March 11, 2010 - 6:54:29 PM - admin Back To Top (5044)

You can change this part of the code to do an update versus just showing the data. 

       IF @oldProductName <> @newProductName
       
BEGIN
               SELECT 
@oldProductName AS OldProductName@newProductName AS 
NewProductName
       
END

You would also need to add in some additional logic to determine which row you are updating.  This could be added to the cursor.


Thursday, March 11, 2010 - 5:56:54 PM - jdevm Back To Top (5043)
How does one change this script to replace the content instead of just updating it on screen?














get free sql tips
agree to terms