Standardize SQL Server data with text lookup and replace function

Problem

Have you ever had the need to replace multiple words in a data column with a new word or a new phrase for an entire table? A good example would be to update a product catalog with new words either for standardization or just to reflect a new name for the product. A simple technique would be to use the REPLACE command as outlined in this prior tip. This is great if you only need to update a few words, but what if you need to scan the entire product catalog and make several changes across the board, sometimes multiple changes for one data value?

Solution

By using a combination of the text manipulation commands from this prior tip and also using a cursor we can cursor through the data one time and update all words in the data column based on values that exist in a synonyms table to reflect the new way we want the data to be stored. The downside is that we will need to use a cursor, but the upside is that we can scan through all of our data one time instead of having to do multiple table scans to find the word we are looking for and then doing the update.

Setup Test Scenario

The first thing we will do is setup some test tables and a control table for the word changes.

Here is the table that we want to update along with some sample records.

CREATE TABLE dbo.Products 
   (
   ProductID INT IDENTITY(1,1),
   ProductName VARCHAR(100)
   )
GO
INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Marmalade') 
INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Scones') 
INSERT INTO Products (ProductName) VALUES ('Jack''s New England Clam Chowder') 
INSERT INTO Products (ProductName) VALUES ('Louisiana Fiery Hot Pepper Sauce')
INSERT INTO Products (ProductName) VALUES ('Louisiana Hot Spiced Okra ')

Here is what are sample data looks like.

ProductName
Sir Rodney’s Marmalade
Sir Rodney’s Scones
Jack’s New England Clam Chowder
Louisiana Fiery Hot Pepper Sauce
Louisiana Hot Spiced Okra

Then we will create the control table for the word changes. So to show how this can be used we need to create a new table called synonyms. We will also create a unique index on the word column. This will allow us to use the index for the data lookup and also make sure that only one row is returned.

CREATE TABLE [dbo].[Synonyms] ( 
    [synonym] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , 
    [word] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL  
) ON [PRIMARY] 
GO 
CREATE UNIQUE INDEX [IX_word] ON [dbo].[Synonyms]([word]) ON [PRIMARY] 
GO 
INSERT INTO synonyms VALUES ('Jam','Marmalade') 
INSERT INTO synonyms VALUES ('Chowda','Chowder') 
INSERT INTO synonyms VALUES ('Wicked Hot','Hot') 
INSERT INTO synonyms VALUES ('King','Sir') 

Here is what the control table data looks like. In this script below, we will look up the Word and replace with the Synonym value.

SynonymWord
JamMarmalade
ChowdaChowder
WickedHot
KingSir

Script to Make Data Changes

Here is the script that will scan through an entire table using a cursor, for each row of data the text is broken down word by word and then looked up in a synonyms table to see if there is a standardized way of storing this word. If found the new value is brought in to create an updated version of our data based on the lookup.

Here is the code that does the conversion.

SET NOCOUNT ON 
DECLARE @word VARCHAR(50),  
    @position INT,  
    @newProductName VARCHAR(500),  
    @oldProductName VARCHAR(500),  
    @newWord VARCHAR(50), 
    @ProductName VARCHAR(500),
    @ProductID INT 
DECLARE load_cursor CURSOR FOR 
    SELECT ProductID, ProductName 
    FROM dbo.Products 
OPEN load_cursor 
FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName 
WHILE @@FETCH_STATUS = 0 
BEGIN 
    SET @oldProductName = @ProductName 
    SET @ProductName = LTRIM(RTRIM(@ProductName)) 
     
    SET @newProductName = @ProductName 
    SET @position = CHARINDEX(' ', @ProductName, 1) 
    BEGIN 
         WHILE @position > 0 
         BEGIN 
              SET @word = LTRIM(RTRIM(LEFT(@ProductName, @position - 1))) 
              IF @word <> '' 
              BEGIN 
                 
                SELECT @newWord = NULL 
                SELECT @newWord = synonym FROM Synonyms WHERE word = @word  
                IF @newWord IS NOT NULL 
                BEGIN 
                     SET @newProductName = REPLACE(@newProductName, @word, @newWord) 
                END 
              END 
              SET @ProductName = RIGHT(@ProductName, LEN(@ProductName) - @position) 
              SET @position = CHARINDEX(' ', @ProductName, 1) 
         END 
          
         SET @word = @ProductName 
         SELECT @newWord = NULL 
         SELECT @newWord = synonym FROM Synonyms WHERE word = @word 
         IF @newWord IS NOT NULL 
              SET @newProductName = REPLACE(@newProductName, @ProductName, @newWord) 
    END 
    IF @oldProductName <> @newProductName 
    BEGIN 
         SELECT @oldProductName AS OldProductName, @newProductName AS NewProductName
         --UPDATE dbo.Products SET ProductName = @newProductName WHERE ProductID = @ProductID  
    END 
    FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName 
END 
CLOSE load_cursor 
DEALLOCATE load_cursor 
GO 

So after creating the table, loading the data and running the script above these are the results we get when using the sample data. As you can see we have updated the data based on the our Word replacements. Also, you will notice that we have updated “Sir Rodney’s Marmalade” to “King Rodney’s Jam”. The process did the replacement for both “Sir” and for “Marmalade”.

OldProductNameNewProductName
Sir Rodney’s MarmaladeKing Rodney’s Jam
Sir Rodney’s SconesKing Rodney’s Scones
Jack’s New England Clam ChowderJack’s New England Clam Chowda
Louisiana Fiery Hot Pepper SauceLouisiana Fiery Wicked Pepper Sauce
Louisiana Hot Spiced OkraLouisiana Wicked Spiced Okra

Next Steps

  • Modify this code to work with your data
  • Make some changes so the final result updates the data instead of just displaying the before and after versions
  • Although the text manipulation commands are limited, get creative with what is available to solve your data problem

Leave a Reply

Your email address will not be published. Required fields are marked *