Standardize SQL Server data with text lookup and replace function
Have you ever had the need to replace multiple words in a data column with a new word or a new phrase for an entire table? A good example would be to update a product catalog with new words either for standardization or just to reflect a new name for the product. A simple technique would be to use the REPLACE command as outlined in this prior tip. This is great if you only need to update a few words, but what if you need to scan the entire product catalog and make several changes across the board, sometimes multiple changes for one data value?
By using a combination of the text manipulation commands from this prior tip and also using a cursor we can cursor through the data one time and update all words in the data column based on values that exist in a synonyms table to reflect the new way we want the data to be stored. The downside is that we will need to use a cursor, but the upside is that we can scan through all of our data one time instead of having to do multiple table scans to find the word we are looking for and then doing the update.
Setup Test Scenario
The first thing we will do is setup some test tables and a control table for the word changes.
Here is the table that we want to update along with some sample records.
CREATE TABLE dbo.Products ( ProductID INT IDENTITY(1,1), ProductName VARCHAR(100) ) GO INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Marmalade') INSERT INTO Products (ProductName) VALUES ('Sir Rodney''s Scones') INSERT INTO Products (ProductName) VALUES ('Jack''s New England Clam Chowder') INSERT INTO Products (ProductName) VALUES ('Louisiana Fiery Hot Pepper Sauce') INSERT INTO Products (ProductName) VALUES ('Louisiana Hot Spiced Okra ')
Here is what are sample data looks like.
|Sir Rodney's Marmalade|
|Sir Rodney's Scones|
|Jack's New England Clam Chowder|
|Louisiana Fiery Hot Pepper Sauce|
|Louisiana Hot Spiced Okra|
Then we will create the control table for the word changes. So to show how this can be used we need to create a new table called synonyms. We will also create a unique index on the word column. This will allow us to use the index for the data lookup and also make sure that only one row is returned.
CREATE TABLE [dbo].[Synonyms] ( [synonym] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [word] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) ON [PRIMARY] GO CREATE UNIQUE INDEX [IX_word] ON [dbo].[Synonyms]([word]) ON [PRIMARY] GO INSERT INTO synonyms VALUES ('Jam','Marmalade') INSERT INTO synonyms VALUES ('Chowda','Chowder') INSERT INTO synonyms VALUES ('Wicked Hot','Hot') INSERT INTO synonyms VALUES ('King','Sir')
Here is what the control table data looks like. In this script below, we will look up the Word and replace with the Synonym value.
Script to Make Data Changes
Here is the script that will scan through an entire table using a cursor, for each row of data the text is broken down word by word and then looked up in a synonyms table to see if there is a standardized way of storing this word. If found the new value is brought in to create an updated version of our data based on the lookup.
Here is the code that does the conversion.
SET NOCOUNT ON DECLARE @word VARCHAR(50), @position INT, @newProductName VARCHAR(500), @oldProductName VARCHAR(500), @newWord VARCHAR(50), @ProductName VARCHAR(500), @ProductID INT DECLARE load_cursor CURSOR FOR SELECT ProductID, ProductName FROM dbo.Products OPEN load_cursor FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName WHILE @@FETCH_STATUS = 0 BEGIN SET @oldProductName = @ProductName SET @ProductName = LTRIM(RTRIM(@ProductName)) SET @newProductName = @ProductName SET @position = CHARINDEX(' ', @ProductName, 1) BEGIN WHILE @position > 0 BEGIN SET @word = LTRIM(RTRIM(LEFT(@ProductName, @position - 1))) IF @word <> '' BEGIN SELECT @newWord = NULL SELECT @newWord = synonym FROM Synonyms WHERE word = @word IF @newWord IS NOT NULL BEGIN SET @newProductName = REPLACE(@newProductName, @word, @newWord) END END SET @ProductName = RIGHT(@ProductName, LEN(@ProductName) - @position) SET @position = CHARINDEX(' ', @ProductName, 1) END SET @word = @ProductName SELECT @newWord = NULL SELECT @newWord = synonym FROM Synonyms WHERE word = @word IF @newWord IS NOT NULL SET @newProductName = REPLACE(@newProductName, @ProductName, @newWord) END IF @oldProductName <> @newProductName BEGIN SELECT @oldProductName AS OldProductName, @newProductName AS NewProductName --UPDATE dbo.Products SET ProductName = @newProductName WHERE ProductID = @ProductID END FETCH NEXT FROM load_cursor INTO @ProductID, @ProductName END CLOSE load_cursor DEALLOCATE load_cursor GO
So after creating the table, loading the data and running the script above these are the results we get when using the sample data. As you can see we have updated the data based on the our Word replacements. Also, you will notice that we have updated "Sir Rodney's Marmalade" to "King Rodney's Jam". The process did the replacement for both "Sir" and for "Marmalade".
|Sir Rodney's Marmalade||King Rodney's Jam|
|Sir Rodney's Scones||King Rodney's Scones|
|Jack's New England Clam Chowder||Jack's New England Clam Chowda|
|Louisiana Fiery Hot Pepper Sauce||Louisiana Fiery Wicked Pepper Sauce|
|Louisiana Hot Spiced Okra||Louisiana Wicked Spiced Okra|
- Modify this code to work with your data
- Make some changes so the final result updates the data instead of just displaying the before and after versions
- Although the text manipulation commands are limited, get creative with what is available to solve your data problem
About the author
View all my tips