Fuzzy String Matching Algorithms in SQL Server

Problem

Fuzzy string matching is defined as a method for finding approximate matches between strings, checking their similarity, and calculating their difference. This can be used to identify character corruption due to spelling errors, transposed characters, missing characters, or abbreviations. This method uses algorithms to detect similar sounding strings. Microsoft is introducing fuzzy string matching in SQL Server 2025. Is it possible to have similar features in previous versions of SQL Server?

Solution

Let’s see how to build Fuzzy String Matching functions using T-SQL.

Fuzzy String-Matching Functions

EDIT_DISTANCE: This function tells how many changes need to be completed to turn one word into another one, by adding, deleting, or swapping letters. Lower is better, and lower than 2 indicates a probable small typo or abbreviation.
EDIT_DISTANCE_SIMILARITY: This function provides a score from 0 to 100, showing how alike two words are, where 0 is for totally different and 100 is the same. High is better, greater than 90 indicates probably a typo, and 70 to 89 indicates similarity but with noticeable differences.
JARO_WINKLER_DISTANCE: This function is similar to EDIT_DISTANCE, but also supplies whether the two words start the same. Lower is better.
JARO_WINKLER_SIMILARITY: This function is like the previous one, but it gives a score between 0 and 1, where 1 means a perfect match. High is better, greater than 0.9 indicates a very strong match, between 0.80 and 0.89 indicates possibly a variation or typo, and between 0.70 and 0.79 indicates that it might be related but not guaranteed.

Fuzzy String Matching Algorithms in SQL Server

Damerau–Levenshtein Distance

In information theory and computer science, the Damerau–Levenshtein distance is a string metric for measuring the edit distance between two sequences. Informally, the Damerau–Levenshtein distance between two words is the minimum number of operations required to change one word into the other. These operations consist of insertions, deletions, or substitutions of a single character, or transposition of two adjacent characters.

Jaro–Winkler Similarity

In computer science and statistics, the Jaro–Winkler similarity is a string metric measuring an edit distance between two sequences. It is a variant of the Jaro Distance metric. It is very useful for comparing short strings such as names.

SQL Solution for Fuzzy String Matching

I am going to use a user defined table type:

-- MSSQLTips (TSQL)
CREATE TYPE [dbo].[uttMtxIndexed] AS TABLE(
    [lin] [smallint] NULL,
    [col] [smallint] NULL,
    [val] [float] NULL
)
GO

Let´s create two user defined functions, one for the Damerau-Levenshtein algorithm and another for the Jaro-Winkler distance algorithm.

Damerau-Levenshtein Algorithm with T-SQL

Below is for the Damerau-Levenshtein algorithm:

-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250307
-- Description:   Damerau-Levenshtein algorithm
-- =============================================
CREATE OR ALTER FUNCTION [dbo].[ufnFuzzyDamerauLevenshtein] 
                (@Word1 nvarchar(120)
                ,@Word2 nvarchar(120))
RETURNS int
AS
BEGIN
    DECLARE  @L1 int = LEN(@Word1)
            ,@L2 int = LEN(@Word2)
            ,@d AS dbo.uttMtxIndexed
            ,@i int
            ,@j int
            ,@cost int
            ,@del int
            ,@ins int
            ,@subs int
            ,@min int;
 
    SET @i = 0;
    WHILE @i <= @L1 BEGIN
        SET @j = 0;
        WHILE @j <= @L2 BEGIN
            INSERT INTO @d
                VALUES (@i,@j,0);
 
            SET @j += 1;
        END
 
        SET @i += 1;
    END
 
    UPDATE         @d
        SET         val = lin
        WHERE     col = 0;
 
    UPDATE         @d
        SET         val = col
        WHERE     lin = 0;
 
    SET @i = 1;
    WHILE @i <= @L1 BEGIN
        SET @j = 1;
        WHILE @j <= @L2 BEGIN
            IF SUBSTRING(@Word1,@i,1) = SUBSTRING(@Word2,@j,1)
                SET @cost = 0;
            ELSE
                SET @cost = 1;
 
            SET @del = (SELECT         val 
                            FROM     @d
                            WHERE     lin = @i - 1 AND
                                     col = @j) + 1;
 
            SET @ins = (SELECT         val 
                            FROM     @d
                            WHERE     lin = @i AND
                                     col = @j - 1) + 1;
 
            SET @subs = (SELECT         val 
                            FROM     @d
                            WHERE     lin = @i - 1 AND
                                     col = @j - 1) + @cost;
 
            IF @del < @ins
                SET @min = @del
            ELSE
                SET @min = @ins;
 
            IF @min > @subs
                SET @min = @subs;
 
            UPDATE         @d
                SET         val = @min
                WHERE     lin = @i AND
                         col = @j;
 
            IF  @i > 1 AND 
                @j > 1 AND
                SUBSTRING(@Word1,@i,1) = SUBSTRING(@Word2,@j - 1,1) AND
                SUBSTRING(@Word1,@i - 1,1) = SUBSTRING(@Word2,@j,1) BEGIN
 
                DECLARE @Val int =  
                    (SELECT         val 
                        FROM     @d
                        WHERE     lin = @i - 2 AND
                                 col = @j - 2) + 1;
 
                IF @min > @Val
                    SET @min = @Val;
 
                UPDATE         @d
                    SET         val = @min
                    WHERE     lin = @i AND
                             col = @j;
            END
 
            SET @j += 1;
        END
 
        SET @i += 1;
    END
 
    DECLARE @Distance int =
        (SELECT         val 
            FROM     @d
            WHERE     lin = @L1 AND
                     col = @L2);
 
    RETURN @Distance;
END
GO

Jaro-Winkler Distance Algorithm with T-SQL

Below is for the Jaro-Winkler distance algorithm:

-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250307
-- Description:   Jaro-Winkler distance algorithm
-- =============================================
CREATE OR ALTER FUNCTION [dbo].[ufnFuzzyJaroWinkler]
                (@Word1 nvarchar(120)
                ,@Word2 nvarchar(120))
RETURNS decimal(12,6)
AS
BEGIN
    IF @Word1 IS NULL OR @Word2 IS NULL
        RETURN -1;
 
    DECLARE  @L1 int = LEN(@Word1)
            ,@L2 int = LEN(@Word2)
            ,@matches int = 0
            ,@Transp int = 0
            ,@PrefixLen int = 0
            ,@PrefixMax int = 4
            ,@i int
            ,@j int
            ,@W1Matched nvarchar(100)
            ,@W2Matched nvarchar(100)
            ,@Jaro decimal(12,6)
            ,@JaroWinkler decimal(12,6)
            ,@Scaling decimal(12,6) = 0.1;
 
    IF @L1 = 0 AND @L2 = 0 
        RETURN 1;
 
    IF @L1 = 0 OR @L2 = 0 
        RETURN 0;
 
    DECLARE @matchWindow int = (GREATEST(@L1,@L2) / 2) - 1;
 
    -- similar function for GREATEST:
    -- DECLARE @matchWindow int = 0.5 * (@L1 + @L2 + ABS(@L1 - @L2));
 
    -- similar function for LEAST:
    -- DECLARE @matchWindow int = 0.5 * (@L1 + @L2 - ABS(@L1 - @L2));
    
    SET @W1Matched = REPLICATE(' ',@L1);
    SET @W2Matched = REPLICATE(' ',@L2);
 
    SET @i = 1;
    WHILE @i <= @L1 BEGIN
        SET @j = GREATEST(1,@i - @matchWindow);
        WHILE @j <= LEAST(@L2,@i + @matchWindow) BEGIN
            IF SUBSTRING(@Word2,@j,1) = SUBSTRING(@Word1,@i,1) AND 
               SUBSTRING(@W2Matched,@j,1) = ' ' BEGIN
                SET @W1Matched = STUFF(@W1Matched,@i,1,SUBSTRING(@Word1,@i,1));
                SET @W2Matched = STUFF(@W2Matched,@j,1,SUBSTRING(@Word2,@j,1));
                SET @matches += 1;
                BREAK
            END
 
            SET @j += 1;
        END
 
        SET @i += 1;
    END
 
    IF @matches = 0 
        RETURN 0;
 
    SET @j = 1;
    SET @Transp = 0;
    SET @i = 1;
    WHILE @i <= @L1 BEGIN
        IF SUBSTRING(@W1Matched,@i,1) <> ' ' BEGIN
            WHILE SUBSTRING(@W2Matched,@j,1) = ' ' 
                SET @j += 1;
 
            IF SUBSTRING(@W1Matched,@i,1) <> SUBSTRING(@W2Matched,@j,1)
                SET @Transp += 1;
 
            SET @j += 1;
        END
 
        SET @i += 1;
    END
 
    SET @Transp /= 2;
 
    SET @Jaro =  (1.0 / 3.0) * 
                ((@matches * 1.0 / @L1) + 
                 (@matches * 1.0 / @L2) + 
                ((@matches - @Transp) * 1.0 / @matches));
 
    SET @PrefixLen = 0;
    WHILE @PrefixLen < @PrefixMax AND 
          @PrefixLen < @L1 AND 
          @PrefixLen < @L2 AND 
          SUBSTRING(@Word1,@PrefixLen + 1,1) = SUBSTRING(@Word2,@PrefixLen + 1,1) BEGIN
        SET @PrefixLen += 1;
    END
 
    SET @JaroWinkler = @Jaro + (@PrefixLen * @Scaling * (1 - @Jaro));
 
    RETURN @JaroWinkler
END
GO

Table Value Function to Compare Words

Here is a table value function to compare two words:

-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250624
-- Description:   Fuzzy match words
-- =============================================
CREATE OR ALTER   FUNCTION     [dbo].[tvfFuzzyWords] 
                (@Word1 nvarchar(250)
                ,@Word2 nvarchar(250))
RETURNS             @Solut     
    TABLE        (Word1 nvarchar(250)
                ,Word2 nvarchar(250)
                ,EditDistance int
                ,EditDistanceSimilarity decimal(12,4)
                ,JaroWinklerSimilarity decimal(12,4)
                ,JaroWinklerDistance AS (1 - JaroWinklerSimilarity)
                ,Word1Len AS (LEN(Word1))
                ,Word2Len AS (LEN(Word2)))
WITH EXECUTE AS CALLER 
AS
BEGIN
 
    INSERT INTO     @Solut
                (Word1
                ,Word2
                ,EditDistance
                ,JaroWinklerSimilarity
                ,EditDistanceSimilarity)
        VALUES    (@Word1
                ,@Word2
                ,[dbo].[ufnFuzzyDamerauLevenshtein] (@Word1,@Word2) 
                ,[dbo].[ufnFuzzyJaroWinkler] (@Word1,@Word2) 
                ,(1.0 - [dbo].[ufnFuzzyDamerauLevenshtein] (@Word1,@Word2) * 1.0 / GREATEST(LEN(@Word1),LEN(@Word2))) * 100);
 
    RETURN;
END
GO

Examples Using Fuzzy String Matching

Below we build a table and add some data for testing.

DECLARE      @WordPairs 
    TABLE   (WordID int IDENTITY(1,1) PRIMARY KEY
            ,WordUK nvarchar(50)
            ,WordUS nvarchar(50)
            ,WordLenUK AS LEN(WordUK)
            ,WordLenUS AS LEN(WordUS)
            ,EditDistance int
            ,EditDistanceSimilarity decimal(12,4)
            ,JaroWinklerSimilarity decimal(12,4)
            ,JaroWinklerDistance AS (1 - JaroWinklerSimilarity));
 
INSERT INTO  @WordPairs 
            (WordUK, WordUS) 
    VALUES  ('Colour', 'Color'),
            ('Flavour', 'Flavor'),
            ('Centre', 'Center'),
            ('Theatre', 'Theater'),
            ('Organise', 'Organize'),
            ('Analyse', 'Analyze'),
            ('Catalogue', 'Catalog'),
            ('Programme', 'Program'),
            ('Metre', 'Meter'),
            ('Honour', 'Honor'),
            ('Neighbour', 'Neighbor'),
            ('Travelling', 'Traveling'),
            ('Grey', 'Gray'),
            ('Defence', 'Defense'),
            ('Practise', 'Practice'), -- Verb form in UK
            ('Practice', 'Practice'), -- Noun form in both
            ('Aluminium', 'Aluminum'),
            ('Cheque', 'Check'); -- Bank cheque vs. check
 
UPDATE       @WordPairs
    SET      [EditDistance] = [dbo].[ufnFuzzyDamerauLevenshtein] (WordUK,WordUS);
 
UPDATE       @WordPairs
    SET      [EditDistanceSimilarity] = (1.0 - [EditDistance] * 1.0 / GREATEST([WordLenUK],[WordLenUS])) * 100
            ,[JaroWinklerSimilarity] = [dbo].[ufnFuzzyJaroWinkler] ([WordUK],[WordUS]);
 
SELECT       *
    FROM     @WordPairs 
    ORDER BY 2,3;

When executed, the results are:

Fuzzy String Matching to Find Missing Characters

-- MsSqlTips (TSQL)
SELECT * FROM [dbo].[tvfFuzzyWords] ('Sun','Saturn');
GO

Results:

This shows that to convert SUN into SATURN, we need to add 3 letters (A, T, and R) and 50% is similar.

-- MsSqlTips (TSQL)
SELECT * FROM [dbo].[tvfFuzzyWords] ('envronment','environment');
GO

Results:

Just one insertion is needed, the letter “i”, 10 out of 11 characters match (90.9), and 0.9788 means that most letters are in the right place.

Fuzzy String Matching to Find Abbreviations

-- MsSqlTips (TSQL)
SELECT * FROM [dbo].[tvfFuzzyWords] ('dept','department');
GO

Results:

Six characters need to be added, which is a big change, and 40% is similar. The Jaro-Winkler similarity of 0.86 is high and shows the strings match at the beginning, so in some way, they are related.

Fuzzy String Matching to Find Transposed Characters

-- MsSqlTips (TSQL)
SELECT * FROM [dbo].[tvfFuzzyWords] ('adn','and');
GO

Results:

Only one change is needed: swap “d” and “n”. Two out of 3 characters are fine (66.6) and show it as a small mistake, and Jaro-Winkler shows that maybe not that similar.

Fuzzy String Matching to Find Spelling Errors

-- MsSqlTips (TSQL)
SELECT * FROM [dbo].[tvfFuzzyWords] ('recieve','receive');
GO

Results:

Only one change is needed: swap “I” and “e”. Six out of 7 letters are fine (85.7). Jaro-Winkler is high when the words start out the same and are mostly identical.

Next Steps

There are a lot of examples of applications of these features on the Internet including:

Resources

Community

Subscribe