SQL Server SOUNDEX Function
The SOUNDEX function is used to measure the similarity of strings. The function returns 4 characters: 1 letter and 3 numbers. This function classifies the string and uses a code to compare the sound of a string in a similar way to English. It can be used as a simplistic way of comparing two strings to see how similar they may sound.
- expression - this is the character string or expression to evaluate the sound.
Letter to Number Conversion for SOUNDEX
Here is a list of the number values that are returned based on the corresponding letter returned by SOUNDEX. We will walk through some examples below.
|1||b, f, p, v|
|2||c, g, j, k, q, s, x, z|
Letters SOUNDEX Ignores
- The following letters are ignored: a, e, I, o, u, h, w and y.
Simple SOUNDEX Example
The following example will show the SOUNDEX of the word JOHN.
SELECT SOUNDEX('JOHN') as string
J500 means that the first letter is J from JOHN. Then the O and H are ignored (according to rules above) and finally the N is a 5 as shown in the table above. As we do not have more characters, the rest is ignored and 0 is returned for positions three and four.
SOUNDEX Example with Longer Word
In the next example we have the word "Consideration".
SELECT SOUNDEX('Consideration') as string
The value returned is C523 as shown below.
- C is the first letter
- o is ignored
- n is 5
- s is 2
- i is ignored
- d is 3
- we now have the 4 character SOUNDEX code, so the rest of the characters are ignored.
Using DIFFERENCE and SOUNDEX
SOUNDEX is closely related to the DIFFERENCE function. The DIFFERENCE function compares the output of SOUNDEX and returns a number of letters and numbers that are common. The following example illustrates this.
SELECT DIFFERENCE('JANE','JAKE') as difference, SOUNDEX('JANE') as string1, SOUNDEX('JAKE') as string2
JANE and JAKE sound similar. The values returned are J500 and J200. Note that they have 3 values in common and therefore the DIFFERENCE shows a value of 3.