Name Parsing for up to Two Different Persons from One Name String with SQL Server T-SQL

By:   |   Comments (3)   |   Related: 1 | 2 | 3 | > Functions System


Problem

My company receives name strings in a variety of different formats. One of our partners sends us name strings that can contain names for either one or two persons. In addition, the same partner also sends us name strings in a variety of different formats. Are there any special SQL Server T-SQL tips for parsing names when the input string can contain names for either one or two persons? Can you also show how to handle different input formats when parsing either one or two names from an input source string?

Solution

This is the third tip in a three-part series on name parsing. Each tip in the series addresses increasingly more robust approaches to name parsing.

  • The first tip reviews SQL Server T-SQL string functions to highlight how they can be used for name parsing applications.
  • The second tip focuses on how to process names provided by different partners in different formats into one consistent format across all partners.
  • This tip drills down on how to extract names for up to two different persons from a single input source string. This tip also illustrates how to process names submitted in a wide variety of different formats.

All three tips illustrate different approaches to parsing an input name source into substrings. If you have a name source with a value of Rod D Sardo SR, then its substrings are Rod, D, Sardo, and SR.

The parsing of substrings from an input source string involves two steps. First, find the spaces between the substrings. Second, use the locations of spaces to assist with extracting the substrings. When extracting substrings, you may also need to perform some special cleaning, such as removing commas or excess spaces that are not part of a substring. While the native T-SQL string functions do enable applications to perform these two steps, you can easily encounter a situation when finding spaces between substrings results in very complex expressions that are difficult to author and edit. You can circumvent the use of these complex expressions through nested sub-queries (see the second tip) or custom user-defined functions as demonstrated in this tip.

After extracting substrings, you can devise CASE statements to assign substrings to the parts of a name, such as first name, middle name, last name, and suffix. Examples of these CASE statements are presented in all three tips. It is possible to design CASE statements for each name part that processes substrings from differently formatted input sources. A single case statement for a name part can handle strings from differently formatted input source strings. The more diverse the formatting for your input name sources, the more likely it is that you will require highly complex CASE statements to assign substrings to name parts. For this reason, you may care to segment your input name sources into sets that are relatively similar in format, which, in turn, will permit simpler CASE statements for assigning substrings to names parts. This tip and the prior one on name parsing illustrate how to combine the result sets from different sets of case statements into one consistent set of name parts for two or more sets of differently formatted name sources.

What Specifically Does This Tip Show You How To Do

The following screen shot shows the result set from the SELECT statement at the end of the script for this tip, which is really the consolidated result sets from three separate SELECT statements combined into one via UNION ALL operators. While only 12 input name sources are parsed, notice the wide variation in the name formats. These name parsing examples show the range of name formats covered in this tip.

  • Some input sources have name parts for two persons, but others have name parts for only one person.
  • A couple of the two-person input name sources list the last name first, but all the one-person input sources and other two-person sources list the last name last.
  • Middle names are present for some, but not all input strings. In one case (Addie Mae Hinkle Martin) two substrings (Mae and Hinkle) are designated as co-equal parts of the middle name.
  • Suffix name parts are included in some, but not all input name sources. However, when a suffix is present, it applies only to the last name.
Name Parsing for up to Two Different Persons from One Name String

The script for this example commences by inserting the 12 NameString column values appearing in the preceding screen shot into a temporary table named ##NameStrings. This table has just one column. Two subsequent script sections find the location of spaces in the input sources and then use the space locations to extract successive substrings within the name source on each row of the ##NameStrings table.

The tip script contains three separate sets of CASE statements that do the final step in name parsing. The separate sets have their result sets combined via UNION ALL statements. One set of CASE statements is for input sources for just one person listing the first name first (no ampersand and no comma in the NameString row). Another set is for input sources that contain the names of two persons where the first name appears first (an ampersand, but no comma), and the final set of CASE statements is for input sources in which the last name appears first (an ampersand and a comma).

User-defined Functions for Finding the Location of Spaces in an Input String

This tip, relative to the two prior tips, illustrates a new way of determining the space locations between substrings in an input name source. For the name sources for which this tip demonstrates parsing, there can be up to seven space locations between substrings. While it is easy to determine where one or two spaces are in a name source, the task grows more tedious for a larger number of up to seven spaces. As a result, we present a couple of user-defined functions to handle the task of extracting spaces and demonstrate their use.

The following script shows a pair of CREATE FUNCTION statements. The first function, fnFirstSpaceLoc, returns the location of the first space in a varchar field value passed to it as an argument. The function relies on the CHARINDEX built-in T-SQL function to return a smallint value denoting the location of the first space in the varchar field value. The second function, fnNextSpaceLoc, locates the first space location from any location in a varchar field value. The function takes two arguments: the varchar field value and the location from which to start looking for the next space. The fnNextSpaceLoc function returns a value of 0 or greater. If there are no spaces after the starting location for a search or the initial search location is not greater than 0, the return value is 0. Otherwise, the return value is the location of the first space after the starting location.

CREATE FUNCTION [dbo].[fnFirstSpaceLoc] 
(
 @namestring varchar(max)
)
RETURNS smallint
AS
BEGIN
 DECLARE 
 @spaceLoc as smallint

 SET @spaceLoc = CHARINDEX(' ',@nameString,1)

 RETURN @spaceLoc

END

GO

CREATE FUNCTION [dbo].[fnNextSpaceLoc] 
(
 @namestring varchar(max),
 @priorSpaceLoc smallint
)
RETURNS smallint
AS
BEGIN
 DECLARE 
 @spaceLoc as smallint

 SET @spaceLoc = 
  CASE
   WHEN @priorSpaceLoc > 0 THEN
    CHARINDEX(' ',@namestring,@priorSpaceLoc+1)
   ELSE 0
  END 

 RETURN @spaceLoc

END

GO 
 

The next script segment illustrates how to use the fnFirstSpaceLoc and fnNextSpaceLoc functions. The code reads name source strings from the ##NameStrings temporary table. This table contains the NameString column from the preceding screen shot. The code also creates a new temporary table named ##NameStrings_with_SpaceLocs. This new temporary table contains original NameString column along with seven additional columns. These columns hold the space locations for up to seven spaces within the name source string on a row. If there is no space for a location, such as the third space, then then value in the result set is 0.

The SELECT list from this code excerpt starts with the NameString column from its source table.

Next, the fnFirstSpaceLoc function is invoked. Notice that its argument is merely the column name from the source function.

Then, the fnNextSpaceLoc function is invoked to return the value of the SecondSpaceLoc in the result set. This function takes two arguments. The first argument is the NameString value for the current row from the source. The second argument is the value returned from the fnFirstSpaceLoc function.

The fnNextSpaceLoc function is invoked five more times to return values for the ThirdSpaceLoc through the SeventhSpaceLoc columns in the result set. For each of these successive function calls, the first argument is always NameString. However, the second argument changes depending on which column space location is being returned. For the ThirdSpaceLoc column, the fnNextSpaceLoc function's second argument is dbo.fnNextSpaceLoc(NameString, dbo.fnFirstSpaceLoc(NameString)). In other words, it is the next space after the SecondSpaceLoc. The second argument for the fourth space location is the next space location after the ThirdSpaceLoc column. The SQL for this is dbo.fnNextSpaceLoc(NameString, dbo.fnNextSpaceLoc(NameString, dbo.fnFirstSpaceLoc(NameString))). The search for each successive space location depends on the location of the preceding space location.

IF OBJECT_ID('tempdb..##NameStrings_with_SpaceLocs') IS NOT NULL
DROP TABLE ##NameStrings_with_SpaceLocs

SELECT  
 NameString,
 dbo.fnFirstSpaceLoc(NameString
 ) FirstSpaceLoc,
 dbo.fnNextSpaceLoc(NameString, 
  dbo.fnFirstSpaceLoc(NameString)
 ) SecondSpaceLoc,
 dbo.fnNextSpaceLoc(NameString, 
  dbo.fnNextSpaceLoc(NameString, 
   dbo.fnFirstSpaceLoc(NameString))
 ) ThirdSpaceLoc,
 dbo.fnNextSpaceLoc(NameString, 
  dbo.fnNextSpaceLoc(NameString, 
   dbo.fnNextSpaceLoc(NameString, 
    dbo.fnFirstSpaceLoc(NameString)))
 ) FourthSpaceLoc,
 dbo.fnNextSpaceLoc(NameString, 
  dbo.fnNextSpaceLoc(NameString, 
   dbo.fnNextSpaceLoc(NameString, 
    dbo.fnNextSpaceLoc(NameString, 
     dbo.fnFirstSpaceLoc(NameString))))
 ) FifthSpaceLoc,
 dbo.fnNextSpaceLoc(NameString,
  dbo.fnNextSpaceLoc(NameString, 
   dbo.fnNextSpaceLoc(NameString, 
    dbo.fnNextSpaceLoc(NameString, 
     dbo.fnNextSpaceLoc(NameString, 
      dbo.fnFirstSpaceLoc(NameString)))))
 ) SixthSpaceLoc,
 dbo.fnNextSpaceLoc(NameString,
 dbo.fnNextSpaceLoc(NameString,
  dbo.fnNextSpaceLoc(NameString, 
   dbo.fnNextSpaceLoc(NameString, 
    dbo.fnNextSpaceLoc(NameString, 
     dbo.fnNextSpaceLoc(NameString, 
      dbo.fnFirstSpaceLoc(NameString))))))
 ) SeventhSpaceLoc
INTO ##NameStrings_with_SpaceLocs
FROM ##NameStrings  

Extracting substrings from Name Sources

Given the locations of the space locations within a name source, you are ready to apply another script to extract the substrings from within a name source field value. Each substring within the name source field values in our tip sample is separated from another substring by a space (and sometimes a comma too). This tip sample includes name source field values that can have up to eight substrings. The FirstSpaceLoc separates the first and second substrings in a field. The last space separates the next to the last and last substrings within a name source field value. The script for extracting substrings uses as its source the ##NameStrings_with_SpaceLocs temporary table created and populated in the preceding code excerpt. The code for extracting substrings uses the LEFT function for extracting the first substring and the RIGHT function for extracting eighth sub-string if there is one. Within our sample each name source field value has at least three substrings with spaces delimiting the substrings. Each of the remaining substrings after the first is returned via a CASE statement that checks the values of appropriate space locations to make sure that there are spaces delimiting a substring.

  • The second sub-string is extracted by one of two approaches. If there is no subsequent substring, then the code takes all the characters after the first space location. If there are three or more substrings within a NameString field value, the code extracts the characters between the first and second space locations.
  • The third through the seventh substrings are all extracted via a similar approach. For example, if the space locations before and after a potential substring are 0, then the substring is set to an empty string (''). Otherwise, the substring is set equal to the number of characters between the leading and trailing space locations from one after the leading space location.
  • The eighth substring is an empty string unless the seventh space location is not zero. When there is a non-zero seventh space location, a Right function extracts all remaining characters after the seventh space location and assigns them to the eighth substring.

The code for extracting substrings uses REPLACE functions for the first and second substring values. The REPLACE functions removes commas in a substring by replacing them with an empty substring. As mentioned in the two prior name parsing tips, there is often value in wrapping basic name parsing code in other functions such as REPLACE functions to remove unwanted characters from substrings and RTRIM and LTRIM functions to remove leading and trailing blanks at the beginning or end of a substring.

IF OBJECT_ID('tempdb..##NameStrings_with_SpaceLocs_with_SubStrings') IS NOT NULL
DROP TABLE ##NameStrings_with_SpaceLocs_with_SubStrings

SELECT 
   NameString
  ,REPLACE(LEFT(NameString, FirstSpaceLoc-1),',','') FirstString
  ,REPLACE(CASE
   WHEN SecondSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-FirstSpaceLoc)
   ELSE SUBSTRING(NameString, FirstSpaceLoc+1,SecondSpaceLoc - FirstSpaceLoc)
   END,',','') SecondString
  ,CASE
   WHEN SecondSpaceLoc = 0 AND ThirdSpaceLoc = 0 THEN ''
   WHEN ThirdSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-SecondSpaceLoc)
   ELSE SUBSTRING(NameString, SecondSpaceLoc+1,ThirdSpaceLoc - SecondSpaceLoc)
   END ThirdString
  ,CASE
   WHEN ThirdSpaceLoc = 0 AND FourthSpaceLoc = 0 THEN '' 
   WHEN FourthSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-ThirdSpaceLoc)
   ELSE SUBSTRING(NameString, ThirdSpaceLoc+1,FourthSpaceLoc - ThirdSpaceLoc)
   END FourthString
  ,CASE
   WHEN FourthSpaceLoc = 0 AND FifthSpaceLoc = 0 THEN '' 
   WHEN FifthSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-FourthSpaceLoc)
   ELSE SUBSTRING(NameString, FourthSpaceLoc+1,FifthSpaceLoc - FourthSpaceLoc)
   END FifthString
  ,CASE
   WHEN FifthSpaceLoc = 0 AND SixthSpaceLoc = 0 THEN '' 
   WHEN SixthSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-FifthSpaceLoc)
   ELSE SUBSTRING(NameString, FifthSpaceLoc+1,SixthSpaceLoc - FifthSpaceLoc)
   END SixthString
  ,CASE
   WHEN SixthSpaceLoc = 0 AND SeventhSpaceLoc = 0 THEN '' 
   WHEN SeventhSpaceLoc = 0 THEN RIGHT(NameString,LEN(NameString)-SixthSpaceLoc)
   ELSE SUBSTRING(NameString, SixthSpaceLoc+1,SeventhSpaceLoc - SixthSpaceLoc)
   END SeventhString
  ,CASE
   WHEN SeventhSpaceLoc = 0 THEN '' 
   ELSE RIGHT(NameString,LEN(NameString)-SeventhSpaceLoc)
   END EighthString,
   FirstSpaceLoc, SecondSpaceLoc, ThirdSpaceLoc,
   FourthSpaceLoc, FifthSpaceLoc, SixthSpaceLoc,
   SeventhSpaceLoc

INTO ##NameStrings_with_SpaceLocs_with_SubStrings   
FROM ##NameStrings_with_SpaceLocs 

The following screen shot shows an excerpt from the result set for the preceding query statement. Notice that each successive substring within a NameString value is a assigned to a successive column in the result set. Spaces act as delimiters between substrings. Also, if an ampersand (&) is present in a NameString value, then it is treated as a substring.

Parsing Names without a Comma and without an Ampersand

Parsing Names without a Comma and without an Ampersand

The following T-SQL code excerpt shows how to combine substrings into a name when there is no ampersand and no comma in the original source strings. Notice that the FROM clause references the ##NameStrings_with_SpaceLocs_with_SubStrings temporary table, which was populated by the preceding code excerpt that extracted substrings. Additionally notice there is no INTO keyword immediately prior to the FROM clause to populate a temporary table with the result set from the SELECT statement. The result set from this SELECT statement is the first of three that are combined by UNION ALL operators. The WHERE clause immediately following the FROM clause restricts the action of the SELECT to just those source strings with no ampersands and no commas. Other WHERE clauses for the following two SELECT statements target different subsets of source strings for name parsing. Here's a sample name source parsed by the code in this section: Rod D Sardo SR.

Aside from the NameString field, the SELECT list designates seven columns to populate its result set. These columns designate three name parts for the first person and three more name parts for the second person. In between the columns for name parts of the first and second person, there is a column for a name suffix, such as JR. In the sample of names for this tip, a suffix applies only to the first person. Because all two-person name sources include an ampersand in the sample of names for this tip, all name parts for the second person are set to empty strings.

The first name for the first person in the sample of source strings selected by the WHERE clause is always the first substring, FirstString. This column in the result set is named FirstName_1. The remaining three name parts for the first person are assigned via CASE statements. The precise design of the CASE statements for the middle name, last name, and suffix for the first person is a simple matter of software engineering design. The CASE statements must be designed so that they extract substrings and assign them to name parts given the content of the source name fields and the target name parts.

For the particular subset of name sources corralled by the WHERE clause for the first SELECT statement for name parsing, parsing is based on spaces between substrings, the content of substrings, the number of non-zero length substrings. You can fine-tune the design of these statements by trial and error to see which formulations are discovered first and which are most robust for the name sources and substrings that you encounter in your name parsing activities. While the samples below test the actual content of substrings to assist in name parsing this may not be strictly necessary when dealing with name sources that include suffixes, such as JR or SR. However, checking the content of substrings adds an extra dimension of flexibility in parsing that goes beyond the location of non-zero length substrings and the spaces between them.

The CASE statement for the middle name of the first person, MiddleName_1, allows for the possibility of a middle name such as Mae Hinkle, which is comprised of two distinct substrings. There are as many other ways of handling a person with two substring candidates for a middle name as you can imagine. For example, the parsing code can force the middle name to be an empty string ('') when there is more than one sub-string candidate for the name. Alternatively, your code can insert a hyphen between the two substrings (Mae-Hinkle). Another approach is to exclude the first substring from assignment to the middle name (Hinkle). The latter approach may be able to handle the assignment of a maiden name or a previously married last name.

-- Code allows for 2 middle names and special
-- value-based Suffix processing

SELECT
   NameString
  ,FirstString FirstName_1
  ,CASE
   WHEN SecondSpaceLoc = 0 THEN ''
   WHEN SecondSpaceLoc > 0
     AND ThirdSpaceLoc = 0
     AND ThirdString NOT IN ('JR', 'SR', 'II','III')
     THEN SecondString
   WHEN SecondSpaceLoc > 0
     AND ThirdSpaceLoc = 0
     AND ThirdString IN ('JR', 'SR', 'II','III')
     THEN ''
   WHEN ThirdSpaceLoc > 0
     AND FourthSpaceLoc = 0
     AND FourthString IN ('JR', 'SR', 'II','III')
     THEN SecondString
   WHEN ThirdSpaceLoc > 0
     AND FourthSpaceLoc = 0
     AND FourthString NOT IN ('JR', 'SR', 'II','III')
     THEN SecondString + ' ' + ThirdString  
   END MiddleName_1
  ,CASE
   WHEN SecondSpaceLoc = 0 THEN SecondString
   WHEN SecondSpaceLoc > 0 
     AND 
     ThirdSpaceLoc = 0
     AND 
     ThirdString NOT IN ('JR', 'SR', 'II','III')
     THEN ThirdString
   WHEN SecondSpaceLoc > 0 
     AND 
     ThirdSpaceLoc = 0
     AND 
     ThirdString IN ('JR', 'SR', 'II','III')
     THEN SecondString
   WHEN ThirdSpaceLoc > 0 
     AND 
     FourthSpaceLoc = 0
     AND 
     FourthString NOT IN ('JR', 'SR', 'II','III')
     THEN FourthString      
   WHEN ThirdSpaceLoc > 0 
     AND 
     FourthSpaceLoc = 0
     AND 
     FourthString IN ('JR', 'SR', 'II','III')
     THEN ThirdString
   END LastName_1,
   CASE
   WHEN SecondSpaceLoc > 0 
     AND 
     ThirdSpaceLoc = 0
     AND 
     ThirdString IN ('JR', 'SR', 'II','III')
     THEN ThirdString
   WHEN ThirdSpaceLoc > 0 
     AND 
     FourthSpaceLoc = 0
     AND 
     FourthString IN ('JR', 'SR', 'II','III')
     THEN FourthString
   ELSE ''
   END Suffix
  ,'' FirstName_2
  ,'' MiddleName_2
  ,'' LastName_2
   
FROM ##NameStrings_with_SpaceLocs_with_SubStrings
WHERE
  CHARINDEX(',',NameString,1) = 0
  AND
  CHARINDEX('&',NameString,1) = 0

Parsing Names with an Ampersand and no Comma

One especially unique aspect of the name parsing for this tip is the parsing of two names from one source string. For the name sources used in this tip, the inclusion of an ampersand always denotes a name for two persons. However, I want to point out that as unique as the feature is, the code to implement the feature is not especially unique in structure. The most important aspect of the code to implement the feature is the requirement to include filters for ampersands in the CASE statements when assigning substrings to name parts. Another special aspect of the code to assign names for two persons from one name source is the requirement to assign substrings to the name parts for a second person. The prior example always forced the name parts for the second person to an empty string, but the following code uses CASE statements to assign substrings to name parts for the second person. Here's an example of a name source parsed by the following code: Terry M & Jennifer Batter.

The WHERE clause in the following SELECT statement restricts processing to name sources with an ampersand and no comma. You already know that the inclusion of an ampersand is to focus on name sources for more than one person. The exclusion of a comma in name sources is another WHERE clause restriction. This restriction for the name sources in the names used for this tip means that that the first name for the first person is always the first substring. Therefore, the code to assign a substring to the first person is identical to the prior code excerpt that assigned a first name to a single person.

I want to emphasize that there is no general standard for designing the CASE statements that assign substrings to name parts. Also, the design of the statements is not necessarily more difficult when assigning substrings to names for two persons instead of just one person. While there are more substrings to process when assigning substrings to two persons, there are also more degrees of freedom in assigning substrings for two persons than just one person. As with all name parsing exercises, you should process name sources in groups that have similar features. Apply different parsing rules to each group of similar name sources. This selective grouping feature is implemented by the WHERE clause that excludes all name sources with a comma, but includes all name sources with an ampersand.

-- extracting names for two persons from sources 
-- with an an ampersand and no comma

SELECT 
   NameString
  ,FirstString FirstName_1
  ,CASE
   WHEN SecondString = '&'
     AND FifthString = ''
    THEN ''
   WHEN SecondString <> '&' 
     OR ThirdString = '&' 
    THEN SecondString
   WHEN SecondString = '&'
     AND SixthString = ''
    THEN ''
   END MiddleName_1
  ,CASE   
   WHEN FifthString = ''
    THEN FourthString
   WHEN FifthString = '&'
    THEN ThirdString
   WHEN SixthString = ''
    AND FifthString <> ''
   THEN FifthString 
   END LastName_1
  ,CASE
   WHEN FifthString = '&'
    THEN FourthString
   ELSE ''  
   END Suffix
  ,CASE
   WHEN SecondString = '&'
    THEN ThirdString
   WHEN ThirdString = '&'
    THEN FourthString
   WHEN FifthString = '&'
    THEN SixthString 
   END FirstName_2
  ,CASE
   WHEN SecondString = '&'
    AND FifthString <> ''
    AND SixthString = ''
    THEN FourthString
   WHEN FifthString = '&'
    THEN SeventhString
   ELSE ''
   END MiddleName2
  ,CASE
   WHEN SecondString = '&'
    AND FifthString = ''
    THEN FourthString
   WHEN SecondString = '&'
    AND FifthString <> ''
    AND SixthString = ''
    THEN FifthString
   WHEN ThirdString = '&'
    --AND FifthString = ''
    THEN FifthString
   WHEN FifthString = '&'
    AND FifthString <> ''
    THEN EighthString
   ELSE ''
   END LastName2  
FROM ##NameStrings_with_SpaceLocs_with_SubStrings
WHERE
  CHARINDEX(',',NameString,1) = 0
  AND
  CHARINDEX('&',NameString,1) > 0


Parsing Names with an Ampersand and a Comma

The last code excerpt from this tip handles name sources for two persons where the last name for the first person is the first substring. In the name sources for this tip, the second substring can be either a suffix or the first name for the first person's name in the name source. If the second substring denotes a suffix then it is followed by a comma. Otherwise, the first substring is followed by a comma. The assignments for the first person's first name, last name, and suffix, includes these relationships. Here's an example of a name source parsed by this code: Thompson III, Charles A & Elizabeth V.

Although the code for the last subset of name sources accommodates persons with a suffix in their name, the code does not include any special handling for suffix strings, such as JR or SR. All assignments to name parts for the first and second person is based exclusively on the location of a comma, the location of an ampersand, and whether selected substrings are empty or not.

I want to emphasize that this code could easily fail to parse another subset of name sources -- even if they did include both an ampersand and a comma. However, if you have been following the code excerpts to this point you should feel empowered to design your own custom solution for another subset of name sources that require different parsing rules. Alternative rules are relatively easy to specify and evaluate using basic unit testing practices.

-- extracting names for two persons from sources 
-- with an an ampersand and a comma


SELECT
  
   NameString
  ,CASE
   WHEN CHARINDEX(',',NameString,1) > FirstSpaceLoc
   THEN ThirdString
   ELSE SecondString
   END FirstName_1
  ,CASE
   WHEN CHARINDEX('&',NameString,1) > ThirdSpaceLoc
   AND CHARINDEX(',',NameString,1) < FirstSpaceLoc
   THEN ThirdString
   WHEN CHARINDEX('&',NameString,1) < ThirdSpaceLoc
   AND SixthString = ''
   THEN ''   
   ELSE FourthString
   END MiddleName_1 
  ,FirstString LastName_1
  ,CASE
   WHEN CHARINDEX(',',NameString,1) > FirstSpaceLoc
   THEN SecondString
   ELSE ''
   END Suffix
  ,CASE
   WHEN CHARINDEX('&',NameString,1) < ThirdSpaceLoc
   AND SixthString = ''
   THEN FourthString
   WHEN CHARINDEX('&',NameString,1) > ThirdSpaceLoc
   AND SeventhString = ''
   Then FifthString
   ELSE SixthString
   END FirstName_2
  ,CASE
   WHEN CHARINDEX('&',NameString,1) < ThirdSpaceLoc
   AND SixthString = ''
   THEN FifthString
   WHEN CHARINDEX('&',NameString,1) > ThirdSpaceLoc
   AND SeventhString = ''
   THEN SixthString
   ELSE SeventhString
   END MiddleName_2
  ,FirstString LastName_2 
FROM ##NameStrings_with_SpaceLocs_with_SubStrings
WHERE
  CHARINDEX(',',NameString,1) > 0
  AND
  CHARINDEX('&',NameString,1) > 0

Next Steps

This tip is the final one in a three-part series on name parsing with T-SQL. The examples within the tip series become successively more complex to start you at a beginner level, but to still consider advanced topics by the end. This concluding tip highlights how to parse name sources that contain names for either one or two persons along with other advanced name parsing techniques.

The series presents an overview of name parsing by considering the fundamental steps in the process. Within the context of the series, the process involves: finding the delimiters between substrings in a source, extracting the substrings based primarily on the delimiters in a name source, and assigning the substrings to name parts. There are multiple approaches to undertaking each of these tasks, and the series demonstrates some alternative ways of implementing the different elements of name parsing.

As with the prior two name parsing tips, there is some sample code for you to download and try. When you download this tip's script file, please understand that it is not a script that you can just run without some initial setup. At the very least, you will have run the CREATE FUNCTION statements within the script from the context of whatever database you are using to run the code. Also, you should comment out the CREATE FUNCTION statements after they operate initially so that you do not attempt to re-create a previously existing user-defined function when running the script to parse names.

If you find some of the approaches difficult to follow -- especially the use of string functions, consider reviewing the initial tip in the series. If you liked the coverage of how to combine parsed names from different parsing rules into one series, take a look at the second tip which presents another example of this kind of solution.

While each tip in the series comes with sample code and there are many different types of name sources parsed in the code samples, it is likely that some developers will encounter name sources with features that are not covered by the three-part series. If you learn the techniques presented in the series, you will be able to tackle a huge selection of name parsing tasks simply by adapting the techniques illustrated in the series. Therefore, start right away learning the techniques so that you can move on to creating your own personal solutions to your custom name parsing requirements.



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Rick Dobson Rick Dobson is an author and an individual trader. He is also a SQL Server professional with decades of T-SQL experience that includes authoring books, running a national seminar practice, working for businesses on finance and healthcare development projects, and serving as a regular contributor to MSSQLTips.com. He has been growing his Python skills for more than the past half decade -- especially for data visualization and ETL tasks with JSON and CSV files. His most recent professional passions include financial time series data and analyses, AI models, and statistics. He believes the proper application of these skills can help traders and investors to make more profitable decisions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, September 17, 2014 - 11:21:43 PM - Gene Wirchenko Back To Top (34573)
Ah, that is not my blog. Mine is http://genew.ca/ and is not so focused on computers.

Wednesday, September 17, 2014 - 6:04:02 AM - Rick Dobson Back To Top (34558)

I enjoyed reading your blog on what's in a name and related issues.  My name parsing experience is from the business perspective of how to designate a name that identifies a person well enough so that mail can be correctly routed to the person.

 

As someone with an unusual name who gets a driver's license, my name has been transformed so that it fits the requirements and experience of license driver issue representatives.  For example, my first name of Ricardo has been transformed to Richard, but is most commonly changed to Richardo.  While I personally do not prefer Richardo, I have found it easier to recognize Richardo as one of my identifiers rather than insist to the world that I be called by preferred first name of Rick.


Tuesday, September 16, 2014 - 4:14:04 PM - Gene Wirchenko Back To Top (34552)
I see nothing in the above about handling names where the family name comes first. Names can be complicated beasts. Bear in mind this: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/














get free sql tips
agree to terms