SQL Server SUBSTRING Examples

By:   |   Updated: 2022-01-24   |   Comments (3)   |   Related: > TSQL


Problem

In this tip, we'll take a look at the SUBSTRING function, which is used to extract a string of characters out of a larger string. We'll cover the syntax of the SQL SUBSTRING function and demonstrate its usage with some examples.

Solution

A common scenario in database development – and in programming in general – is to do text manipulation. One of the more common use cases is when you need to extract a piece of a string. For example, a word out of a sentence, or the filename of an object out of a file path.

Microsoft SQL Server SUBSTRING Syntax

The SUBSTRING function has the following syntax:

SUBSTRING ( expression ,start , length )  

For example, SELECT SUBSTRING('Hello World',1,5) will yield the result "Hello". Keep in mind "start" and "length" can be expressions themselves as well, as long as they return a positive integer, negative integer or a bigint.

  • expression - is of the following data types: character (char, varchar, nchar or nvarchar), binary (binary or varbinary), text, ntext or image. If the expression returns NULL, SUBSTRING will also return NULL.
  • start - specifies the starting point of the first character you want to return. SQL Server starts counting at 1. So SUBSTRING('MSSQLTips',3,3) returns 'SQL', SUBSTRING('MSSQLTips',6,3) returns 'Tip'. If you have a length below 1, SQL Server will count back although it will not return blanks. SUBSTRING('MSSQLTips',0,5) returns 'MSSQ' (so only 4 characters), while SUBSTRING('MSSQLTips',-4,5) returns the empty string ''. If "start" is bigger than length, the empty string is also returned. SUBSTRING('MSSQLTips',15,5) returns ''. If "start" is NULL, the result is also NULL.
  • length - specifies the number of characters you want to return. If (length + start) is longer than the number of characters in "expression", the entire expression is returned. If the length is NULL, NULL is returned. If the length is negative, an error is returned.
Invalid length parameter passed to the substring function. error

Microsoft SQL Server String Functions

Let's suppose you want to select a substring from the start of the expression, or from the end. From the start, you can just enter 1 for the start parameter, but you can also use the shorthand function LEFT. The following two SQL statements are functionally equivalent:

the left function

If you want to start from the end of the expression, you can use the RIGHT function. It is functionally equivalent to SUBSTRING(expression, LEN(expression), -length), but of course this is not valid syntax for SQL Server. The RIGHT function can be "rewritten" as a LEFT function by using REVERSE, which returns the expression in reverse order.

right function

Keep in mind every character counts when using the substring functions, this also includes white space and non-printable characters. To clean up your text, you can use the LTRIM, RTRIM or TRIM functions. The TRIM function (introduced in SQL Server 2017) is a bit more versatile because it removes from both ends of the string at once, but it can also remove other characters than spaces. If you're using a version lower than SQL Server 2017, you can implement similar behavior using the REPLACE function.

Parsing Phone Number in SQL Server

The phone numbers in the US and Canada consists of 10 digits: 3 digits for the area code, 3 digits for the central office code and 4 digits for the line number. An example is (415) 555 - 2671. In this example, we want to extract the three parts of the phone number.

First, we can get rid of all the extra characters using the TRIM and REPLACE functions:

SELECT
     PhoneNumber    = '(415) 555 - 2671'
    ,CleanedNumber  = REPLACE( REPLACE(  TRIM(' (' FROM '(415) 555 - 2671') ,') ','') ,' - ','')
cleaning phone number

Now that only the digits remain, we can extract the different parts using SUBSTRING:

WITH CTE_Cleaning AS
(
    SELECT
         PhoneNumber    = '(415) 555 - 2671'
        ,CleanedNumber  = REPLACE(
                            REPLACE(  TRIM(' (' FROM '(415) 555 - 2671')
                                    ,') ','')
                            ,' - ','')
)
SELECT
     [PhoneNumber]
    ,AreaCode           = SUBSTRING([CleanedNumber],1,3)
    ,CentralOfficeCode  = SUBSTRING([CleanedNumber],4,3)
    ,LineNumber         = SUBSTRING([CleanedNumber],7,4)
FROM [CTE_Cleaning];

The result:

parsing phone number

Parsing an Email Address in SQL Server

Telephone numbers have a fixed structure, but email addresses are a bit more tricky to parse since you don't know their exact length upfront. An email address has the following format:

<recipient>@domain

Where domain = <domain name>.<top-level domain>

In this example, we're assuming there's only one @ symbol present in the email address. Technically, you can have multiple @ symbols, where the last one is the delimiter between the recipient and the domain. This is for example a valid email address: "user@company"@company.com. This is out of scope for this tip.

Using the CHARINDEX function, we can find the position of the @. The recipient can then be found by taking the start of the email address right until the @ symbol. The domain is everything that comes after the @ symbol. If you also want to extract the top-level domain, you cannot search for the first dot starting from the right, since some top-level domains have a dot, for example co.uk. Let's see how we can parse the email address [email protected].

WITH cte_sample AS
(
    SELECT email = '[email protected]'
)
SELECT
     email
    ,recipient  = SUBSTRING(email,1,CHARINDEX('@',email,1) - 1)
    ,fulldomain = SUBSTRING(email,CHARINDEX('@',email,1) + 1,LEN(email))
    ,domainname = SUBSTRING( email
                            ,CHARINDEX('@',email,1) + 1 -- start is one char after the @
                            ,     -- starting position of charindex is the position of @
                                  CHARINDEX('.',email,CHARINDEX('@',email,1))  
                                - CHARINDEX('@',email,1)
                            -- length is the position of the first dot after the @ - position of the @
                            )
    ,toplevel   = SUBSTRING( email
                            ,CHARINDEX('.',email,CHARINDEX('@',email,1)) + 1 -- position of first dot after @
                            ,LEN(email)
                            )
FROM [cte_sample];

For both the full domain as for the top-level domain, we specified LEN(email) which is too long. However, if the length specified for SUBSTRING is longer than there are characters, everything is returned until the last character. This avoids having us writing a more complicated expression to calculate the correct length, while the only thing we want is the substring from a certain position right until the end of the string.

Extracting File Name from a File Path in SQL Server

When you want to extract the file name from a file path, you have to start from the right. There can be any number of subfolder between the root and your file:

/rootfolder/subfolder1/subfolder2/…/subfolderN/myfile.csv

(a lot of programming languages and non-Windows operating systems use forward slashes in their file paths. Windows uses backwards slashes – which is typically a reserved symbol in many programming languages – but it can accept forward slashes as well)

If we want to extract the file name, we can use the following SQL statement, where we use the REVERSE function to flip the string around, since CHARINDEX can only search from left to right:

WITH cte_sample AS
(
    SELECT filepath = '/rootfolder/subfolder/myfile.csv'
)
SELECT
     filepath
    ,fullfilename   = REVERSE(SUBSTRING(REVERSE(filepath),1,CHARINDEX('/',REVERSE(filepath)) - 1))
FROM [cte_sample];

The result:

extracted file name

If you want the file name without the extension, we have to start the search after the first dot from the right. We can modify the SQL statement as follows:

WITH cte_sample AS
(
    SELECT filepath = '/rootfolder/subfolder/myfile.csv'
)
SELECT
     filepath
    ,fullfilename           = REVERSE(SUBSTRING(REVERSE(filepath),1,CHARINDEX('/',REVERSE(filepath)) - 1))
    ,filenamenoextension    = REVERSE(SUBSTRING(
                                         REVERSE(filepath)
                                        ,CHARINDEX('.',REVERSE(filepath)) + 1
                                        ,CHARINDEX('/',REVERSE(filepath)) - CHARINDEX('.',REVERSE(filepath)) -1
                                      )
                              )
FROM [cte_sample];

The start position for the SUBSTRING function is the position of the first dot +1 extra position shifted to the left (or right when the string is reversed). The length is the position of the first slash minus the position of the first dot minus one (to remove the slash itself). This results in the following:

file name without extension

Parsing an URL is similar to parsing a file path, and various use cases are described in the tip Parsing a URL with SQL Server Functions.

Prefix Extraction in SQL Server

Suppose we have a free text column containing the full names of people. We want to extract the prefixes, such as Mr., Mrs., Ms., Dr. etc. Since it's nearly impossible to write one SQL statement that is able to parse all known prefixes, let's assume the prefix is always abbreviated with a dot followed by a space (between the prefix and the first name). First, we're going to check if there even is a prefix. Using the CHARINDEX function, we can find if there's a dot followed by a space, then we use the LEFT function to extract the prefix. If there's no prefix, CHARINDEX returns zero.

WITH cte_sampledata AS
(
    SELECT FullName = 'Ms. Doubtfire'
    UNION ALL
    SELECT 'Dr. Zeuss'
    UNION ALL
    SELECT 'Ir. Koen Verbeeck'
    UNION ALL 
    SELECT 'Mr. Bruce Wayne'
    UNION ALL
    SELECT 'Selina Kyle'
)
SELECT
     FullName
    ,Prefix = IIF(   CHARINDEX('. ',FullName) = 0
                    ,'No Prefix Found'
                    ,LEFT(FullName,CHARINDEX('. ',FullName))
                )
FROM cte_sampledata;

This returns the following results:

prefix extraction

If you only want a selected list of prefixes to extract, you can use a lookup table as an alternative. In the following piece of code, a lookup table is added (using a common table expression, but in reality you probably want to persist this table) and it is matched against the full name using a LEFT OUTER JOIN. In the code, we search for the first space in the full name. If there's no prefix, most likely the first name (or a part of the first name) is returned. In this case, the join doesn't return a match.

WITH cte_sampledata AS
(
    SELECT FullName = 'Ms. Doubtfire'
    UNION ALL
    SELECT 'Dr. Zeuss'
    UNION ALL
    SELECT 'Ir. Koen Verbeeck'
    UNION ALL 
    SELECT 'Mr. Bruce Wayne'
    UNION ALL
    SELECT 'Selina Kyle'
)
, cte_lookuplist AS
(
    SELECT Prefix = 'Ms.'
    UNION ALL 
    SELECT 'Dr.'
    UNION ALL
    SELECT 'Ir.'
    UNION ALL
    SELECT 'Mr.'
)
SELECT
     s.FullName
    ,Prefix = ISNULL(CONVERT(VARCHAR(20),l.Prefix),'No Prefix Found')
FROM cte_sampledata        s
LEFT JOIN cte_lookuplist   l ON l.[Prefix] = LEFT(s.FullName,ABS(CHARINDEX(' ',s.FullName)-1));

The check with CHARINDEX to see if there's an actual space is dropped, but if there's a name without a space, CHARINDEX will return zero. Subtracted by one, the length is negative and LEFT will return the error previously mentioned. To resolve this issue, the ABS function is used. In the unlikely case the name does not have a space, the first character will be returned.

The code has the following output:

prefix extraction with lookup table

Suffix Extraction in SQL Server

Extracting suffixes from a name is similar as extracting prefixes, but there are some differences. Of course, suffixes come after a name, which means it can be preceded with any number of spaces. A suffix doesn't necessarily end with a dot. You have Jr. and Sr., but also PhD, the 3rd and Roman numerals. Since a suffix can contain dots itself (M.B.A. or D.D. for example) and possibly spaces as well (e.g. when III as written out as "the third"), it becomes a bit harder to define a cut-off point to extract the suffix. If we make the assumption there are no spaces and the suffix is always at the end of the string, we can use a combination of REVERSE and LEFT to extract the suffix. REVERSE is used since in SQL Server we cannot tell CHARINDEX to start searching from the end of an expression.

WITH cte_sampledata AS
(
    SELECT FullName = 'Indiana Jones Jr.'
    UNION ALL
    SELECT 'Marshall Mathers 3rd'
    UNION ALL
    SELECT 'Martin Luther King Sr.'
    UNION ALL 
    SELECT 'King Albert I'
)
 
SELECT
     s.FullName
    ,Suffix = REVERSE(LEFT(REVERSE(s.FullName),CHARINDEX(' ',REVERSE(s.FullName))-1))
FROM cte_sampledata s;

The problem with this solution is that if there is no suffix, most likely the last name (or part thereof) is returned.

suffix with left and reverse

Working with a lookup list of known suffixes might yield better results in this case. If we use the LIKE operation in the LEFT JOIN, we can even match suffixes with spaces in them.

WITH cte_sampledata AS
(
    SELECT FullName = 'Indiana Jones Jr.'
    UNION ALL
    SELECT 'Marshall Mathers 3rd'
    UNION ALL
    SELECT 'Martin Luther King Sr.'
    UNION ALL 
    SELECT 'King Albert I'
    UNION ALL
    SELECT 'Bruce Wayne'
    UNION ALL
    SELECT 'Alec Baldwin the third'
)
, cte_lookuplist AS
(
    SELECT Suffix = 'Jr.'
    UNION ALL 
    SELECT 'Sr.'
    UNION ALL
    SELECT 'I'
    UNION ALL
    SELECT 'II'
    UNION ALL
    SELECT 'III'
    UNION ALL
    SELECT '3rd'
    UNION ALL
    SELECT 'the third'
)
SELECT
     s.FullName
    ,Suffix = ISNULL(l.Suffix,'N/A')
FROM cte_sampledata s
LEFT JOIN cte_lookuplist   l ON s.FullName LIKE '%' + l.Suffix;

The result:

suffix with lookup table

It's clear working with a suffix table can be more flexible than writing an expression using SUBSTRING, but you have to maintain a list of known suffixes and all of their variations. To avoid case sensitivity issues, you can use the UPPER function on both sides of the LEFT JOIN to make everything upper case. Also be aware that using roman numerals can lead to issues. I, II and III will all be matched by I, as well as any name ending in i (if there's no case sensitivity).

Extracting Values from a Comma Separated List in SQL Server

Let's assume we have a comma separated list and we want to retrieve a specific item from that list, for example the 2nd item. The first and last item are easier, as you can just search for the first comma and extract from either the start or the end of the string, as has been demonstrated in the previous paragraphs. But when you need an item of which you don't know the exact position, it gets trickier.

The problem is SQL Server currently doesn't support this scenario with the built-in functions.

You could use STRING_SPLIT to split out the list into its individual items, but unfortunately the function only returns the members of the list but not an ordinal column (a column containing the index of each member). Also, it doesn't guarantee any ordering, so you cannot calculate the ordinal column yourself. The following example splits a couple of arrays using STRING_SPLIT.

WITH sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
)
SELECT
     [myArray]
    ,value
FROM [sampleData]
CROSS APPLY STRING_SPLIT([myArray],',');
STRING_SPLIT function

It seems everything is returned in the correct order (again, this is not guaranteed), but we have no way of telling SQL Server we want the second element of each list. We could try to add an index column using ROW_NUMBER(), but what will we sort on?

Another option would be to use CHARINDEX, but unfortunately, we cannot specify we need a specific occurrence of the expression we're seeking. For example, it's not possible to specify we need the 2nd occurrence of the comma delimiter. CHARINDEX only allows you to set a starting position.

So let's use a ready-to-use function that splits a list and returns an index column as well, laid out in this excellent solution by Jeff Moden. The function heavily uses "Tally" tables or number tables. Examples of such tables can be found in the tip How to Expand a Range of Dates into Rows using a Numbers Table. I'd recommend to fully read Jeff's article, as it demonstrates a very clever use of tally tables, SUBSTRING and CHARINDEX to get the job done.

We can use the table function on a set of sample data by using CROSS APPLY.

WITH sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
    UNION ALL
    SELECT 'one,two'
    UNION ALL
    SELECT 'one'
)
SELECT
     s.[myArray]
    ,d.[ItemNumber]
    ,d.[Item]
FROM sampleData s
CROSS APPLY [dbo].[DelimitedSplit8K](s.myArray,',') d;

We get the following result:

using the delimited split function

Extracting the item from the list is as simple as filtering on the ItemNumber. In the next SQL snippet, we extract the second item of each list. Observer the list with one item does not return a result.

WITH sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
    UNION ALL
    SELECT 'one,two'
    UNION ALL
    SELECT 'one'
)
SELECT
     s.[myArray]
    ,d.[ItemNumber]
    ,d.[Item]
FROM sampleData s
CROSS APPLY [dbo].[DelimitedSplit8K](s.myArray,',') d
WHERE d.[ItemNumber] = 2;

You can of course parameterize this filter.

Median Search in SQL Server

Let's assume we have again a comma separated list and we want to retrieve the middle value of this list. For example, if we have the string 'one,two,three,four,five', we want to return the string 'three'. In the case of 'one,two,three,four', there's no discrete median value (in mathematics this would be 2.5 but we cannot apply this to strings). So let's make the assumption we will return the string right before the median, in this case 'two'.

There are a couple of steps we need to take to be able to extract the middle value out of a list:

  1. First, we need to determine how many items the list has. We can do this by counting the number of delimiters. A list of 10 items has 9 delimiters.
  2. Once we now the number of items, we can determine which is the middle member. This can be done by assigning an index to each member and calculating the medium of those indexes. For example, for a list of 7 members we have the following indexes: [1,2,3,4,5,6,7]. The median of those values is 4.
  3. Once we know the index of the median member, we can extract it using the solution provided in the previous paragraph.

Let's see how we can translate all those steps to T-SQL. Counting the number of items can be done by getting the full length of the list and subtracting it with the list with all of the delimiters removed. The difference gives us the number of times the delimiter appeared in the string. This technique can also generalized: you can use it to find out how many occurrences a certain word occurs in a text.

The following SQL statement calculates the number of items for 3 sample lists:

WITH sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
)
SELECT
     myArray
    ,numberOfMembers = LEN(myArray) - LEN(REPLACE(myArray,',','')) + 1
FROM sampleData;

The result:

counting number of members

To calculate the medium, we can repeat the list for each member of the list. Each repetition gets its own unique number. This can be again be done using a "tally table" or numbers table:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,cteTally AS (SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16)
    ,sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
)
    ,countmembers AS
(
    SELECT
         myArray
        ,numberOfMembers = LEN(myArray) - LEN(REPLACE(myArray,',','')) + 1
    FROM sampleData
)
SELECT N, myArray
FROM  cteTally      t
JOIN  countmembers  c ON t.N <= c.[numberOfMembers]
ORDER BY myArray, N;

This gives us the following result set:

exploded lists

Using the unique row numbers from the previous step, we can now calculate the median index using the PERCENTILE_DISC function. We use the discrete function, as it will always return an index that exists in the set of values. For example, for the list of indexes [1,2,3,4] it will return 2 and not 2.5.

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,cteTally AS (SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16)
    ,sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
)
    ,countmembers AS
(
    SELECT
         myArray
        ,numberOfMembers = LEN(myArray) - LEN(REPLACE(myArray,',','')) + 1
    FROM sampleData
)
    ,explodedArray AS
(
    SELECT N, myArray
    FROM  cteTally      t
    JOIN  countmembers  c ON t.N <= c.[numberOfMembers]
)
SELECT DISTINCT
     myArray
    ,medianMemberIndex = PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) OVER (PARTITION BY myArray)
FROM explodedArray;

We can verify the calculation works:

returning middle value index

We can combine this SQL statement with the solution of the paragraph. The final SQL statement looks like this:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,cteTally AS (SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E16)
    ,sampleData AS
(
    SELECT myArray = 'one,two,three,four,five'
    UNION ALL
    SELECT myArray = 'one,two,three,four'
    UNION ALL
    SELECT 'Hello,Readers,Of,The,MSSQLTips,Website,!'
    UNION ALL
    SELECT 'one,two'
    UNION ALL
    SELECT 'one'
)
    ,countmembers AS
(
    SELECT
         myArray
        ,numberOfMembers = LEN(myArray) - LEN(REPLACE(myArray,',','')) + 1
    FROM sampleData
)
    ,explodedArray AS
(
    SELECT N, myArray
    FROM  cteTally      t
    JOIN  countmembers  c ON t.N <= c.[numberOfMembers]
)
    ,medianposition AS
(
    SELECT DISTINCT
         myArray
        ,medianMemberIndex = PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) OVER (PARTITION BY myArray)
    FROM explodedArray
)
SELECT
     m.myArray
    ,m.medianMemberIndex
    ,d.[Item]
FROM [medianposition]                               m
CROSS APPLY [dbo].[DelimitedSplit8K](m.myArray,',') d
WHERE d.[ItemNumber] = m.[medianMemberIndex];

In the sample data, two arrays with respectively one and two members are added again to demonstrate the edge cases. The calculation of the starting position now uses the ISNULL function to check for NULL values. This will ensure the SQL code returns an item when there's only one item in the list.

median member extracted

Performance Impact of using SQL Server SUBSTRING

Using SUBSTRING in the SELECT clause will not have a noticeable performance impact. But using SUBSTRING – or any other function – in a WHERE clause or JOIN can lead to issues when the function prevents an index from being used. This is the general problem of SARGable queries, which is explained in the following article: Non-SARGable Predicates.

In short, this is bad:

WHERE SUBSTRING(mycolumn,start,length) = 'someText'

But this is better:

WHERE mycolumn = ‘someOtherText'

In the second example, if there's an index on the mycolumn column, it can actually be used by the engine. Let's illustrate this with an example. In the AdventureWorks2017 data warehouse, we're going to search for all employees for which the central office code is not 555:

non-sargable query

Assume the employee table is very large and we want this query to run as efficient as possible, so we create an index on the Phone column.

CREATE INDEX  IX_Employee_Phone ON [AdventureWorksDW2017].[dbo].[DimEmployee](Phone);

However, if we run the query, the index is not used:

No index seek

If we checked the first characters of the phone number, we could use the following format:

WHERE Phone LIKE '555%'

In this case, the index will be used. But we're searching for a piece of text in the middle of our column. How can we solve this? We can add a persisted calculated column to the table which pre-calculates the central office code:

ALTER TABLE [AdventureWorksDW2017].[dbo].[DimEmployee]
ADD [PhoneCentralOfficeCode] AS SUBSTRING(Phone,5,3) PERSISTED;

Then we can add an index on this column (we're going to create a covering index including all of the columns from the SELECT clause, otherwise the optimizer chooses for a clustered index scan of the primary key since the Employee table is quite small):

CREATE NONCLUSTERED INDEX [IX_Employee_PhoneCentralOfficeCode] ON [AdventureWorksDW2017]. [dbo].[DimEmployee]
(
   [PhoneCentralOfficeCode] ASC
)
INCLUDE([EmployeeKey],[FirstName],[LastName],[Phone]);

When we now run the following query, we can see the index is being used:

SELECT
     [EmployeeKey]
    ,[FirstName]
    ,[LastName]
    ,[Phone]
FROM [AdventureWorksDW2017].[dbo].[DimEmployee]
WHERE [PhoneCentralOfficeCode] = '555';
index seek on calculated persisted column
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-01-24

Comments For This Article




Friday, August 14, 2020 - 1:32:41 PM - Jeff Moden Back To Top (86306)
@Koen,

Thanks for the great article and thanks for teaching me something new. When I saw the FROM in the TRIM, I thought "What the hell is that all about"?

Turns out I learned two things from that. Of course, the first thing is that particular form of parameters for the TRIM function but... the best thing I learned from that is this...

I learned how to use TRIM functions long before I ever learned T-SQL. With that, I've NEVER looked at the documentation for the TRIM function in T-SQL.

What I learned from that was that I violated my own rule... if you've not read the documentation nor tried some of the examples even on what looks like something totally obvious based on prior experience with other things, you've made a mistake and need to go read the documentation.

As a good, long-distance friend told me way back in 2007 (and are pretty good words to live by), "A Developer must not guess (or assume, in this case)... A Developer must KNOW!"

And thank you for the honorable mention.

Friday, August 14, 2020 - 1:22:58 PM - Jeff Moden Back To Top (86305)
@jim

There's no reason to RTRIM a phone number because the actual number will ultimately be "slammed to the left" and the substrings that Koen used will pick out the pieces correctly IF a 10 digit number was used in the original string.

I'd also wait to do the trim until the REPLACEs where done.

Thursday, August 13, 2020 - 2:49:08 AM - jim Back To Top (86298)
hi,
nice post but IMHO,
the right sql for Parsing Phone Number in SQL Server is below
SELECT
PhoneNumber = '(415) 555 - 2671'
,CleanedNumber = replace(REPLACE(rEPLACE(ltrim(rTRIM('((415) 555 - 2671')) ,') ','') ,' - ',''), '(', '')

thx
jim

p.s.
I put rtrim(ltrim( because of my sql version (2016)














get free sql tips
agree to terms