Learn more about SQL Server tools

 
 

Tutorials          DBA          Dev          BI          Career          Categories          Events          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Events

DBA    Dev    BI    Categories

 

Parsing a URL with SQL Server Functions


By:   |   Read Comments (2)   |   Related Tips: More > Functions - User Defined UDF

Problem

In my environment I have a few tables with thousands of rows that have URLs in particular columns.  We are going through some changes in the environment and need to be able to parse out the data (base URL, file name, query parameters) from the specific columns then move the parsed data to another database design.  Moving the data is no problem.  I plan on using SQL Server 2005 Integration Services and I have a good sense of how to use that tool.  By the way, all of the URLs are in varchar fields if that makes any difference.  Can you provide any recommendations on how to parse that data?

Solution

Although there are probably a few different ways to approach this task, let's break it apart and see how we can code each one of these scenarios with T-SQL commands in SQL Server 2005.  I assume you will be parsing the data from a single column to three new columns in a new database\table design.  Since you are familiar with SQL Server 2005 Integration Services, I will not dig into those details and just focus on how to capture the data that you need.  So let's take a look at each scenario and give an explanation on the T-SQL coding in SQL Server 2005.

Example Table and Data

Here an example table and example data for this tip:

USE zCustomerDB
GO

CREATE TABLE ExampleURLs (
URL_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
CompleteURL VARCHAR(500))
GO
INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL])
VALUES ('http://www.test.com/page.asp?tip=1156')
GO
INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL])
VALUES ('http://www.test.com/page.asp?tip=1299')
GO
INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL])
VALUES ('http://www.test.com/page.asp?tip=1262')
GO
INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL])
VALUES ('http://www.test.com/page.asp?tip=1240')
GO
INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL])
VALUES ('http://www.test.com/page.asp?tip=1226')
GO

Scenario 1 - Base URL

As an example, let's say the data you are working with looks like 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' and you need to capture the 'www.mssqltips.com' portion of the column.  Below is an example of what that code could look like.  This code is fairly straight forward with a single SUBSTRING command performing all of the heavy lifting.  The REVERSE command is used to find the last slash so the entire directory structure is returned.  The CHARINDEX command is used to determine the position of characters such as '//' in 'http://' and '/' in the URL directory path.

SELECT URL_ID,
CompleteURL, -- URL
(CHARINDEX('//', CompleteURL, 1) + 1), -- Position of the double slashes
CHARINDEX('/', REVERSE (CompleteURL), 1), -- Position of the last single slash
SUBSTRING(CompleteURL,
(
CHARINDEX('//', CompleteURL, 1) + 2),
CHARINDEX('/', REVERSE (CompleteURL), 1) ) -- Final string
FROM zCustomerDB.dbo.ExampleURLs
GO

Scenario 2 - File Name

In this scenario let's capture file name which is the 'tip.asp' portion of the 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' URL.  This code makes additional usage of the REVERSE and SUBSTRING commands, but the subtle portion of this code is positioning each portion of the SUBSTRING correctly.  This is accomplished by using the '+1' in the starting position (second position) of the SUBSTRING command and using a '-1' in the length position (third position) of the SUBSTRING command.  In this circumstance the CHARINDEX command is used to determine the position of the '?' and '/' characters.  Here is an example of what that code could look like:

SELECT URL_ID,
REVERSE (CompleteURL), -- Backwards version of the URL
CHARINDEX('/', REVERSE (CompleteURL), 1), -- Position of the slash
CHARINDEX('?', REVERSE (CompleteURL), 1), -- Position of the question mark
((CHARINDEX('/', REVERSE (CompleteURL), 1)) - (CHARINDEX('?', REVERSE (CompleteURL), 1))-1), -- Length
REVERSE(RTRIM(SUBSTRING(REVERSE (CompleteURL),
(
CHARINDEX('?', REVERSE (CompleteURL), 1)+1),
((
CHARINDEX('/', REVERSE (CompleteURL), 1)) - (CHARINDEX('?', REVERSE (CompleteURL), 1))- 1)))) -- Final parsed value
FROM zCustomerDB.dbo.ExampleURLs
GO

Scenario 3 - Query Parameters

In the final example capturing the query parameters is based on finding the correct position for the question mark '?' in the URL.  With the 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' sample data, you would need to capture the 'tip=1156' portion of the column.  Here is an example of what that code could look like by leveraging the REVERSE, CHARINDEX and SUBSTRING commands once again in a little different context:

SELECT URL_ID,
REVERSE (CompleteURL), -- Backwards version of the URL
CHARINDEX('?', REVERSE (CompleteURL), 1), -- Position of the question mark
REVERSE(RTRIM(SUBSTRING(REVERSE (CompleteURL), 1, CHARINDEX('?', REVERSE (CompleteURL), 1) - 1))) -- Final parsed value
FROM zCustomerDB.dbo.ExampleURLs
GO
Next Steps


Last Update:





About the author
MSSQLTips author Jeremy Kadlec Since 2002, Jeremy Kadlec has delivered value to the global SQL Server community as an Edgewood Solutions SQL Server Consultant, MSSQLTips.com co-founder and Baltimore SSUG co-leader.

View all my tips





More SQL Server Solutions




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


Get free SQL tips:

*Enter Code refresh code     



Thursday, September 04, 2014 - 3:08:51 AM - SnowwalkR Back To Top

Example 1 has a mistake. The SUBSTRING function has length as 3rd argument, not end position. And not every URL has a // or even a / in it.

To test that the SELECT works, add some rows:

INSERT INTO ExampleURLs SELECT 'www.mssqltips.com/sqlservertip/1325/parsing-a-url-with-sql-server-functions/';

INSERT INTO ExampleURLs SELECT 'www.mssqltips.com';

And the code to show the base URL:

;WITH Url AS
(
SELECT
    CompleteURL, -- URL
    CASE WHEN (CHARINDEX('//', CompleteURL, 1) + 2) > 2 THEN (CHARINDEX('//', CompleteURL, 1) + 2) ELSE 0 END FirstPos,
    /* Position of the double slashes. Not every Url has a // in it,  */
    CHARINDEX('/', CompleteURL, CASE WHEN (CHARINDEX('//', CompleteURL, 1) + 2) > 2 THEN (CHARINDEX('//', CompleteURL, 1) + 2) ELSE 0 END ) SecondPos,
    /* Position of the first single slash AFTER double slash */
    LEN(CompleteURL) + 1 TotalLength  
    /* url may or may not have a /, add 1 because substring may start from 0 */
FROM ExampleURLs
)
SELECT
    SUBSTRING(CompleteURL, FirstPos, CASE WHEN SecondPos > 0 THEN SecondPos ELSE TotalLength END - FirstPos  )
FROM Url;

Not pretty code...


Friday, December 27, 2013 - 4:36:22 PM - Paul Harvey Back To Top

ALTER FUNCTION dbo.spExtractDomainFromURL ( @strURL NVARCHAR(1000) )
RETURNS NVARCHAR(100)
--Posted at http://stackoverflow.com/a/20808097/391101
AS
    BEGIN
        --Strip Protocol
        SELECT  @strURL = SUBSTRING(@strURL, CHARINDEX('://', @strURL) + 3, 999)

        -- Strip www subdomain
        IF LEFT(@strURL, 4) = 'www.'
            SELECT  @strURL = SUBSTRING(@strURL, 5, 999)

        -- Strip Path
        IF CHARINDEX('/', @strURL) > 0
            SELECT  @strURL = LEFT(@strURL, CHARINDEX('/', @strURL) - 1)

        --Unless you iterate through a list of TLDs, you can't differentiate between subdomain.example.com and example.com.au
        --I decided against this because it's slower, and the TLD list requires maintenance

        RETURN @strURL
    END


Learn more about SQL Server tools