By: Jeremy Kadlec | Comments (2) | Related: > Functions User Defined UDF
Problem
In my environment I have a few tables with thousands of rows that have URLs in particular columns. We are going through some changes in the environment and need to be able to parse out the data (base URL, file name, query parameters) from the specific columns then move the parsed data to another database design. Moving the data is no problem. I plan on using SQL Server Integration Services and I have a good sense of how to use that tool. By the way, all of the URLs are in varchar fields if that makes any difference. Can you provide any recommendations on how to parse that data?
Solution
Although there are probably a few different ways to approach this task, let's break it apart and see how we can code each one of these scenarios with T-SQL commands in SQL Server. I assume you will be parsing the data from a single column to three new columns in a new database\table design. Since you are familiar with SQL Server Integration Services, I will not dig into those details and just focus on how to capture the data that you need. So let's take a look at each scenario and give an explanation on the T-SQL coding in SQL Server.
Example Table and Data
Here an example table and example data for this tip:
USE zCustomerDB GO CREATE TABLE ExampleURLs ( URL_ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, CompleteURL VARCHAR(500)) GO INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL]) VALUES ('http://www.test.com/page.asp?tip=1156') GO INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL]) VALUES ('http://www.test.com/page.asp?tip=1299') GO INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL]) VALUES ('http://www.test.com/page.asp?tip=1262') GO INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL]) VALUES ('http://www.test.com/page.asp?tip=1240') GO INSERT INTO [zCustomerDB].[dbo].[ExampleURLs] ([CompleteURL]) VALUES ('http://www.test.com/page.asp?tip=1226') GO
Scenario 1 - Base URL
As an example, let's say the data you are working with looks like 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' and you need to capture the 'www.mssqltips.com' portion of the column. Below is an example of what that code could look like. This code is fairly straight forward with a single SUBSTRING command performing all of the heavy lifting. The REVERSE command is used to find the last slash so the entire directory structure is returned. The CHARINDEX command is used to determine the position of characters such as '//' in 'http://' and '/' in the URL directory path.
SELECT URL_ID, CompleteURL, -- URL (CHARINDEX('//', CompleteURL, 1) + 1), -- Position of the double slashes CHARINDEX('/', REVERSE (CompleteURL), 1), -- Position of the last single slash SUBSTRING(CompleteURL, (CHARINDEX('//', CompleteURL, 1) + 2), CHARINDEX('/', REVERSE (CompleteURL), 1) ) -- Final string FROM zCustomerDB.dbo.ExampleURLs GO
Scenario 2 - File Name
In this scenario let's capture file name which is the 'tip.asp' portion of the 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' URL. This code makes additional usage of the REVERSE and SUBSTRING commands, but the subtle portion of this code is positioning each portion of the SUBSTRING correctly. This is accomplished by using the '+1' in the starting position (second position) of the SUBSTRING command and using a '-1' in the length position (third position) of the SUBSTRING command. In this circumstance the CHARINDEX command is used to determine the position of the '?' and '/' characters. Here is an example of what that code could look like:
SELECT URL_ID, REVERSE (CompleteURL), -- Backwards version of the URL CHARINDEX('/', REVERSE (CompleteURL), 1), -- Position of the slash CHARINDEX('?', REVERSE (CompleteURL), 1), -- Position of the question mark ((CHARINDEX('/', REVERSE (CompleteURL), 1)) - (CHARINDEX('?', REVERSE (CompleteURL), 1))-1), -- Length REVERSE(RTRIM(SUBSTRING(REVERSE (CompleteURL), (CHARINDEX('?', REVERSE (CompleteURL), 1)+1), ((CHARINDEX('/', REVERSE (CompleteURL), 1)) - (CHARINDEX('?', REVERSE (CompleteURL), 1))- 1)))) -- Final parsed value FROM zCustomerDB.dbo.ExampleURLs GO
Scenario 3 - Query Parameters
In the final example capturing the query parameters is based on finding the correct position for the question mark '?' in the URL. With the 'http://www.mssqltips.com/sqlservertip/1156/opportunities-with-sql-server-2005-express-edition/' sample data, you would need to capture the 'tip=1156' portion of the column. Here is an example of what that code could look like by leveraging the REVERSE, CHARINDEX and SUBSTRING commands once again in a little different context:
SELECT URL_ID, REVERSE (CompleteURL), -- Backwards version of the URL CHARINDEX('?', REVERSE (CompleteURL), 1), -- Position of the question mark REVERSE(RTRIM(SUBSTRING(REVERSE (CompleteURL), 1, CHARINDEX('?', REVERSE (CompleteURL), 1) - 1))) -- Final parsed value FROM zCustomerDB.dbo.ExampleURLs GO
Next Steps
- Fire up SQL Server 2005 Management Studio, create the table, load the data and start running through this code. It is fairly simple, but could serve as a good first means to learn about manipulating strings.
- The next time you need to parse out portions of a URL or another string, consider some of the techniques used to accomplish the task in this tip. The SUBSTRING function is typically heavily used and the REVERSE came in handy this time based on the needs. So get creative!
- For additional text manipulation tips, check out:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips