Problem
I need to extract all URLs from a source file such as an email, a web page or just text files. In this tip I will present a SQL Server stored procedure you can use to extract all URLs from a file.
Solution
The solution involves creating a T-SQL stored procedure in the SQL Server master database, called dbo.usp_extract_links, that accepts a single file name with the full path parameter and lists the URL links that exists in it.
Here is the stored procedure logic:
- The procedure creates a global temporary table that will store the input file contents.
- The procedure executes a dynamic T-SQL BULK INSERT statement that populates a global temporary table.
- The procedure opens a cursor that loops over the lines of the global temporary table.
- The procedure scans each line for a URL link (a href= tag value).
- If such value exists, it is stored in a table variable. If not, the procedure advances the cursor to the next loop iteration.
- When the loop ends, the table variable holds the URL link values.
- The global table is dropped and the cursor is deallocated.
- The table variable contents are displayed as output.
SQL Server Stored Procedure to Extract URL from Source File
-- ====================================================================================
-- Author:Eli Leiba
-- Create date: 2020-10-08
-- Description:extracts the URL Links from a source web (html) file
-- Result is the URL list
-- ====================================================================================
CREATE PROCEDURE dbo.usp_extract_links (@htmfile nvarchar(3000))
AS
BEGIN
DECLARE @bulkinstsql varchar (1000), @currline nvarchar(3000), @res nvarchar(2000), @link nvarchar(1000)
DECLARE @P1 int, @P2 int
DECLARE @RESTABLE TABLE (id INT identity, URLlink nvarchar(1000))
CREATE TABLE ##htmlTable (detline nvarchar(3000))
SET @bulkinstsql = 'BULK INSERT ##htmlTable from ' +'''' + @htmfile + '''';
EXEC (@bulkinstsql)
DECLARE c_line cursor for select detLine from ##htmlTable
OPEN c_line
FETCH NEXT FROM c_line INTO @currline
WHILE @@FETCH_STATUS = 0
BEGIN
SET @link = NULL;
SET @P1 = CHARINDEX ('href=', @currline,1)
IF (@p1 = 0) SET @link = null
ELSE
BEGIN
SET @res = SUBSTRING (@currline, @P1 + 6, len (@currline) - @P1 - 7)
SET @P2 = CHARINDEX ('>', @RES,1)
IF @p2 = 0 SET @link = null
ELSE
BEGIN
SELECT @link = substring (@res, 1, @p2 - 2)
END
END
IF @link is NOT NULL insert into @RESTABLE values (@link)
FETCH NEXT FROM c_line INTO @currline
END
CLOSE c_line
DEALLOCATE c_line
DROP table ##htmlTable
SELECT * from @RESTABLE
END
GO
Sample Stored Procedure Execution
Below is sample code from a web page. I saved the web page source code in an html file called sample.htm in c:\sqltips directory.
<!DOCTYPE html> <html class="no-js" lang="en"> <head> <meta http-equiv="Content-Language" content="en-us" /> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" /> <link rel="SHORTCUT ICON" href="/favicon.ico" /> <meta content="width=device-width, initial-scale=1.0" name="viewport" /> <title>Join the MSSQLTips SQL Server Community</title> <meta name="description" content="Signup for the MSSQLTips.com newsletter and get free SQL Server tips delivered to you." /> <link rel="canonical" href="https://www.mssqltips.com/get-free-sql-server-tips/" /> <link rel="stylesheet" href="/css/foundation-minimized.css"> <link rel="stylesheet" href="/css/app.css"> </head> <body style="background-color:white;"> <a name="top"></a> <div class="sticky"> <nav class="top-bar" data-topbar role="navigation" "> <ul class="title-area"> <li class="name"> <a href="https://www.mssqltips.com/"><img alt="mssqltips logo" src="/images/mslogo_bw_89x45-2.png"></a> </li> <li class="toggle-topbar menu-icon"><a href="#"><span>Menu</span></a></li> </ul> <section class="top-bar-section"> <ul class="right"> <li class="has-dropdown"><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="/sql-server-tutorials/">Beginner</a> <ul class="dropdown"> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/1948/what-is-sql-server/">What is SQL Server?</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href=" https://www.mssqltips.com/tutorial/sql-server-101-tutorial-outline-and-overview/">SQL Server 101</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/6422/sql-server-concepts/">SQL Server Concepts</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/6429/sql-server-download-quick-links/">SQL Server Download Links</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/tutorial/my-first-sql-server-business-intelligence-project-tutorial/">My First BI Project </a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/2677/sql-server-career-planning/">SQL Server Career Plan</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sql-server-webcasts/">Webinars</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sql-server-categories/">All Categories</a></li> </ul> </li> </body> </html>
The procedure is executed as follows:
exec dbo.usp_extract_links 'c:\sqltips\sample.htm'
Here are the results:

Next Steps
- You can create and compile this stored procedure in your master database and use it as a simple TSQL tool for extracting URL links from web page source HTML and JS code.
- The procedure was tested for SQL Server 2019, but should work with most versions of SQL Server.

Eli Leiba is a senior application DBA at Israel Electric Company, a teacher at Microsoft CTEC, and a senior database consultant. With 19 years of experience working with both SQL Server and Oracle RDBMS. He is certified in Oracle and SQL Server database administration and implementation and has a B.S. in Computer Science. He can be reached at: iecdba@hotmail.com.
- MSSQLTips Awards: Rising Star (50+ tips) – 2019 | Author of the Year Contender – 2016-2019


