How to Extract URLs from HTML using Stored Procedure in SQL Server
By: Eli Leiba | Updated: 2020-12-10 | Comments (3) | Related: More > TSQL
Problem
I need to extract all URLs from a source file such as an email, a web page or just text files. In this tip I will present a SQL Server stored procedure you can use to extract all URLs from a file.
Solution
The solution involves creating a T-SQL stored procedure in the SQL Server master database, called dbo.usp_extract_links, that accepts a single file name with the full path parameter and lists the URL links that exists in it.
Here is the stored procedure logic:
- The procedure creates a global temporary table that will store the input file contents.
- The procedure executes a dynamic T-SQL BULK INSERT statement that populates a global temporary table.
- The procedure opens a cursor that loops over the lines of the global temporary table.
- The procedure scans each line for a URL link (a href= tag value).
- If such value exists, it is stored in a table variable. If not, the procedure advances the cursor to the next loop iteration.
- When the loop ends, the table variable holds the URL link values.
- The global table is dropped and the cursor is deallocated.
- The table variable contents are displayed as output.
SQL Server Stored Procedure to Extract URL from Source File
-- ==================================================================================== -- Author: Eli Leiba -- Create date: 2020-10-08 -- Description: extracts the URL Links from a source web (html) file -- Result is the URL list -- ==================================================================================== CREATE PROCEDURE dbo.usp_extract_links (@htmfile nvarchar(3000)) AS BEGIN DECLARE @bulkinstsql varchar (1000), @currline nvarchar(3000), @res nvarchar(2000), @link nvarchar(1000) DECLARE @P1 int, @P2 int DECLARE @RESTABLE TABLE (id INT identity, URLlink nvarchar(1000)) CREATE TABLE ##htmlTable (detline nvarchar(3000)) SET @bulkinstsql = 'BULK INSERT ##htmlTable from ' +'''' + @htmfile + ''''; EXEC (@bulkinstsql) DECLARE c_line cursor for select detLine from ##htmlTable OPEN c_line FETCH NEXT FROM c_line INTO @currline WHILE @@FETCH_STATUS = 0 BEGIN SET @link = NULL; SET @P1 = CHARINDEX ('href=', @currline,1) IF (@p1 = 0) SET @link = null ELSE BEGIN SET @res = SUBSTRING (@currline, @P1 + 6, len (@currline) - @P1 - 7) SET @P2 = CHARINDEX ('>', @RES,1) IF @p2 = 0 SET @link = null ELSE BEGIN SELECT @link = substring (@res, 1, @p2 - 2) END END IF @link is NOT NULL insert into @RESTABLE values (@link) FETCH NEXT FROM c_line INTO @currline END CLOSE c_line DEALLOCATE c_line DROP table ##htmlTable SELECT * from @RESTABLE END GO
Sample Stored Procedure Execution
Below is sample code from a web page. I saved the web page source code in an html file called sample.htm in c:\sqltips directory.
<!DOCTYPE html> <html class="no-js" lang="en"> <head> <meta http-equiv="Content-Language" content="en-us" /> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" /> <link rel="SHORTCUT ICON" href="/favicon.ico" /> <meta content="width=device-width, initial-scale=1.0" name="viewport" /> <title>Join the MSSQLTips SQL Server Community</title> <meta name="description" content="Signup for the MSSQLTips.com newsletter and get free SQL Server tips delivered to you." /> <link rel="canonical" href="https://www.mssqltips.com/get-free-sql-server-tips/" /> <link rel="stylesheet" href="/css/foundation-minimized.css"> <link rel="stylesheet" href="/css/app.css"> </head> <body style="background-color:white;"> <a name="top"></a> <div class="sticky"> <nav class="top-bar" data-topbar role="navigation" "> <ul class="title-area"> <li class="name"> <a href="https://www.mssqltips.com/"><img alt="mssqltips logo" src="/images/mslogo_bw_89x45-2.png"></a> </li> <li class="toggle-topbar menu-icon"><a href="#"><span>Menu</span></a></li> </ul> <section class="top-bar-section"> <ul class="right"> <li class="has-dropdown"><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="/sql-server-tutorials/">Beginner</a> <ul class="dropdown"> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/1948/what-is-sql-server/">What is SQL Server?</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href=" https://www.mssqltips.com/sqlservertutorial/9214/sql-server-101-tutorial-outline-and-overview/">SQL Server 101</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/6422/sql-server-concepts/">SQL Server Concepts</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/6429/sql-server-download-quick-links/">SQL Server Download Links</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertutorial/3700/my-first-sql-server-business-intelligence-project-tutorial/">My First BI Project </a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sqlservertip/2677/sql-server-career-planning/">SQL Server Career Plan</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sql-server-webcasts/">Webinars</a></li> <li><a onclick="_gaq.push(['_trackEvent', 'hmenu', 'beginner', 'click']);" href="https://www.mssqltips.com/sql-server-categories/">All Categories</a></li> </ul> </li> </body> </html>
The procedure is executed as follows:
exec dbo.usp_extract_links 'c:\sqltips\sample.htm'
Here are the results:

Next Steps
- You can create and compile this stored procedure in your master database and use it as a simple TSQL tool for extracting URL links from web page source HTML and JS code.
- The procedure was tested for SQL Server 2019, but should work with most versions of SQL Server.
About the author

View all my tips
Article Last Updated: 2020-12-10