SQL Server function to validate email addresses

By:   |   Comments (1)   |   Related: 1 | 2 | 3 | 4 | > Functions User Defined UDF


Problem

When you send e-mail to large lists, validating the e-mail addresses before sending out the e-mail is worth doing to prevent having mail rejection messages clog up your mail server.  I had been doing the validation with a T-SQL User Defined Function (UDF), but it was getting too slow as the number of e-mails grew.  I needed a faster alternative, what options do I have?

Solution

While I could have improved the T-SQL in my function validating e-mail addresses, which requires a lot of string handling, T-SQL isn't the best technology for this problem. When the task at hand is one that .Net code handles better then T-SQL a SQLCLR scalar function can be a great solution. When introduced in SQL Server 2005, the SQLCLR allows stored procedures, functions, triggers, user-defined types and user-defined aggregates to be written in C#, VB.Net or C++.   I showed how to write a SQLCLR stored procedure in the tip Writing to an operating system file using the SQL Server SQLCLR.  This tip will show how to write a SQLCLR scalar function and compare it to the original T-SQL.

Although I acknowledge my T-SQL UDF is not totally optimized, in my tests of validating 10,000 email addresses the SQLCLR took 53ms and the T-SQL UDF took 14,822ms.  That is a significant improvement, so read on...


Lets take a look at the original T-SQL UDF.  Here's the code:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udf_Txt_IsEmail] (
 @EmailAddr varchar(255) -- Email address to check
)   RETURNS BIT -- 1 if @EmailAddr is a valid email address
/*
* Checks an text string to be sure it's a valid e-mail address.
* Returns 1 when it is, otherwise 0.
* Example:
SELECT CASE WHEN 1=dbo.udf_Txt_IsEmail('[email protected]')
    THEN 'Is an e-mail address' ELSE 'Not an e-mail address' END
*
* Test:
print case when 1=dbo.udf_txt_isEmail('[email protected]')
       then 'Passes' else 'Fails' end + ' test for good addr'
print case when 0=dbo.udf_txt_isEmail('@novicksoftware.com')
       then 'Passes' else 'Fails' end + ' test for no user'
print case when 0=dbo.udf_txt_isEmail('[email protected]')
       then 'Passes' else 'Fails' end + ' test for 1 char domain'
print case when 1=dbo.udf_txt_isEmail('[email protected]')
       then 'Passes' else 'Fails' end + ' test for 2 char domain'
print case when 0=dbo.udf_txt_isEmail('[email protected]')
       then 'Passes' else 'Fails' end + ' test for no domain'
print case when 0=dbo.udf_txt_isEmail('anov [email protected]')
       then 'Passes' else 'Fails' end + ' test for space in name'
print case when 0=dbo.udf_txt_isEmail('ano#[email protected]')
       then 'Passes' else 'Fails' end + ' test for # in user'
print case when 0=dbo.udf_txt_isEmail('anovick@novick*software.com')
       then 'Passes' else 'Fails' end + ' test for * asterisk in domain'
****************************************************************/
AS BEGIN
DECLARE @AlphabetPlus VARCHAR(255)
      , @Max INT -- Length of the address
      , @Pos INT -- Position in @EmailAddr
      , @OK BIT  -- Is @EmailAddr OK
-- Check basic conditions
IF @EmailAddr IS NULL 
   OR NOT @EmailAddr LIKE '_%@__%.__%' 
   OR CHARINDEX(' ',LTRIM(RTRIM(@EmailAddr))) > 0
       RETURN(0)
SELECT @AlphabetPlus = 'abcdefghijklmnopqrstuvwxyz01234567890_-.@'
     , @Max = LEN(@EmailAddr)
     , @Pos = 0
     , @OK = 1
WHILE @Pos < @Max AND @OK = 1 BEGIN
    SET @Pos = @Pos + 1
    IF NOT @AlphabetPlus LIKE '%' 
                             + SUBSTRING(@EmailAddr, @Pos, 1) 
                             + '%' 
        SET @OK = 0
END -- WHILE
RETURN @OK
END
go
  
GRANT EXEC on dbo.udf_txt_isEmail to PUBLIC
go

It's not the most efficient of code.  That's why I have to update it. 

One of the features of this function is that the header comment includes a series of tests.  While these might be not be the most complete set of tests for an e-mail validation function, there the ones that I've been using for awhile.  They're easy to execute.  In SSMS, just select the text and execute the script.  It prints out a series of Passed/Failed messages about each test.  Later in the tip they'll be used to validate the new SQLCLR function.

The SQLCLR alternative will use a regular expression to validate the e-mail address.  Regular expressions are expressions in a mini-language specific to searching text strings.  They are perfectly suited to the task of validating e-mail addresses.   There are variations between implementations of the regular expression syntax, but most adhere to similar rules.  The .Net syntax is described in the documentation for .Net.  This link, http://msdn.microsoft.com/en-us/library/hs600312(VS.71).aspx is a good place to start.  There are many tutorials that you can find on-line if you want to learn more.  There are also libraries of regular expressions that can be found.  I sometimes use expressions from http://regexlib.com/.  For validating e-mails I've chosen an expression from the MSDN .Net documentation in an example that shows how to validate e-mail addresses.  You can find it in your Visual Studio documentation or on-line at http://msdn.microsoft.com/en-us/library/01escwtf(VS.71).aspx

Previous tips on the SQLCLR have shown how to create a SQLCLR project, Writing to an operating system file using the SQL Server SQLCLR, or to compile a .Net program from the command line and then load it into SQL Server CLR function to delete older backup and log files in SQL Server, so this tip will concentrate on the validation code and testing for performance differences.  I started with the existing project ns_txt and used the menu command Project/Add User-Defined Function.  I gave the function the name ns_txt_email and Visual Studio created a prototype function as shown here:

sqlclr function ns txt email just created

Writing the code was pretty simple, given that there was an example in the Visual Studio documentation.  There were a few steps that I needed to take:

  • Add a using statement for the System.Text.RegularExpressions namespace
  • Change the return type of the function to SqlBoolean
  • Add the parameter email with type SqlString
  • Change the return statement to return a new SqlBoolean with the result of the Regex.IsMatch function as it's value

Here's the code after I was done:

ns txt email sqlclr function

The regular expression is a long string constant that is cut off in the picture.  The full string is:

^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

The Regex.IsMatch function returns a CLR Boolean, which is returned by the function as a SqlBoolean.  This is the type that best corresponds to the T-SQL type BIT.

The next task is to build and deploy the project and try the function.  The function ns_txt_email can be used from SSMS, but it can also be used. from Visual Studio.  For this project I decided to test by putting my tests in the Test.sql script that Visual Studio provides as part of a Database project. You can add scripts in the "Test Scripts" folder.  Execute them by right clicking on the script and choosing "Debug Script".   My test script for the SQLCLR function is here:

print case when 1=dbo.ns_txt_email('[email protected]')
       then 'Passes' else 'Fails' end + ' test for good addr'
print case when 0=dbo.ns_txt_email('@novicksoftware.com')
       then 'Passes' else 'Fails' end + ' test for no user'
print case when 0=dbo.ns_txt_email('[email protected]')
       then 'Passes' else 'Fails' end + ' test for 1 char in tld'
print case when 1=dbo.ns_txt_email('[email protected]')
       then 'Passes' else 'Fails' end + ' test for 2 char in tld'
print case when 0=dbo.ns_txt_email('[email protected]')
       then 'Passes' else 'Fails' end + ' test for no domain'
print case when 0=dbo.ns_txt_email('anov [email protected]')
       then 'Passes' else 'Fails' end + ' test for space in name'
print case when 0=dbo.ns_txt_email('ano#[email protected]')
       then 'Passes' else 'Fails' end + ' test for # in user'
print case when 0=dbo.ns_txt_email('anovick@novick*software.com')
 then 'Passes' else 'Fails' end + ' test for * asterisk in domain'

The output of a test script is sent to Visual Studio's Output window.  The following is the result of running the script. The results have been truncated on the right side for readability:

Auto-attach to process '[5968] [SQL] nsl6' on machine 'nsl6' succeeded.  
Debugging script from project script file.                               
                                                                         
The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0).            
The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0).            
The thread 'nsl6 [58]' (0x10d8) has exited with code 0 (0x0).            
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\mscorlib\2.0
'sqlservr.exe' (Managed): Loaded 'C:\Program Files\Microsoft SQL Server\M
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\System.Data
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System\2.0
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_32\System.Trans
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System.Sec
'sqlservr.exe' (Managed): Loaded 'C:\WINDOWS\assembly\GAC_MSIL\System.Xml
'sqlservr.exe' (Managed): Loaded 'ns_txt', Symbols loaded.               
Auto-attach to process '[5968] sqlservr.exe' on machine 'nsl6' succeeded.
Passes test for good addr                                                
Passes test for no user                                                  
Passes test for 1 char in tld                                            
Passes test for 2 char domain                                            
Passes test for no domain                                                
Passes test for space in name                                            
Passes test for # in user                                                
Passes test for * asterisk in domain                                     
No rows affected.                                                        
(0 row(s) returned)                                                      
Finished running sp_executesql.

So the new function passes the original set of tests.

Now how about performance? Is the SQLCLR function really faster than the old T-SQL UDF?  To create some reasonable test data that anyone could use this script creates 10,000 sample e-mail addresses.  All of them should be valid.

use tempdb
go
create table #test_emails (
             id int not null identity(1,1) primary key
           , email nvarchar(255) not null
         )
go
declare @i int
set @i=0
while (@i < 10000) begin
   insert into #test_emails (email)
     select substring (
              'abcdefghijklmnopqrstuvwyz01234567890zywvutsr'
             , convert(smallint, 1+ rand() * 35)
             , convert(smallint, 2+ rand() * 10))
     + '@'
     + substring ('abcdefghijklmnopqrstuvwyzzywvutsrqp'
       , 1+convert(smallint, rand() * 25)
       , 2+convert(smallint, rand() * 10))
      + '.'
      + rtrim(substring (
                 'com net biz infous  za  ca  uk  edu gov tv  '
     , 1 + (4* convert(smallint, rand() * 10))
          , 4))
               
    SET @i = @i + 1
end
go

The next script compares the performance of the two functions.  It first selects the table of temporary email address in the expectation of pulling the tables pages into memory.  The SET STATISTICS TIME ON command asks SQL Server for the elapsed and CPU times for each command.  The results are summed so that the time to return results to the client isn't a factor in the measurement.

use tempdb
go
select * from #test_emails
go
set statistics time on
select sum(case when 1=ns_lib.dbo.udf_txt_IsEmail(email) 
                then 1 else 0 end) 
    from #test_emails

select sum(case when 1=ns_lib.dbo.ns_txt_email(email) 
                then 1 else 0 end) 
    from #test_emails
set statistics time off
go

After sending the results to the Text Results window and ignoring the list of sample e-mails the results are:

UDF results
-----------
      10000
(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 6187 ms,  elapsed time = 14822 ms.
SQLCLR results
--------------
         10000
(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 53 ms.

The CPU time for the SQLCLR is only 47 milliseconds compared to 6.187 seconds for the T-SQL UDF. The SQLCLR function is 131 times faster, a dramatic difference.  I'm sure that I could improve the T-SQL a lot to make it faster, but I doubt that the T-SQL UDF is ever going to get close.  At least in this case SQLCLR can be dramatically faster in terms of CPU and the difference in elapsed time is even larger.

Next Steps

T-SQL functions that handle strings are candidates for replacement with SQLCLR functions.  That's especially true of validation functions, such as udf_txt_IsEmail, due to the availability of the regular expression feature in .Net.  If a function is dragging your performance down, consider replacing it with a similar SQLCLR UDF.



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Andy Novick Andy Novick is a SQL Server Developer in the Boston area with 25 years of database and application development experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Saturday, June 27, 2009 - 7:41:21 AM - educoder Back To Top (3654)

If you're going to make use of the CLR in your stored procedure why not use...

Dim myMsg As New System.Net.Mail.MailMessage("[email protected]", "[email protected]")

and catch the exception that is thrown when the ToName is an invalid format.

 















get free sql tips
agree to terms