Using Regular Expressions With T-SQL: From Beginner To Advanced


By:
Overview

We can use some comparable expressions to a full regular expression library for matching certain patterns with T-SQL using the like operator. In this tutorial, we will practice using these expressions (referred to as regular expressions in the context only of T-SQL) for filtering price phrases involving alphabetic, numeric, and special characters. By the end of this tutorial, we will have another tool that we may use for precisely filtering data for some applicable situations.

I’ve used regex in numerous situations and it can be very useful to know and practice, which we’ll be doing in this tutorial.   While regular expressions can sometimes seem counterintuitive, we will experience many situations where this skill can quickly solve problems that multiple ANDs in a WHERE clause may not be as efficient to use. 

As for its use, some examples where regular expressions can provide us with assistance and make complex problems easy:

  • Applying very specific filters on text, numeric or special character data, especially when precision is paramount, and we cannot allow any possible error at all.
  • Parsing data for ETL purposes, finding patterns in code or in word use, or creating rules for inbound or outbound traffic.
  • In older versions of SQL Server, some functions to validate numbers may allow some characters, like e, and regular expressions can provide more accurate filters in these situations.
  • Muddying or intentionally corrupting data in some environments or in some situations to confuse or mislead infiltrators.

We can apply regex in situations where we need to look at text data, numerical data, or data that uses a combination, such as identifiers with letters, numbers and special characters.

The outline for this tutorial is as follows:

  1. Using regular expressions with alphabetic characters
    • Introduction to alphabetic regular expressions
    • Precise alphabetic filtering with regular expressions
    • Case sensitivity and regular expressions
    • Putting it all together
  2. Using regular expressions with numeric characters
    • Introduction to numeric regular expressions
    • Numerical ranges with any combinations or special characters
    • Putting it all together
  3. Using regular expressions with special characters and applications with it
    • Introduction to special character regular expressions
    • Using the “not” character with regular expressions
  4. Wrapping up with business applications using regular expressions examples
    • With date examples
    • With credit card examples
    • With URL examples
    • With email examples
  5. Summary





Comments For This Article




Monday, September 27, 2021 - 6:56:36 AM - Andy Tomalin Back To Top (89281)
It's bewildering to those of us who've used Regexp's in any other context (master data management, lexical analyser, programme language, schema definition language, office automation or RDBMS) why MS has failed to address this need properly over such a long time and why anyone thinks that what is provided by the LIKE expression is a Regexp. It isn't: a regexp allows conditionality and the ability to construct arbitrary *rule* match patterns (that's what regular means) with great power and expressiveness, and also of great compactness and efficiency which those of us who write and use it appreciate, even if our fellow readers may find a little too terse on a bad day.

Even in the latest docs, MS claims, "While traditional regular expressions are not natively supported in SQL Server, similar complex pattern matching can be achieved by using various wildcard expressions. See the String Operators documentation for more detail on wildcard syntax."

I fear this may be stubbornness or chauvinism by Microsoft. MS String Operators do not offer similar complexity of pattern matching unless you allow that, since all SQLs including MS's allow complex predicates containing an arbitrary number of boolean search conditions, it is possible to build functional equivalence for any given *fixed* pattern using SQL syntax. No doubt some folks are going to claim this is a 'feature' and that the declarative nature of SQL makes the intention of the pattern manifest and comprehensible compared to the write-only language of a Regexp. Sure thing. But in MS SQL you can't re-use a column expression result in the same block by referencing its alias, so building from atoms to complex re-referenced rules will generally require repetition of the expression, which never aids readability. WITH helps, but is not complete. In any case, it is only 'fixed patterns' unless you resort to self-writing SQL or using a pattern store: a regexp can vary the number of patterns at will, inline, at runtime.

In reality, Aho et al. got this all worked out back in the 70s under Unix, and MS just hasn't fallen in line with the timeless, efficient, and implementable simplicity of Regexps. It took Oracle a long time, too, but that was a long time ago.

MS could have gone halfway, like Teradata did before offering full regexp functions, and offered the SQL syntax LIKE ANY (<matchlist>) which is often neat but not complete. In my world LIKE ANY would be an ANSI SQL construction (I would go further) even though I don't carry a torch for TD.

There is published code for SQL Server that implements a Regexp using a CLR UDF and MS's own Regexp code libraries which still appears on an old MS site.
https://docs.microsoft.com/en-us/archive/msdn-magazine/2007/february/sql-server-regular-expressions-for-efficient-sql-querying

So why not just make REGEXP functions part of base SQL Server? Weird.

Friday, August 28, 2020 - 2:28:22 PM - Jeremy Kadlec Back To Top (86387)
Curt,

Thank you for the comment.

Completely agree. There is no true RegEx with T-SQL.

This is an example of what is RegEx like with T-SQL.

Feel free to reference this tip - RegEx capabilities with the SQL Server CLR.

Thank you,
Jeremy Kadlec
Community Co-Leader

Wednesday, August 26, 2020 - 10:51:28 AM - Curt Bennett Back To Top (86363)
Completely agree with Solomon. I was searching for how to use Regular Expressions in SQL, and Google incorrectly has this article listed on the first page. This is not an article about Regular Expressions, just wildcard matching. Calling wildcards Regular Expressions is simply wrong.

Tuesday, August 21, 2018 - 10:36:20 AM - SL Back To Top (77264)

Hi Tim.

I read your excellent article. 

I waas wondering if you  added the following row, wouldn't you get an undesirable result in your query below?

INSERT INTO alphareg 

VALUES ('Twree')

SELECT *

FROM alphareg

WHERE Alphabetic LIKE '%[T][HW][OR][E ]%' 


Tuesday, July 3, 2018 - 1:18:03 PM - Solomon Rutzky Back To Top (76515)

these expressions (referred to as regular expressions in the context only of T-SQL)

Hi Tim. My point is that the "expressions" you are describing (i.e. the 3 wildcard options supported by LIKE and PATINDEX) are very much not Regular Expressions, and so should not be referred to as being "regular expressions". Regular Expressions (commonly called "RegEx") is a very specific feature / functionality, one that is not natively supported by SQL Server. If you want to use Regular Expressions in T-SQL, then you need to use SQLCLR. For example, several RegEx functions are available in the free version of SQL# (a library of many SQLCLR stored procedures and functions that I created).

The wildcard options supported by LIKE and PATINDEX are no more Regular Expressions than the analogous wildcards supported by DOS / CMD where * means "0 or more of any", and ? means "0 or 1 of any". But, if being (very) loosely similar to something else is all that is required, then it would be ok to refer to Microsoft SQL Server as "Oracle" since both are RDBMSs, both use SQL, both support INSERT / UPDATE / DELETE / SELECT statements, and so on.

SQL wildcards and RegEx are only similar if you don't know anything, or know that much, about Regular Expressions. And the problem is that because many DBAs know very little (if anything at all) about Regular Expressions, misusing the term only serves to increase the general confusion about them. I think it's great that you are showing people how to use the SQL wildcards, but please stop referring to them as being Regular Expressions. And please change the title of this series to avoid confusion and misrepresentation.

Thank you.


Friday, June 15, 2018 - 4:52:10 AM - Tim Back To Top (76227)

@Solomon

We can use some comparable expressions to a full regular expression library for matching certain patterns with T-SQL using the like operator. In this tutorial, we will practice using these expressions (referred to as regular expressions in the context only of T-SQL) for filtering price phrases involving alphabetic, numeric, and special characters. [1rst paragraph]


Thursday, June 14, 2018 - 11:26:09 AM - Solomon Rutzky Back To Top (76222)

 

Hi Tim. Interesting article. One note here that might seem nit-picky but is an important distinction: the feature that you are describing here is very specifically NOT Regular Expressions (RegEx). The LIKE operator and PATINDEX function both support wildcards in the form of: "%" (zero or more of any), "_" (only one of any), and "[]" (only one of specific combination of list and/or range(s)). There is a good reason why the Microsoft documention only ever refers to these as "wildcards" and never uses the term "Regular Expressions". It is a common misconception that these wildcards are RegEx because the use of the "[]" list/range, but RegEx doesn't have a single-character "any" option that would equate to the "%" wildcard here. RegEx has "." that equates to the "_" wildcard in T-SQL, but you would then need to use the "*" (zero-or-more) quantifier to make it equivalent to the "%" T-SQL wildcard. In fact, quantifiers ("*" = 0 or more, "+" = 1 or more, "?" = 0 or 1, etc) is something that cannot be done with T-SQL wildcards. Because in T-SQL, they are just wildcards and not Regular Expressions.

 















get free sql tips
agree to terms