Using Regular Expressions With T-SQL: From Beginner To Advanced

Overview

We can use some comparable expressions to a full regular expression library for matching certain patterns with T-SQL using the like operator. In this tutorial, we will practice using these expressions (referred to as regular expressions in the context only of T-SQL) for filtering price phrases involving alphabetic, numeric, and special characters. By the end of this tutorial, we will have another tool that we may use for precisely filtering data for some applicable situations.

Explanation

I’ve used regex in numerous situations, and it can be very useful to know and practice, which we’ll be doing in this tutorial.   While regular expressions can sometimes seem counterintuitive, we will experience many situations where this skill can quickly solve problems that multiple ANDs in a WHERE clause may not be as efficient to use.

As for its use, some examples where regular expressions can provide us with assistance and make complex problems easy:

  • Applying very specific filters on text, numeric, or special character data, especially when precision is paramount, and we cannot allow any possible error at all.
  • Parsing data for ETL purposes, finding patterns in code or in word use, or creating rules for inbound or outbound traffic.
  • In older versions of SQL Server, some functions to validate numbers may allow some characters, like e, and regular expressions can provide more accurate filters in these situations.
  • Muddying or intentionally corrupting data in some environments or in some situations to confuse or mislead infiltrators.

We can apply regex in situations where we need to look at text data, numerical data, or data that uses a combination, such as identifiers with letters, numbers, and special characters.

The outline for this tutorial is as follows:

  1. Using regular expressions with alphabetic characters
    • Introduction to alphabetic regular expressions
    • Precise alphabetic filtering with regular expressions
    • Case sensitivity and regular expressions
    • Putting it all together
  2. Using regular expressions with numeric characters
    • Introduction to numeric regular expressions
    • Numerical ranges with any combinations or special characters
    • Putting it all together
  3. Using regular expressions with special characters and applications with it
    • Introduction to special character regular expressions
    • Using the “not” character with regular expressions
  4. Wrapping up with business applications using regular expressions examples
    • With date examples
    • With credit card examples
    • With URL examples
    • With email examples
  5. Summary

One comment

  1. It’s bewildering to those of us who’ve used Regexp’s in any other context (master data management, lexical analyser, programme language, schema definition language, office automation or RDBMS) why MS has failed to address this need properly over such a long time and why anyone thinks that what is provided by the LIKE expression is a Regexp. It isn’t: a regexp allows conditionality and the ability to construct arbitrary *rule* match patterns (that’s what regular means) with great power and expressiveness, and also of great compactness and efficiency which those of us who write and use it appreciate, even if our fellow readers may find a little too terse on a bad day.

    Even in the latest docs, MS claims, “While traditional regular expressions are not natively supported in SQL Server, similar complex pattern matching can be achieved by using various wildcard expressions. See the String Operators documentation for more detail on wildcard syntax.”

    I fear this may be stubbornness or chauvinism by Microsoft. MS String Operators do not offer similar complexity of pattern matching unless you allow that, since all SQLs including MS’s allow complex predicates containing an arbitrary number of boolean search conditions, it is possible to build functional equivalence for any given *fixed* pattern using SQL syntax. No doubt some folks are going to claim this is a ‘feature’ and that the declarative nature of SQL makes the intention of the pattern manifest and comprehensible compared to the write-only language of a Regexp. Sure thing. But in MS SQL you can’t re-use a column expression result in the same block by referencing its alias, so building from atoms to complex re-referenced rules will generally require repetition of the expression, which never aids readability. WITH helps, but is not complete. In any case, it is only ‘fixed patterns’ unless you resort to self-writing SQL or using a pattern store: a regexp can vary the number of patterns at will, inline, at runtime.

    In reality, Aho et al. got this all worked out back in the 70s under Unix, and MS just hasn’t fallen in line with the timeless, efficient, and implementable simplicity of Regexps. It took Oracle a long time, too, but that was a long time ago.

    MS could have gone halfway, like Teradata did before offering full regexp functions, and offered the SQL syntax LIKE ANY (<matchlist>) which is often neat but not complete. In my world LIKE ANY would be an ANSI SQL construction (I would go further) even though I don’t carry a torch for TD.

    There is published code for SQL Server that implements a Regexp using a CLR UDF and MS’s own Regexp code libraries which still appears on an old MS site.
    https://docs.microsoft.com/en-us/archive/msdn-magazine/2007/february/sql-server-regular-expressions-for-efficient-sql-querying

    So why not just make REGEXP functions part of base SQL Server? Weird.

Leave a Reply

Your email address will not be published. Required fields are marked *