Introduction to Alphabetic Regular Expressions for T-SQL


By:
Overview

We’ll start with regular expressions by creating alphabetic data (varchar) in a table and look at the basics of querying using regular expressions. 

For the sake of the alphabetic examples, we’ll create a column with a VARCHAR limit of 8000 characters.  We could create a VARCHAR(max) column, which will store up to 2GB of data, which will generally be more useful in storing large documents.  We’ll be adding data throughout this tutorial and initially we’ll start off by adding 20 rows of data.

Populate Test Alphabetic Data

The following script will add some test data that we will use to build and text the regular expressions.

CREATE TABLE alphareg(
   Alphabetic VARCHAR(8000)
)
 
INSERT INTO alphareg 
VALUES ('Two plus two equals four.')
   , ('But in Rome we must toe the line of fashion, spending beyond our means, and often on borrowed credit.')
   , ('Most dogs sleep 21 hours a day.')
   , ('2')
   , ('We were going to work on the project together, but he had to leave for basic training early.')
   , ('This SENTENCE is NOT written weLl.') 
   , ('Or as Alexander Suvorov would say, "When the training is hard, the fighting is easy."')
   , ('1812! The year of invasions.  Let me tell ya''.') 
   , ('This SENTENCE is NOT written weLl.') 
   , ('A')
   , ('b')
   , ('oooh, this isn''t written WRIGHT in several ways')
   , ('BD')
   , ('As he said this, Cupid sneezed approval on the left as before on the right.')
   , ('I like adverbs.')
   , ('Five?')
   , ('I''m going to the store right now.')
   , ('TWO')
   , ('"Yes"')
   , ('willful waste makes woeful waste')
 
SELECT *
FROM alphareg 

Below is the data we just inserted.

t-sql regex result set

20 results from the first query.

Regex to Find Where Data is Only One Character and Value is from A to Z

In our first example, we’ll demonstrate how we can return a data row with an alphabetic character of any value from A to Z.  We do not want anything else except one alphabetic character and as we’ll see, this will become a structure that we can use to get a precise result set back when we want to query data that matches a pattern.

SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[A-Z]'   
t-sql regex result set

2 results from the second query.

The pattern of [A-Z] in the above query means that we want to only get the rows with 1 alphabetic character (no more) that range anywhere from A to Z and in this case, case sensitivity does not matter (we’ll look at case sensitivity later).  Only two rows match this – one row with A and another row with b. 

Regex to Find Where Data is Only One Character and Value is from C to D

If we ran a similar query using the range of C-D ([C-D]), which none of our rows have, we’d get nothing back:

SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[C-D]'   
t-sql regex result set

No results from the third query.

Regex to Find Where Data is Two Characters and Values are from A to Z

If we wanted to return only the data rows with two alphabetic characters of any character A through Z – and no more and no less characters than two – we’d run a similar query, by adding one more range as follows.

SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[A-Z][A-Z]'   
t-sql regex result set

One result from the fourth query.

We only get one result – BD.  In this query, we do not allow for any result with fewer than two alphabetic characters A through Z.  Regular expressions such as this (without the like character %) allow us to get an exact result back as far as characters, helping us validate lengths.  If we have requirements based on length – such as a value must be 11 alphabetic characters long, this technique eliminates anything less or greater than that length.

Regex to Find Where Data is Any Length and First Character has a Value from A to Z

Similar to text queries, if we want to return the rows that begin with an A through Z character, then we’ll use the % sign and run a similar query as query two, except we’ll add a percent sign to return any row where the first character is from A to Z and can be anything after that.

SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[A-Z]%'   
t-sql regex result set

Seventeen results from the fifth query.

The above data rows all begin with an alphabetic character A through Z. If we look at the original 20 rows, we can see that 3 data rows do not match this definition and they were excluded. 

In the next section, we’ll look at getting specific alphabetic characters combined with ranges.

Additional Information
  • Using our same data, write a query that gets all the data rows of any character length that start with the alphabetic characters of B.  How many rows did you get?





Comments For This Article

















get free sql tips
agree to terms