Precise Numerical Filtering With Regular Expressions for T-SQL


By:
Overview

We now know how to query numerical data – whether decimal, integer or varchar with possible numbers – to look for numbers or ranges of numerical values.  What if we want to check if a data row begins with a number and also has another combination of numbers as well?  What about decimal precision, such as getting all the data with up to four decimal places, or three?  In the latter query, we will use special characters alongside numbers, which we’ll look at in these next set of examples.

Regex to Find Specific Number Patterns

Suppose that we wanted to get all numerical values from the column NumData that start with either a 2 or 3 and have any other character following it, but also have a 1 or 5 somewhere else in the data.  We would use the [2-3] range followed by the any option % with the next set of numbers specified [15] followed by another any option %.

SELECT *
FROM alphanumreg
WHERE NumData LIKE '[2-3]%[15]%'   
regex query results

Three results from first query.

This shows us that we can check for a range of numbers and allow for any set of numbers to follow while also checking for another range of numbers within the same row of data.  For data where numbers may mean a form of identifiers, this can be very useful to get any combination along with any other combination, such as any combination that starts with 1-3 along with any other combination within the same row of data that has an 814 in it.

More Complex Regex to Find Specific Number Patterns

What if we want to specify that we only want data that extends to a certain decimal point?  When we query for data beyond a decimal point, we introduce special characters, as the period [.] is a special character. 

  • In first query below this, we want any number with two decimal places only for the AlphabeticNum column. 
  • In the second query we want the same except three decimal places only for the AlphabeticNum column. 
  • In the third query we want any number with at least two decimal places for the AlphabeticNum column.
SELECT *
FROM alphanumreg
WHERE AlphabeticNum LIKE '%[.][0-9][0-9]'
 
SELECT *
FROM alphanumreg
WHERE AlphabeticNum LIKE '%[.][0-9][0-9][0-9]'
 
SELECT AlphabeticNum
FROM alphanumreg
WHERE AlphabeticNum LIKE '%[.][0-9][0-9]%'   
regex query results

Results from second set of queries.

As we see, the second query returns nothing, as no AlphabeticNum column has three decimal places that meets this requirement.  Developers tend to round up or down, which may not matter in some situations, but be careful about this.  Sometimes this can be costly, so using techniques like this can help us when we’ve obtained data from an ETL source and we want to validate decimal length.

Regex to Return Negative Numbers

What if we want to return all negative data?  We can apply regular expressions using the special character for the negative symbol [-] and find all the values, whether decimal, integer or varchar that lead with a negative.  In the case of varchars, we may have a leading negative symbol with words, so we could add additional filtering such as [0-9] to ensure the following data are digits, as decimals and integers will prevent alphabetic characters in general.

SELECT NumData
FROM alphanumreg
WHERE NumData LIKE '[-]%'
 
SELECT NumInt
FROM alphanumreg
WHERE NumInt LIKE '[-]%'
 
SELECT AlphabeticNum
FROM alphanumreg
WHERE AlphabeticNum LIKE '[-]%'    
regex query results

Results from third set of queries.

Regex for More Complex Number Patterns

As in the examples of using special characters with a decimal point or hyphen for negative numbers, we may want to find identifiers in varchar data that use a combination of alphabetic characters, numbers and special characters.  In the below two queries, we look for identifiers with a combination of all of these.  The first query looks for an identifier that has 2 starting digits 0 through 9, a hyphen, then any alphabetic character A through Z followed by one number 0 through 9, another hyphen, then one last digit 0 through 9.  In the second query, rather than any alphabetic character, we only look for the alphabetic characters E through F.

SELECT AlphabeticNum
FROM alphanumreg
WHERE AlphabeticNum LIKE '[0-9][0-9][-][A-Z][0-9][-][0-9]'
 
SELECT AlphabeticNum
FROM alphanumreg
WHERE AlphabeticNum LIKE '[0-9][0-9][-][E-F][0-9][-][0-9]'    
regex query results

Results from four set of queries.

In some cases, identifiers will use all three types of characters: numbers, alphabetic, and special characters, though sometimes we only have two of the three.

Summary

Up to this point, we’ve covered numerical regular expressions by showing:

  • How to limit numerical character length, including decimal precision
  • How to allow a range or specific set of numerical characters
  • How to allow for any preceding number or alphabetic character or following number or alphabetic character
  • How to combine special characters, like the hyphen and decimal point, with numbers and alphabetic characters

In the final section, we’ll look at other special characters, the not symbol with regular expressions, and applying some of these in business situations.

Additional Information
  • Using the above techniques, write a regular expression query that looks for 3 first characters of any digit 0 through 9, followed by two alphabetic characters E through H, followed by another numerical character 7 through 9.

Last Update: 4/6/2018




Comments For This Article





download














get free sql tips
agree to terms