Case Sensitivity and Regular Expressions for T-SQL


By:
Overview

Up to this point, all of our regular expressions in our query have used capital letters and this hasn’t impacted the results.  The reason the results were not impacted, is because our database is not set to be case sensitive, so we we’ll be changing our query settings in this section to test case sensitivity.  As we’ll see, querying will be similar and the case will matter in our ranges or specifications of alphabetic characters.

Regex Case Sensitivity

If we wanted to look for all the results in our table that only had one alphabetic character of any range (A through Z) and we wanted this to only be a lower case character, if we run both queries in our first set, notice how our result does not differ.  This means that even if we only want to return letters with capitals, we’ll also see lower case letters too and vice versa, this is because the database I set up is not case sensitive.

SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[A-Z]'
 
SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[a-z]'   
regex query results

Two results from each query.

Database Collation Setting for Case Sensitivity

One option is that we could create another database with this collation specified, though we may not want this if it affects all values in our database. 

Depending on our need, we may be able to specify the collation at the database level – if this is required for all columns, but be careful, as we may only need specific collation for a few queries.  In the below image, we see the collation option for creating a new database.

sql server collation setting

 

Using Regex and Column Collation to Test Case Sensitivity

Our query does not specify any case sensitivity, so we’ll use both of the same queries and specify the column collation to be Latin1_General_BIN, which will return results based on case sensitivity.

SELECT *
FROM alphareg
WHERE Alphabetic COLLATE Latin1_General_BIN LIKE '[A-Z]'
 
SELECT *
FROM alphareg
WHERE Alphabetic COLLATE Latin1_General_BIN LIKE '[a-z]'   
regex query results

One result from each query.

Now, our results return on the basis of our case sensitivity. 

Regex to Find Upper Case or Lower Case Characters

Finally, let’s query our data rows and see if we have any alphabetic characters that are capitalized within the sentence, but start out with a lower case character.  In this case, we’ll allow any range of starting characters – as long as they’re lower case – and search for any combination of upper case characters.

SELECT *
FROM alphareg
WHERE Alphabetic COLLATE Latin1_General_BIN LIKE '[a-z]%[A-Z]%'    
regex query results

One result from this query.

Regex to Find Upper Case or Lower Case Characters

What about any case for the first character being of a range A through Z with an upper case somewhere in the sentence or column?  We can use case combinations for this [A-Za-z] which means the first character can be any alphabetic character upper or lower case.

SELECT *
FROM alphareg
WHERE Alphabetic COLLATE Latin1_General_BIN LIKE '[A-Za-z]%[A-Z]%'   
regex query results

Eight results from this query.

Summary

Up to this point, we’ve covered alphabetic regular expressions by showing:

  • How to limit character length
  • How to allow a range or specific set of characters
  • How to allow for any preceding or following character
  • How to handle case sensitivity by either restricting it to a case or allowing both types of case sensitivity in some situations.

Since we often use regular expressions with numbers and alphabetic characters, we’ll look at how to filter on numbers along with how to extract numeric characters with decimals or specify decimal precision with regular expressions.

Additional Information
  • Case sensitivity can also impact performance with queries, so when setting up a database where regular expressions might be run against character-based data, such as char, varchar, nvarchar, consider if the case of the expression will matter with queries.  If in doubt, it can always be specified at the query level, like we did in this section.





Comments For This Article




Friday, June 15, 2018 - 5:00:24 AM - Tim Back To Top (76229)

@Solomon you are correct that collation can (and as shown above) be specified on the column level if that suits needs.  In working in multiple environments, developers do not always do this on the granular level, which introduces bugs further stream.  While this tip looks at individual situations and not specifics about when to do that, I would suggest developers who are considering look to further resources.  Your note should be valuable to them.


Thursday, June 14, 2018 - 12:49:24 PM - Solomon Rutzky Back To Top (76223)

Hi there. There are a few mistakes in this "case-sensitivity" page, 2 major and 1 minor:

  1. MAJOR: While true that the COLLATE keyword can, and should, be used to change the Collation on a per-predicate basis, it is not true that the Database's Collation controls anything here. What controls the Collation of each column is the Collation specified per each string column (excluding XML). And each column can have a different Collation. If a column is going to always be interacted with in a case-sensitive manner, than it's Collation should be case-sensitive, even if all other columns are case-insenitive.

  2. MAJOR: Binary Collations are specifically not case-sensitive, nor are they everything-sensitive. They have no concept of linguistic rules, which are still used even in Collations with all sensitivities enabled (e.g. Japanese_XJIS_140_CS_AS_KS_WS_VSS). For a full explanation and examples, please see: No, Binary Collations are not Case-Sensitive

  3. Minor: When using binary Collations, the "_BIN2" Collations should be used instead of the older "_BIN" Collations (starting with SQL Server 2005). The "_BIN" Collations do not always sort Unicode data correctly, whereas the "_BIN2" Collations do.














get free sql tips
agree to terms