SQL LIKE Statement for Various Text Patterns
I am new to T-SQL programming and want to learn about the SQL LIKE statement. How does this differ from the = (equals) syntax in a WHERE clause? How do you include wildcards characters to match patterns? How is performance affected by SQL queries that use this operator?
The SQL LIKE logical operator is used to compare string values (VARCHAR/NVARCHAR data types in SQL Server) to one another. It works just like the = (equal sign) operator, except that it allows for the use of wild cards to match patterns within a string. It can be used within any argument and appears most commonly in IF statements, WHERE clauses, and HAVING clauses. This tip will show how to use the LIKE operator, focusing on the wild card options.
All of the demos will utilize the WideWorldImporters sample database for all demos. The database can be downloaded for free from GitHub. All demos are executed using SQL Server Management Studio (SSMS) and SQL Server 2019, but the information and demos contained in this SQL tutorial are valid going back many versions of SQL Server.
Overview - SQL LIKE Operator
Consider the following query:
SELECT * FROM Sales.Invoices WHERE ConfirmedReceivedBy LIKE N'Ingrida Zeltina';
It can look strange at first as this query has a WHERE clause with no =, <, or > operators. The word LIKE is the operator and, in this case, will work the same as an =.
While the LIKE clause is meant to compare two string data types, it will try to compare other data types. The SELECT statement might even succeed if the values on both sides of the operator can be implicitly converted to a string data type. These two sample queries use the LIKE operator to compare two integer numeric values and a date value to a string.
SELECT * FROM Sales.Invoices WHERE SalespersonPersonID LIKE 7; --2 integers SELECT * FROM Sales.Invoices WHERE InvoiceDate LIKE '2013-01-08'; --A DATE column and a string literal
Even though these two sample SQL SELECT queries succeed, they are in very bad form and should not be used. LIKE should only be used to compare two string data types for a specified pattern.
Using Wildcard Characters to Match Patterns
The advantage of using LIKE over = is that it supports pattern matching with wild cards. This next section will showcase these wild cards.
% (Percent Sign) Wild Card
The % wild card is used to represent a pattern of any number of characters -- including zero characters. Consider this SQL statement:
DECLARE @String NVARCHAR(4000); SELECT @String = DeliveryInstructions FROM Sales.Invoices WHERE InvoiceID = 63452; IF @String LIKE N'%Escobar Crescent' -- character string BEGIN PRINT 'It is on Escobar Crescent'; END;
The % in the script means that the argument will evaluate to true if the value of @String ends with the words "Escobar Crescent". It could even be "Escobar Crescent" with nothing in front of it. Some other values for @String that would evaluate as true are "123 Escobar Crescent" or "Drop off items at Escobar Crescent". Some values for @String that would cause the argument to evaluate as false would be "Deliver to Escobar Crescent.", "Escobar Crescent, last house on the left", or "123 Fake Street". The first 2 of these do not end with the correct text. The last one is way off.
_ (Underscore or underbar) Wild Card
The LIKE operator uses the _ wild card to represent exactly one character of any kind. Unlike the % wild card, it cannot represent a string of 0 characters.
Imagine a scenario where a label comes out of the printer with one character smudged. It is affixed to a box and sent for delivery. The delivery person calls in and asks where this delivery should go. They tell the support staff that the label says, "Suite 2--something--, 1300 Sonkar Road". There is only one character in question. This would be a use case for the _ wild card in a pattern match search using the LIKE operator.
SELECT DISTINCT DeliveryInstructions FROM Sales.Invoices WHERE DeliveryInstructions LIKE 'Suite 2_, 1300 Sonkar Road'
This query would execute and find all delivery instructions delivered to a suite starting with a 2 on 1300 Sonkar Road. It would not include Suite 2 as there needs to be one more character after the 2. For that same reason, it would not include any suite in the range of the 200s, as there is only one missing character.
The _ is not limited to numbers. It can be any character. As such, along with returning values like Suite 21 or 24, it could return Suite 2A, 2F, or 2*.
[…] (Brackets) Wild Card
The bracket pair wild card works with the LIKE operator the same as the _ wild card in that it represents exactly one character. The difference is that it defines a universe of possible values for the one character within the brackets rather than letting it be any single character. This can be difficult to describe but tends to make more sense when viewing an example.
Consider a scenario where a user searches for rows belonging to another user named Brian. Or was it Bryan? The user cannot remember. There is precisely one letter in question, but the _ wild card would allow more potential pattern matches than just the two target matches. Since the user knows that the missing letter is only one of 2 possible values, I or Y, this would be a tremendous use case for the bracket pair wild card, […]. In this case, the pattern would be "BR[IY]AN". Only two string values could be compared to this pattern via the LIKE operator and evaluated to true. They are BRIAN and BRYAN.
When defining the universe of possible values against which to match, the values can be listed out one at a time between the brackets or be defined via a range. Some common ranges are alphabetical or numeric.
An alphabetic range might be "[A-Z]", which means any letter between A and Z, or "[B-F]", meaning every letter between B and F, inclusive. Those two wild cards will NOT match a - (dash). This is much shorter than typing all out possible values as "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]" or "[BCDEF]", respectively.
Numeric ranges work the same as alphabetic ranges. The pattern "[3-6]" will only look for one character that is a 3, 4, 5, or 6. Remember that the bracket pair only looks at a single character. The pattern "[3-60]" will not match a number between 3 and 60. It will match any single number between 3 and 6 or a 0. Likewise, the pattern "[4-802]" will not match a number between 4 and 802. It will match any single number between 4 and 8 or a 0 or a 2.
Inversing a […] Wild Card with the NOT Options ^ (Carat Symbol)
Placing a ^ (carat) in front of a bracket pair wild card will tell the pattern match algorithm to look for exactly one character that is anything but a character listed in the universe within the brackets.
Consider a text column commonly filled with numbers such as a postal or zip code for U.S. addresses. A user might want to run a data quality query against such postal codes to look for invalid characters. In this case, an offending row would be a row that has any character other than a number.
In the WideWorldImporters database, that query might look something like this. For the record, this query returns no rows. Good job to whoever made this data set!
SELECT CustomerName, SC.PostalPostalCode, CN.CountryName FROM Sales.Customers SC INNER JOIN Application.Cities AC ON SC.DeliveryCityID = AC.CityID INNER JOIN Application.StateProvinces AP ON AC.StateProvinceID = AP.StateProvinceID INNER JOIN Application.Countries CN ON AP.CountryID = CN.CountryID WHERE CountryName = 'United States' AND SC.PostalPostalCode LIKE '%^[0-9]%';
The pattern looks for any one character that isn't between 0 and 9 and doesn't care what comes before or after it, as a single non-numeric value in that string means it is not a valid U.S. zip or postal code.
Combining Wild Cards
Arguments that use the LIKE operator are not limited to a single wild card. Often, wild card options are combined to match more specific patterns—the prior example used two % (precent signs) wild cards and the bracket pair.
This example looks for a row with an email address that doesn't match the pattern of at least one character followed by an @ sign. It then looks for at least two more characters followed by a ".". Finally, it expects one of c, o, e, or g followed by exactly two more characters. This would help find domain suffixes like com, org, edu, gov or other things that might match.
WHERE NOT Email LIKE '%[email protected]__%.[coeg]__'
Searching for Wild Card Values as Literals
Sometimes a pattern match is needed to find a literal character that would otherwise be a wild card value. An example might be looking for any sales that had a 10% discount applied. In this case, the % needs to be found literally and not treated as a wild card value. This can be completed by placing the wild card value inside a bracket pair.
SELECT * FROM [WideWorldImporters].[Sales].[SpecialDeals] WHERE DealDescription LIKE '%10[%]%';
When used in a WHERE or HAVING clause, arguments that have wild cards are rarely searchable. This means they will likely cause a table scan and lead to a slow query—exercise caution when using them, especially on larger tables.
One way to reduce the negative effects of an argument that is not searchable is to have other arguments that are searchable in the same clause. This allows SQL Server to use those arguments to quickly reduce the row count and only do the pattern match on a much smaller number of rows. That could mean including a date range of orders along with the LIKE-based argument.
I have devoted an entire SQL tutorial to searchable arguments, including a section on using LIKE.
- How to Write Searchable Arguments
- SQL WHERE Clause Explained
- T-SQL IN Operator
- T-SQL NOT IN Operator
- T-SQL NOT Equal Operator
- How to create a table using SQL Server Management Studio
- Create a Table in Microsoft SQL Server
- Create Tables in SQL Server with T-SQL
- Tips as a SQL Reference
About the author
View all my tips
Article Last Updated: 2022-10-27