Find text strings in character data types using SQL Server LIKE Operator
By: Greg Robidoux | Updated: 2007-10-08 | Comments (2) | Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | More > Scripts
One problem that you may be faced with is the need to find text data that is contained in a larger set of text. There are two ways this can be done either using the LIKE operator or by using Full Text indexing. Let's take a look at some of the options of using the LIKE operator and some pros and cons of this approach.
As mentioned already one approach of finding text that is stored within a larger set of text is by using the LIKE operator. The LIKE operator can be used in several different manners and there are some pros and cons of using this approach to limit data when issuing your queries. Let's first take a look at some of the options for using the LIKE operator.
|%||Any string of zero or more characters.||WHERE title LIKE '%computer%' finds all book titles with the word 'computer' anywhere in the book title.|
|_ (underscore)||Any single character.||WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).|
|[ ]||Any single character within the specified range ([a-f]) or set ([abcdef]).||WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on.|
|[^]||Any single character not within the specified range ([^a-f]) or set ([^abcdef]).||WHERE au_lname LIKE 'de[^l]%' all author last names starting with de and where the following letter is not l.|
Source: SQL Server Books Online
Here are some sample queries, sample output and the execution plans that SQL Server uses to find the data. Without an index on the LastName column all of the execution plans would perform a Clustered Index Scan. To better show how the LIKE operator can affect performance, an index was created on the LastName column, so the results shown are how SQL Server will handle each of these queries based on having this index.
This query will pull back any record that starts with "Bal". Since we have a definitive starting point "Bal", SQL Server is able to do an "Index Seek" on the table and therefore use the index on the LastName column. The overall cost for the query is 0.0082582.
SELECT * FROM Person.Contact WHERE LastName LIKE 'Bal%'
This query will pull back any record that contains "ale". Since there is not a definitive starting point, SQL Server can not use the index and therefore chooses to do a "Clustered Index Scan". The overall cost for the query is 0.44451 which is not as good as the previous query.
SELECT * FROM Person.Contact WHERE LastName LIKE '%ale%'
This query will pull back any record that contains "ale", but also only has one character before "ale" and one character after the "ale". Since there is not a definitive starting point, SQL Server has to scan the index. In this example it is faster for SQL Server to scan the index created on the LastName column, but it still needs to scan the entire index. The overall cost for the query is 0.0877988 which is not as good as example 1, but it is faster than example 2.
SELECT * FROM Person.Contact WHERE LastName LIKE '_ale_'
This query will pull back any record that starts with "A, B or C", contains "al" and it doesn't matter what comes after the "al". Since there is a definitive starting point either "A, B or C", SQL Server is able to do an index seek on the LastName index. The overall cost for the query is 0.0082582 which is the same as example 1.
SELECT * FROM Person.Contact WHERE LastName LIKE '[A-C]al%'
This query will pull back any record that does not start with "B", but contains "ala" and any characters after that. Since there are so many other records that SQL Server needs to evaluate it will do an Index Scan on the new LastName index. The overall cost for the query is 0.14392.
SELECT * FROM Person.Contact WHERE LastName LIKE '[^B]ala%'
From the above samples you can see when SQL Server uses indexes and when SQL Server does not use indexes. On small tables this is not a big deal, but if you have very large tables the LIKE operator could significantly hurt performance and therefore you should look at using Full Text Indexes instead.
Another thing to note is that you can mix and match these wildcard operators when you build your query as shown in examples 4 and 5, so you can create some complex matching strings, but again the larger the dataset the longer these operations may take.
With any function that is used in your WHERE clause there may be the possibility that SQL Server does an Index Scan vs. an Index Seek. Keep this in mind as you develop your queries and also take a look at the execution plans to see how SQL Server will interpret and execute the query prior to pushing the code to your production servers.
- The LIKE operator can be handy tool, but be aware of possible performance issues
- Take a look at using Full Text Indexing vs. using the LIKE operator especially on larger tables.
- Understand how SQL Server will execute your queries ahead of time, before experiencing performance issues in your production systems.
Last Updated: 2007-10-08
About the author
View all my tips
- Find text strings in character data types using SQ...
- Finding and listing all columns in a SQL Server da...
- Searching and finding a string value in all column...
- Scan a SQL Server Database for Objects and Columns...
- SQL Server Find and Replace Values in All Tables a...
- List columns and attributes for every table in a S...
- Search all string columns in all SQL Server databa...
- More Database Developer Tips...