A SQL Server/Web Application Developer in my company approached me yesterday with a dilemma. He needed to be able to find a phrase within a specific field in a SQL Server database he was developing. If he received a "hit" then the process was to return the related value in a field within a parent table. This is a typically simple process that can be handled with the use of the T-SQL CHARINDEX() function, however there were certain caveats in his requirements that would not allow CHARINDEX() to be the complete solution to his problem. He needed to only consider situations where the value of the field started with the phrase or if the phrase was the beginning of a word in the value of the search field it was a valid hit. Read on and I'll explain further.
The web application being developed was for a medical website under which a user could select a phrase and search on various symptoms to return a related record in a table for possible malady. A phrase in this case could be multiple words or a single letter. Therefore if you selected your search phrase to be "B" and there were values in the search field of "Bone Mass", "Thumb Pain", or "Tumor, Benign" the results would be a positive hit on "Bone Mass" (because the value starts with the letter B) and "Tumor, Benign" (because it is preceded by a space). Even though the word "Thumb" includes the letter B in the search criteria it is omitted because it is, essentially, not the first letter of a word in the value.
Let's dismiss the medical jargon for a second and take a look at an example where I relate the content back to SQL Server. For the sake of this example I will go on the basis that we have two tables: ##ProductChild (with values pertaining to relational database management systems) and ##ProductParent, which will be used to store just two records that state whether a given record in ##ProductChild is a SQL Server database system or not. We will use "SQL" as our search criteria.
At this point if we query each table we will receive the following results:
There are two different approaches that can be taken to perform this query from this point forward. I will highlight both options and note now that the actual execution plan and execution times were identical.
Option 1: Use the CHARINDEX() Function
Now, using these sample tables we will search for all records in ##ProductChild where ChildDescription either starts with "SQL" or contains a word that starts with "SQL". We will ignore any results that do not meet this criteria, even if "SQL" appears in the field within a given word. I'll do so by breaking a rule I try to live by: no functions in the WHERE clause. These tables are extremely small. In the real-world scenario that prompted this article, we are also not dealing with more than a couple thousand records either.
Taking a closer look at this code it is extremely straight-forward compared to the ideas he and I tossed about: cursors, loops, all sorts of madness. We end up breaking the two positive hit possibilities into two parts of an OR statement:
- If ##ProductChild.ChildDescription starts with "SQL": C.ChildDescription LIKE @searchstring
- If ##ProductChild.ChildDescription contains a word that starts with "SQL": ('' + @searchstring , C.ChildDescription , 1) > 0
CHARINDEX() is the T-SQL function used to search for a given value within a field. It accepts three parameters:
- Search Value
- Field to Search
- Starting Position in Field
In the sample code I append a space onto the beginning of the search string so as to force the CHARINDEX function to ignore any hits within words in the ChildDescription table.
Option 2: Use the LIKE Condition
Instead of using CHARINDEX() to account for "hits" where the search string is preceded by a space we can instead simply append a space and '%' wildcard before the @searchstring variable as shown below. Ultimately this is probably the better solution as the format of the programming is consistent between criteria being searched upon. There is also one less variable and therefore less overhead involved in this code block. Simply put, the code is more intuitive from a readability standpoint.
Regardless of which option I use, the results are identical. I get valid hits on 4 records in the ##ProductChild.ChildDescription and therefore receive these results for the query:
If I was to search based upon "My" I'd receive the following results, due to the fact that MySQL is not a Microsoft SQL Server product.
Of course the proper process is to encapsulate this into a stored procedure. Using what we just learned this is the final result:
- More detailed information on the CHARINDEX() function, its syntax, and usage is available here.
- MSSQLTips.com has additional tips relying upon CHARINDEX().
- We comb the forums at MSSQLTips.com looking for subjects to write about. This tip came from a coworker with a specific problem. This could have easily been about a solution to a problem you were encountering as well.
Last Update: 12/29/2008
About the author
View all my tips