Improve SQL Server Query Performance with Searchable Arguments
My SQL Server query is running slowly. The WHERE clause is comparing against a column that is indexed, but the SQL query isn't doing an index seek. It sure seems like it shouldn't be scanning. Why might that be?
This is a common problem, especially for newer TSQL developers that are less familiar with the inner working of the SQL Server query engine.
This tip will look at 2 different queries where indexes aren't used by the optimizer, but could be if the query was tweaked. Next, it will explain why this small coding change will work. Finally, we will perform the tweak to see what sort of performance improvement can be expected.
Glossary of Terms
This tutorial will use 2 terms that might be new to readers.
Argument – An argument is another name for a Boolean expression that may appear in a join predicate, WHERE clause, or HAVING clause. Each of these clauses may have many arguments combined with AND and OR keywords. An argument might be "OrderDate = GETDATE()" or "ExtendedPrice > 100".
Searchable – An argument is searchable if it is written in a way that allows SQL Server to use an index.
The point of this tip is to teach TSQL programmers to make sure they are using searchable arguments whenever possible. Sometimes a developer might refer to a searchable argument by the vernacular, a sarge or a sarg.
All of the demos in this tip will use the WideWorldImporters sample database which can be downloaded for free from Github and will be run against SQL Server 2019. The images may be different, but the methodology should still work on older versions of SQL Server.
Function Calls in Arguments
Function calls in arguments are the single most common reason why arguments are not searchable.
Consider this query looking for invoices dated within 5 days of June 1, 2016. This author usually sees this type of example calling GETDATE() rather than using a hard coded date, but WideWorldImporters didn't have any rows new enough that GETDATE() made sense.
SELECT InvoiceDate FROM Sales.Invoices WHERE DATEDIFF(dd, InvoiceDate, '6/1/2016') <= 5;
Looking at the query plan for the query shows a clustered index scan.
That actually makes sense because the InvoiceDate column from the WHERE clause isn't indexed, but it does introduce a new question, "why didn't the optimizer put up that green text with missing index details?"
Let's take a look at the indexes on this table.
Below are the indexes for the Sales.Invoices table.
There is no index on InvoiceDate, so an index will be created and the query executed again.
CREATE INDEX mssqltips on Sales.Invoices(InvoiceDate);
The result is that the optimizer did use the index, but scanned it rather than perform a preferred seek operation.
To understand why this is the case, the query will be run one more time with an additional column in the SELECT statement.
SELECT InvoiceDate, DATEDIFF(dd, InvoiceDate, '6/1/2016') FROM Sales.Invoices WHERE DATEDIFF(dd, InvoiceDate, '6/1/2016') <= 5;
Notice that the argument in the WHERE clause is comparing the number 5 with the result of a DATEDIFF operation. When that DATEDIFF operation is placed into the SELECT clause of the query, it creates a new column that has no name. While the InvoiceDate column is indexed, this new, unnamed column that was invented during query execution is not indexed. This is why the argument is not searchable and why a scan is happening.
Also, this new column can't be indexed since it is only calculated during the query execution. This is why SQL Server doesn't put up the green "missing index" text in the query execution plan output.
How can this be fixed?
The key to fixing a problem such as this is to get the function calls away from the column that is being searched.
This alternate version of the query should return the same result set, but compares InvoiceDate to a value that the SQL Server engine can calculate just once at the beginning of the query - not once for every row in the table.
SELECT InvoiceDate FROM Sales.Invoices WHERE InvoiceDate >= DATEADD(dd, -5, '6/1/2016');
Upon executing this query, a seek operation is performed against the new index.
Using SET STATISTICS IO and SET STATISTICS TIME, the output shows the difference in performance between the 2 queries. The query on top is the corrected version with a searchable argument and the query on the bottom that does 60x the reads and takes 2x the elapsed time is the version that does not.
Consider a query looking for a person whose name starts with the letters "IVA". A common way to write that query would be to look at the first 3 characters of the name column and match them to "IVA" as shown below.
SELECT PersonID, FullName FROM [WideWorldImporters].[Application].[People] WHERE LEFT(FullName, 3) = 'Iva'
Unfortunately, exactly as in the query above, the newly minted column based on the LEFT function is not indexed. Once again SQL Server will scan and not request an index since there is no index that will fix this query.
A better way to write this query would be to use the LIKE operator. While it may seem counter intuitive to introduce the LIKE operator in an effort to get better performance, LIKE will still allow a seek so long as there is not a wild card at the start of the search value.
The next query has this change implemented.
SELECT PersonID, FullName FROM [WideWorldImporters].[Application].[People] WHERE FullName LIKE 'Iva%'
Running these 2 queries in immediate succession with the query plans included should show output similar to this.
Notice how the first query with the LEFT function in the WHERE clause performs a scan while the second query with LIKE performs a seek. The optimizer suggests that the LIKE version of the query is about 3x faster than the LEFT version.
Whenever writing TSQL code consider whether the arguments in join predicates, WHERE clauses, and HAVING clauses are searchable. The most common reason is including function calls that require the input of a column.
User-defined functions are also troublesome and often far worse since they are rarely natively compiled and run even slower than the built-in functions that were shown in these demo queries. This would mean that along with being unable to use an index, the query will be further slowed by the function itself!
- SQL Server Indexing Tutorial
- Learn more about using query plans and new indexes to tune queries
- Use STATISTICS TIME and STATISTICS IO to determine query performance
- Making Linked Server names easier to read
- Avoid SQL Server functions in the WHERE clause for Performance
- Avoid Using Functions in WHERE Clause
About the author
View all my tips
Article Last Updated: 2021-03-24