Improve SQL Server Query Performance with Searchable Arguments

By: Eric Blinn | Comments (2) | Related: > TSQL

Problem

My SQL Server query is running slowly. The WHERE clause is comparing against a column that is indexed, but the SQL query isn't doing an index seek. It sure seems like it shouldn't be scanning. Why might that be?

Solution

This is a common problem, especially for newer TSQL developers that are less familiar with the inner working of the SQL Server query engine.

This tip will look at 2 different queries where indexes aren't used by the optimizer, but could be if the query was tweaked. Next, it will explain why this small coding change will work. Finally, we will perform the tweak to see what sort of performance improvement can be expected.

Glossary of Terms

This tutorial will use 2 terms that might be new to readers.

Argument – An argument is another name for a Boolean expression that may appear in a join predicate, WHERE clause, or HAVING clause. Each of these clauses may have many arguments combined with AND and OR keywords. An argument might be "OrderDate = GETDATE()" or "ExtendedPrice > 100".

Searchable – An argument is searchable if it is written in a way that allows SQL Server to use an index.

The point of this tip is to teach TSQL programmers to make sure they are using searchable arguments whenever possible. Sometimes a developer might refer to a searchable argument by the vernacular, a sarge or a sarg.

Note

All of the demos in this tip will use the WideWorldImporters sample database which can be downloaded for free from Github and will be run against SQL Server 2019. The images may be different, but the methodology should still work on older versions of SQL Server.

Function Calls in Arguments

Function calls in arguments are the single most common reason why arguments are not searchable.

Consider this query looking for invoices dated within 5 days of June 1, 2016. This author usually sees this type of example calling GETDATE() rather than using a hard coded date, but WideWorldImporters didn't have any rows new enough that GETDATE() made sense.

SELECT InvoiceDate FROM Sales.Invoices WHERE DATEDIFF(dd, InvoiceDate, '6/1/2016') <= 5;

Looking at the query plan for the query shows a clustered index scan.

That actually makes sense because the InvoiceDate column from the WHERE clause isn't indexed, but it does introduce a new question, "why didn't the optimizer put up that green text with missing index details?"

Let's take a look at the indexes on this table.

exec sp_helpindex[Sales.Invoices]

Below are the indexes for the Sales.Invoices table.

This image shows that the InvoiceDate column is not indexes on the Sales.Invoices table.

There is no index on InvoiceDate, so an index will be created and the query executed again.

CREATE INDEX mssqltips on Sales.Invoices(InvoiceDate);

The result is that the optimizer did use the index, but scanned it rather than perform a preferred seek operation.

To understand why this is the case, the query will be run one more time with an additional column in the SELECT statement.

SELECT InvoiceDate, DATEDIFF(dd, InvoiceDate, '6/1/2016') 
FROM Sales.Invoices 
WHERE DATEDIFF(dd, InvoiceDate, '6/1/2016') <= 5;

Notice that the argument in the WHERE clause is comparing the number 5 with the result of a DATEDIFF operation. When that DATEDIFF operation is placed into the SELECT clause of the query, it creates a new column that has no name. While the InvoiceDate column is indexed, this new, unnamed column that was invented during query execution is not indexed. This is why the argument is not searchable and why a scan is happening.

Also, this new column can't be indexed since it is only calculated during the query execution. This is why SQL Server doesn't put up the green "missing index" text in the query execution plan output.

How can this be fixed?

The key to fixing a problem such as this is to get the function calls away from the column that is being searched.

This alternate version of the query should return the same result set, but compares InvoiceDate to a value that the SQL Server engine can calculate just once at the beginning of the query - not once for every row in the table.

SELECT InvoiceDate FROM Sales.Invoices WHERE InvoiceDate >= DATEADD(dd, -5, '6/1/2016');

Upon executing this query, a seek operation is performed against the new index.

Using SET STATISTICS IO and SET STATISTICS TIME, the output shows the difference in performance between the 2 queries. The query on top is the corrected version with a searchable argument and the query on the bottom that does 60x the reads and takes 2x the elapsed time is the version that does not.

This is the output of STATISTICS TIME and STATISTICS IO.

Another Example

Consider a query looking for a person whose name starts with the letters "IVA". A common way to write that query would be to look at the first 3 characters of the name column and match them to "IVA" as shown below.

SELECT PersonID, FullName
  FROM [WideWorldImporters].[Application].[People]
  WHERE LEFT(FullName, 3) = 'Iva'

Unfortunately, exactly as in the query above, the newly minted column based on the LEFT function is not indexed. Once again SQL Server will scan and not request an index since there is no index that will fix this query.

A better way to write this query would be to use the LIKE operator. While it may seem counter intuitive to introduce the LIKE operator in an effort to get better performance, LIKE will still allow a seek so long as there is not a wild card at the start of the search value.

The next query has this change implemented.

SELECT PersonID, FullName
  FROM [WideWorldImporters].[Application].[People]
  WHERE FullName LIKE 'Iva%'

Running these 2 queries in immediate succession with the query plans included should show output similar to this.

execution plans index scan vs index seek

Notice how the first query with the LEFT function in the WHERE clause performs a scan while the second query with LIKE performs a seek. The optimizer suggests that the LIKE version of the query is about 3x faster than the LEFT version.

Conclusion

Whenever writing TSQL code consider whether the arguments in join predicates, WHERE clauses, and HAVING clauses are searchable. The most common reason is including function calls that require the input of a column.

User-defined functions are also troublesome and often far worse since they are rarely natively compiled and run even slower than the built-in functions that were shown in these demo queries. This would mean that along with being unable to use an index, the query will be further slowed by the function itself!

Next Steps

About the author

Eric Blinn is the Sr. Data Architect for Squire Patton Boggs. He is also a SQL author and PASS Local Group leader.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Monday, March 29, 2021 - 6:24:18 PM - Eric Blinn	Back To Top (88468)
Thank you, Kyle! I'll try to come up with more topics like this one.

Monday, March 29, 2021 - 11:09:19 AM - Kyle	Back To Top (88467)
Well laid out and easy to understand, nicely done. I'd love to see more on this. There's always these little gotchas in queries that don't reveal themselves until you hit millions of records and it would be nice to see more of them in advance :)