Avoid SQL Server functions in the WHERE clause for Performance

By:   |   Comments (13)   |   Related: > Performance Tuning


Problem

SQL Server offers many handy functions that can be used either in your SELECT clause or in your WHERE clause.  For the most part these functions provide complex coding that would be very difficult to get this same functionality without these functions.  In addition to the built in functions you also have the ability to develop your own user defined functions.  When functions are used in the SELECT clause to return uppercase output, a substring or whatever, it doesn't affect performance that much, but when functions are used improperly in the WHERE clause these functions can cause major performance issues.

Solution

When functions are used in the SELECT clause, the function has to be run with each data value to return the proper results.  This may not be a bad thing if you are only returning a handful of rows of data.  But when these same functions are used in the WHERE clause this forces SQL Server to do a table scan or index scan to get the correct results instead of doing an index seek if there is an index that can be used.  The reason for this is is that the function value has to be evaluated for each row of data to determine it matches your criteria. 

Here are some simple statements that show you the affect of using a function in the WHERE clause.  To get a better understanding of how these queries are working we are also getting the query plan. This can be done by hitting Ctrl-M in a query window to turn this function on before running the query.

Example 1

This first example uses the LEFT function to get the first two characters of the email address.  Once this is done each row is evaluated to see if it matches the "As" criteria.  The EmailAddress is indexed, so SQL Server should be able to use an index seek to find the data.

SELECT EmailAddress
FROM person.contact 
WHERE left(EmailAddress,2) = 'As'

From the query plan output we can see that this query does an index scan, which means it reviews all rows before returning the results.  This is because of the LEFT function that is being used.

query plan

Another version of this same query which will return the same results uses the LIKE clause instead.  This query uses the like clause to get all data that begins with "As".  Since there is an index on the the EmailAddress column SQL Server can do an index seek which is much more efficient then an index scan.

SELECT EmailAddress		
FROM person.contact 
WHERE EmailAddress like 'As%'
query plan

Example 2

Here is another example where the UPPER clause is used to transform the EmailAddress into upper case before evaluating the data.  Again the EmailAddress is indexed.

SELECT EmailAddress		
FROM person.contact 
WHERE upper(EmailAddress) like 'AS%'

We can see that the query plan for this also does an index scan versus an index seek.

query plan

A second version of this query again just uses the LIKE clause to get the same results.

SELECT EmailAddress		
FROM person.contact 
WHERE EmailAddress like 'AS%'

Again this query does an index seek versus a index scan.

query plan

Example 3

Here is another example where the function DateDiff is used.  The function is getting rows where the difference in minutes between the ModifiedDate and the getdate() function is greater then zero.  In addition, column ModifiedDate is indexed.

SELECT ModifiedDate		
FROM person.contact 
WHERE datediff(minute,ModifiedDate,getdate())>0

This first query is using the function and therefore an index scan has to occur.

query plan

In this version we are just doing a straight comparison of ModifiedDate compared to getdate().

SELECT ModifiedDate		
FROM person.contact 
WHERE ModifiedDate < getdate()

Since we are not using a function this query is using an index seek.

query plan

Summary

The data that was used to run these queries was the data in the AdventureWorks database.  This database is quite small compared to most SQL Server installations, so the query time results as well as the overall I/O overhead that may be imposed by doing index scans versus index seeks will be quite different and also probably show some significant improvements in your environment.  Not using functions in the WHERE clause is a simple thing to avoid and can provide big performance gains if use alternative methods.

Next Steps
  • Look for poor performing statements in your databases where scans are occurring to see if functions are being used in the WHERE clause
  • Look for alternative methods for getting the same query results, such as some of the examples have shown


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, March 30, 2018 - 12:20:05 AM - deFreitas Back To Top (75557)

 

It's not just for MSSQL, PostgreSQL and Oracle have the same behavior with the same examples, I am not sure about MYSQL


Tuesday, January 2, 2018 - 9:36:36 AM - Greg Robidoux Back To Top (74673)

Hi Gary,

Yes you are correct.

If you invoke a function call on a value from a row in the query it will need to call the function for each row in the query.  So for this example it has to evaluate each StartTime value to see if this is true.

SELECT StartTime FROM CommandLog WHERE DATEADD(DD,30,StartTime)>'12/29/2017'

Buf it you use a function that returns only one value it will get run once.  This only has to get the current date once and then subtract 30 days from it once.  This value is then used to evaluate if the StartTime is greater than this calculated value.

SELECT StartTime FROM CommandLog WHERE StartTime > DATEADD(DD,-30,GETDATE())

-Greg


Friday, December 29, 2017 - 2:57:41 PM - Gary Pollock Back To Top (74576)

 I appreciate this post.  Is this only true for for "lvalue" functions, where the input is based on a value for the row?  In SQL Server 2012, I created a command log table, created a nonclustered index on the StartTime column, and ran the below scripts.  In all the below, results came back in less than a second--there are only 55k rows in the table.  Based on the plans, it looks like the query optimizer can tell when it only needs to run a function once, so having a function in the WHERE clause is OK so long as the function is not dependent on part of the table.

 

--1) Function on column value; uses index scan on new index

SELECT StartTime

FROM CommandLog

WHERE DATEADD(DD,30,StartTime)>'12/29/2017'

 

--2) Function a value not related to table; uses index seek on new index

SELECT StartTime

FROM CommandLog

WHERE StartTime>DATEADD(DD,-30,GETDATE())

 

--3) No function; uses index seek on new index

SELECT StartTime

FROM CommandLog

WHERE StartTime>'11/29/2017'

 


Wednesday, May 20, 2015 - 10:09:16 AM - slavica Back To Top (37236)

'but what if we dont have an index on that column on whice we are using UDF in where clause .then how can we improve the performace'

This is maybe one more way of how I found SurrogateKey (column Id which is Identity) very useful.

I made usualy Clustered Index on that colum, but if you already have other Clustered Index then you can make NonClusteredIndex on that column. In Include list od that nonclustered index shoul be all columns from SELECT list.

then instead of

SELECT ModifiedDate
FROM person.contact
WHERE datediff(minute,ModifiedDate,getdate())>0

.


SELECT ModifiedDate
FROM person.contact
WHERE Id>0

and datediff(minute,ModifiedDate,getdate())>0


will be used IndexSeek of that NonClusteredIndex for anu function or any column which does not have Index.



Tuesday, September 16, 2014 - 1:49:43 PM - Greg Robidoux Back To Top (34548)

Hi Puru,

the computed column would have to be created on the table in the other database in order to use that approach.

-Greg


Tuesday, September 16, 2014 - 11:58:46 AM - puru Back To Top (34546)

thanks for the help Greg

But i have one more issue that the funtion that i have to use with the column is in other db and  I think we cannot use database name when creating a computed column.

 


Monday, September 15, 2014 - 4:06:14 PM - Greg Robidoux Back To Top (34537)

Hi Puru,

Not sure what your UDF does, but if performance is an issue you could create a computed column instead of using the UDF and use a persisted value for that column instead of calculating it every time.

http://www.mssqltips.com/sqlservertip/1682/using-computed-columns-in-sql-server-with-persisted-values/

http://msdn.microsoft.com/en-us/library/ms189292.aspx

Let me know if this helps.

-Greg


Monday, September 15, 2014 - 3:28:42 PM - puru Back To Top (34536)

Hey good article ,but what if we dont have an index on that column on whice we are using UDF in where clause .then how can we improve the performace


Wednesday, November 6, 2013 - 4:30:02 AM - aa Back To Top (27414)

brilliant


Thursday, September 20, 2012 - 5:45:13 PM - Greg Robidoux Back To Top (19602)

@nikit - you are correct.  Changing the collation does a scan on the index if it is case sensitive.

Looks like you don't have much of an option if your database is case sensitive and you need to search the data ignoring the case.

A work around would be to always store the data using the same case or making the column case insensitive in the table if it doesn't really need to be case sensitive.

CREATE TABLE [Person].[Contact2](
   [EmailAddress] [nvarchar](50) collate SQL_Latin1_General_CP1_CI_AS NULL,
   ) ON [PRIMARY]


Thursday, September 20, 2012 - 4:20:11 PM - nikit Back To Top (19599)

SELECT EmailAddress FROM person.contact WHERE EmailAddress like 'AS%' collate SQL_Latin1_General_CP1_CI_AS

IF you use the collate property that will also force the SQL server to use the Scan instead of seek. I currently have that same issue and not sure how to resolve it


Friday, October 14, 2011 - 9:48:31 AM - Greg Robidoux Back To Top (14845)

Try something like this to see if this works:

SELECT EmailAddress FROM person.contact WHERE EmailAddress like 'AS%' collate SQL_Latin1_General_CP1_CI_AS
 


Friday, October 14, 2011 - 8:53:14 AM - Kenny Back To Top (14840)

In Example 2:

SELECT EmailAddress FROM person.contact WHERE upper(EmailAddress) like 'AS%'

your alternative is to just remove the upper() function??

I tried that in a database with case sensitive settings. it does NOT give the same results!

==> not a valid alternative.

Do you know any? Postgres, Oracle, .. allow indexes on lower(EmailAddress), MS SQL does not. 















get free sql tips
agree to terms