By: Fikrat Azizov | Last Updated: 2015-12-07 | Comments (1) | Functions - User Defined UDF
While working on SQL Server performance optimization for one of my customers, I came across a very long-running stored procedure that did the following: a 'SELECT' query inserted its output into a temporary table, an 'INSERT' query then read from this temporary table, and after some data manipulation the data was finally inserted into another table for reporting purposes.
After further investigation and analysis of this process, I found the real problem was the extensive use of scalar user defined functions (UDFs). By identifying the problem and reworking the stored procedure, I was able to reduce the overall duration from 90 minutes to less than 5 minutes. This example highlights the importance of identifying and fixing performance problems caused by UDFs.
In this tip we will cover some issues caused by UDFs and several queries you can use to capture where UDFs are being used and the performance impact they are causing, so you know where you need to optimize your code to improve performance.
Why are SQL Server scalar functions bad?
For the purpose of this discussion I'll divide all UDFs into two categories: scalar functions and table-valued functions. Although functions are useful for encapsulation and code simplification, there are several performance problems related to user defined functions, especially scalar functions. Some of these problems include:
- Scalar functions act like cursors, causing row-by-row processing. This explains the exponential performance degradation I was seeing as the number of rows in the query grew.
- Scalar functions don't allow queries to benefit from parallelism, therefore limiting their execution to a single processor.
- Scalar functions lead to incorrect cost estimation, resulting in suboptimal execution plans.
The bottom line is that scalar functions are hidden bottlenecks - it's hard to understand their performance impact by looking at the execution plans of the queries that use them, which makes troubleshooting problematic.
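To make the row-by-row behavior concrete, here is a minimal, hypothetical example (the function name and the Production.Product table are assumed for illustration): the scalar UDF is invoked once per row and blocks parallelism, while the equivalent inline expression can be costed and parallelized by the optimizer.

```sql
-- Hypothetical scalar UDF wrapping a simple expression.
CREATE FUNCTION dbo.ufn_GetDiscountedPrice (@Price money, @Discount decimal(4,2))
RETURNS money
AS
BEGIN
    RETURN @Price * (1.0 - @Discount);
END;
GO

-- The UDF is executed once for every row, serially:
SELECT ProductID, dbo.ufn_GetDiscountedPrice(ListPrice, 0.10) AS SalePrice
FROM Production.Product;

-- Inlining the same expression lets the optimizer cost it accurately
-- and consider a parallel plan:
SELECT ProductID, ListPrice * (1.0 - 0.10) AS SalePrice
FROM Production.Product;
```

Rewriting scalar UDFs as inline expressions (or inline table-valued functions) is the most common fix for the problems listed above.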
I've included a few links below to articles discussing performance problems caused by functions and ways to fix them and I'm sure you can find many more publications on the internet. However, in this tip I'll focus on some techniques to identify scalar functions in your environment and measure the performance impact.
How to measure the SQL Server performance impact of scalar functions?
When it comes to assessing performance impact of scalar functions, as a DBA I'd like to know:
- What is the list of functions in use and how frequently are they being called?
- What is the performance impact of these functions - in other words, how much time is spent executing them and what was the resource consumption?
As I mentioned earlier, SQL Server execution plans may not be the best tool to estimate the performance impact of a scalar function, as they do not reflect the overall impact when the function is used in large queries.
Below I've explored different ways to measure UDF performance impact and included the advantages and limitations.
Using DMV's against the SQL Server plan cache to catch UDF execution plans
This query will show performance statistics (like number of executions, average elapsed time, average logical/physical reads, etc.) for recently called functions. Functions with a high number of executions, as well as those involving a high number of reads, are the best candidates for optimization. This is the most accurate way of measuring UDF performance impact in a production environment, in my view. However, it has a limitation - it will only catch functions accessing data, as SQL Server does not create execution plans for pure calculation-type functions.
This query will return all types of UDFs, including scalar UDFs.
EXEC sp_msforeachdb '
IF ''?'' NOT IN (''msdb'', ''master'', ''tempdb'')
BEGIN
    USE [?];
    SELECT TOP 100
        DB_NAME() AS [database],
        QS.total_worker_time / 1000000 AS TotalWorkerTime,
        QS.total_elapsed_time / 1000000 AS TotalElapsedTime_Sec,
        QS.total_elapsed_time / (1000000 * QS.execution_count) AS [avg_elapsed_time_Sec],
        QS.execution_count,
        QS.total_logical_reads / QS.execution_count AS Avg_logical_reads,
        QS.max_logical_writes,
        ST.text AS ParentQueryText,
        SUBSTRING(ST.[text], QS.statement_start_offset / 2 + 1,
            (CASE WHEN QS.statement_end_offset = -1
                  THEN LEN(CONVERT(nvarchar(max), ST.[text])) * 2
                  ELSE QS.statement_end_offset
             END - QS.statement_start_offset) / 2) AS [Query Text],
        QP.query_plan,
        O.type_desc
    FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST
    CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) QP
    LEFT JOIN sys.objects O ON O.object_id = ST.objectid
    WHERE O.type_desc LIKE ''%Function%''
    ORDER BY QS.total_worker_time DESC;
END'
Using DMV's against the SQL Server plan cache to catch execution plans of queries referring to UDFs
The query below checks the SQL plan cache for queries with 'Compute Scalar' operations caused by scalar functions and shows performance statistics for those queries. Although you can use this script to measure the performance of the parent queries, it will not show the resource impact caused by the functions themselves.
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
GetQueriesWithUDF AS
(
    SELECT DISTINCT
        udf.value('(@FunctionName)', 'varchar(100)') AS UDF_Name,
        RIGHT(udf.value('(@FunctionName)', 'varchar(100)'),
              LEN(udf.value('(@FunctionName)', 'varchar(100)'))
              - CHARINDEX('.', udf.value('(@FunctionName)', 'varchar(100)'),
                          CHARINDEX('.', udf.value('(@FunctionName)', 'varchar(100)'), 1) + 1)
             ) AS Stripped_UDF_Name,
        QS.execution_count,
        CONVERT(float, cs.value('(.//RelOp/@EstimateRows)', 'varchar(100)')) AS EstimatedRows,
        QS.total_elapsed_time / 1000000 AS TotalElapsedTime_Sec,
        QS.max_elapsed_time / 1000000 AS max_elapsed_time_Sec,
        QS.total_elapsed_time / (1000000 * QS.execution_count) AS [avg_elapsed_time_Sec],
        QS.total_logical_reads,
        QS.total_logical_reads / QS.execution_count AS avg_logical_reads,
        SUBSTRING(ST.[text], QS.statement_start_offset / 2 + 1,
            (CASE WHEN QS.statement_end_offset = -1
                  THEN LEN(CONVERT(nvarchar(max), ST.[text])) * 2
                  ELSE QS.statement_end_offset
             END - QS.statement_start_offset) / 2) AS [Query_Text],
        ST.text AS Parent_Query
    FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) ST
    CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) AS QP
    CROSS APPLY QP.query_plan.nodes('.//ComputeScalar') AS ComputeScalar(cs)
    CROSS APPLY cs.nodes('.//UserDefinedFunction[@FunctionName]') AS fn(udf)
    WHERE DB_NAME(QP.dbid) NOT IN ('msdb', 'master', 'tempdb')
)
SELECT *
FROM GetQueriesWithUDF
WHERE Query_Text LIKE '%' + Stripped_UDF_Name + '%'
ORDER BY EstimatedRows DESC
OPTION (MAXDOP 1, RECOMPILE);
Using SQL Server Extended Events to capture UDF counts
This query creates an Extended Events session to capture 'module_end' events filtered by the function object type. Please note you'll need to replace 'AdventureWorks2014' in this script with your database's name before executing it. To minimize the trace impact on the server, I've used a histogram target, which collects only the total number of executions for each UDF. However, I'd recommend running it on a test server before running it in production. Also, this method works well with SQL Server 2012 and later versions, but may not work with older versions.
CREATE EVENT SESSION [UDF_Trace] ON SERVER
ADD EVENT sqlserver.module_end (
    ACTION (sqlserver.database_name, sqlserver.sql_text)
    WHERE ([sqlserver].[equal_i_sql_ansi_string]([object_type], 'FN')
           AND [sqlserver].[database_name] = N'AdventureWorks2014'))
ADD TARGET package0.histogram (
    SET filtering_event_name = N'sqlserver.module_end',
        source = N'object_name',
        source_type = (0))
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS,
      MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = OFF,
      STARTUP_STATE = OFF);
GO

ALTER EVENT SESSION [UDF_Trace] ON SERVER STATE = START;
GO
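Once the session has been running for a while, a query along these lines can read the per-function execution counts back from the histogram target (a sketch; the session and target names match the definition above, and the XML shape is the standard histogram target output):

```sql
-- Read per-UDF execution counts from the UDF_Trace histogram target.
SELECT
    xv.slot.value('(value)[1]', 'sysname') AS function_name,
    xv.slot.value('(@count)[1]', 'bigint') AS executions
FROM (
    SELECT CAST(t.target_data AS xml) AS target_data
    FROM sys.dm_xe_sessions s
    JOIN sys.dm_xe_session_targets t
        ON s.address = t.event_session_address
    WHERE s.name = 'UDF_Trace'
      AND t.target_name = 'histogram'
) AS h
CROSS APPLY h.target_data.nodes('//HistogramTarget/Slot') AS xv(slot)
ORDER BY executions DESC;
```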
Using Server Side Trace in SQL Server to capture UDF execution details
This script creates a server-side trace to capture SP:Completed events and filters them by ObjectType = 20038 (scalar function). Please note that this trace is detailed and could potentially lead to serious performance problems due to the high number of events, so be careful when running it in a production environment.
I've used server-side traces rather than traditional Profiler traces to reduce the impact of the trace collection on the server. For those not familiar with server-side traces, they are lightweight traces that run in the background and can be stopped and started using T-SQL commands. The results can be analyzed using SQL Profiler or T-SQL commands; here you can find useful information on managing them.
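For orientation, the core of such a trace script looks roughly like the following (a sketch only, not the full attached script; the file path is a placeholder and the column list is illustrative):

```sql
DECLARE @TraceId int, @MaxFileSize bigint = 512;

-- Create the trace; output file path is a placeholder.
EXEC sp_trace_create @TraceId OUTPUT, 0, N'c:\data\UDFTrace', @MaxFileSize;

-- Event 43 = SP:Completed; collect a few useful columns:
-- 13 = Duration, 28 = ObjectType, 34 = ObjectName, 35 = DatabaseName.
EXEC sp_trace_setevent @TraceId, 43, 13, 1;
EXEC sp_trace_setevent @TraceId, 43, 28, 1;
EXEC sp_trace_setevent @TraceId, 43, 34, 1;
EXEC sp_trace_setevent @TraceId, 43, 35, 1;

-- Keep only scalar functions (ObjectType = 20038).
EXEC sp_trace_setfilter @TraceId, 28, 0, 0, 20038;

-- Start the trace and report its ID.
EXEC sp_trace_setstatus @TraceId, 1;
SELECT @TraceId AS TraceId;
```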
To start the trace, open the attached script, specify the trace location and file name in the @FilePath variable, execute it, and write down the trace ID returned by the query. You'll need this ID to stop the trace later. Once the trace data has been collected, replace @TraceId in the following commands with that trace ID and execute them to stop the trace:
EXEC sp_trace_setstatus @TraceId, 0; -- stop the trace
EXEC sp_trace_setstatus @TraceId, 2; -- close the trace and delete its definition
You can analyze function statistics from the trace files using the query below (please replace the file path with the value of the @FilePath variable used earlier):
SELECT DatabaseName, ObjectName, SUM(Duration) AS TotalDuration
FROM fn_trace_gettable('c:\data\UDFTrace.trc', DEFAULT)
GROUP BY DatabaseName, ObjectName;
- Learn how to resolve performance problems caused by scalar functions from these articles:
- Check out these additional resources: