Inline Scalar UDFs in Fabric Warehouses – Part 1

Problem

When business logic changes in a view, stored procedure, or ORM, we want to minimize the number of places that need to be updated. We don’t want to retype code changes in several places, especially with complex CASE expressions or calculations—we’re prone to errors. The more objects we modify, the more likely it is that we’ll break something. Also, more code changes mean more testing. How do we create reusable code in SQL?

Solution

One way to create reusable code routines is with inline scalar user-defined functions (UDFs). A long time ago, SQL Server added UDFs. But starting in 2019, Microsoft introduced the concept of inlining, which boosts performance. Now, Fabric has inline scalar UDFs for warehouses.

In Part 1 of this two-part series, we’ll start by looking at what UDFs are and why you might want to use them. We’ll determine how to ensure your functions remain inline-able in a warehouse. Finally, we’ll look at how a simple calculation performs in a speed test compared to inline code. In Part 2, we’ll dive deeper into more complex UDFs.

Scalar UDF

What is a scalar user-defined function (UDF)? I’m glad you asked. A scalar UDF is a type of function that accepts zero or more parameters as input (though I’ve never created one with zero), performs logic operations, and outputs a scalar (meaning one) value. The return value can be almost any data type.

Reusing Code

The primary benefit of using UDFs is that they package code into a single routine that can be reused in multiple places. In other words, UDFs allow code reuse, which I’m a massive fan of, since it reduces the number of objects that need updating.

For example, a complex CASE expression or calculation used in multiple stored procedures needs updating in each object when the logic changes. If we use functions instead, we can update a single object (the function). Even if we don’t update stored procedures or views directly, it’s wise to test them to confirm they still return expected results after making changes to the function.

Inlining Scalar UDFs

After SQL Server introduced scalar UDFs, they got a bad rap for good reasons, but mainly if they accessed user tables. Until 2019, when you called scalar UDFs, they executed on a row-by-row basis, often resulting in slow performance, especially with a larger result set, similar to a cursor or a recursive CTE. That changed with SQL Server 2019, which introduced inlining. Inlining embeds UDFs that qualify directly into the calling query instead of executing via the UDF operator.

For example, without inlining, we might have a SELECT statement that runs in under a second, but with a UDF, the time may balloon to two to three minutes. The cost is hidden behind a Compute Scalar operator in the execution plan, but you can see this iterative action with Extended Events.

Scalar UDFs are similar to views and stored procedures since they allow you to package reusable code. However, could you imagine if you used a view and it stopped parallelism? Or if views were invoked on a row-by-row basis? No one would ever use them. But that was the typical user experience before inlinable UDFs.

In my experience, if they were simple, the performance wasn’t awful.

Scalar UDFs in a Warehouse

Microsoft recently added scalar UDFs for Fabric Warehouses. As I write, they’re in preview, and the behavior may change. The focus seems to be on the inlinable type versus the other. This makes sense because, for a Warehouse, we usually invoke the function for a larger rowset, not a single row, like in OLTP systems. I’ve included an article that details all the limitations.

A big difference between SQL Server and a Fabric Warehouse is that we can’t include a non-inlinable function in the SELECT query that references a user table in the FROM clause. If you ask me, this was a great idea since it removes one of the ways UDFs used to perform poorly.

Warehouse Testing

To test the new UDFs, I’ll create a warehouse called MSSQLTips_DW in my Fabric capacity, not a trial capacity. But it’s only an F2 since I’m being cheap. Still, this should be enough horsepower to tease out any differences.

Fabric Warehouse

I’ll use the script below to create a test table and populate it with ten million rows. Why ten million? Because anything less in a warehouse just doesn’t feel right.

/*
* MSSQLTips.com
*/
DROP TABLE IF EXISTS dbo.BigTable;
GO
CREATE TABLE dbo.BigTable
(
    Id INT NOT NULL,
    RandomNumber1 INT NOT NULL,
    RandomNumber2 INT NOT NULL,
    RandomDate DATE NOT NULL
);
GO
/*
Source: https://www.cathrinewilhelmsen.net/using-a-numbers-table-in-sql-server-to-insert-test-data/
*/
;WITH
    L0   AS (SELECT 1 AS n UNION ALL SELECT 1),
    L1   AS (SELECT 1 AS n FROM L0 AS a CROSS JOIN L0 AS b),
    L2   AS (SELECT 1 AS n FROM L1 AS a CROSS JOIN L1 AS b),
    L3   AS (SELECT 1 AS n FROM L2 AS a CROSS JOIN L2 AS b),
    L4   AS (SELECT 1 AS n FROM L3 AS a CROSS JOIN L3 AS b),
    L5   AS (SELECT 1 AS n FROM L4 AS a CROSS JOIN L4 AS b),
    Numbers AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.BigTable
(
    Id,
    RandomNumber1,
    RandomNumber2,
    RandomDate
)
SELECT TOP (10000000)
    n AS Id,
    (ABS(CHECKSUM(NEWID())) % 100000 + 1) AS RandomNumber1,
    (ABS(CHECKSUM(NEWID())) % 10 + 1) AS RandomNumber2,
    CAST(DATEADD(DAY, (ABS(CHECKSUM(NEWID())) % 5845), '1980-01-01') AS DATE) AS RandomDate
FROM Numbers;
GO

Let’s compare an inlinable and a non-inlinable UDF. First, we’ll look at the non-inlinable type. An easy way to get your UDF removed from the inlinable list is to add a non-deterministic function, like GETUTCDATE or RANK. A deterministic function always returns the same results given the same input values.

WHILE loops also disqualify a UDF for inlining—Fabric says, “No inlining for you.” If you’re using WHILE loops in UDFs, at least try another method.

/*
* MSSQLTips.com
*/
CREATE OR ALTER FUNCTION dbo.DaysSinceDate (@InputDate DATE)
RETURNS INT
AS
BEGIN
    RETURN DATEDIFF(DAY, @InputDate, GETUTCDATE());
END
GO

Can we even run the query below?

/*
* MSSQLTips.com
*/
SELECT dbo.DaysSinceDate(RandomDate)
FROM dbo.BigTable;
GO

Results:

Msg 19835, Level 16, State 1, Line 213
Scalar UDF execution is currently unavailable in this context.
See https://go.microsoft.com/fwlink/?linkid=2304777 for updates.

Nope, we sure can’t. However, if we run it without any reference to a user table, SQL won’t return an error message.

SELECT dbo.DaysSinceDate('01-01-2024') AS DaysSince;

Results:

DaysSince
-----------
560

Now let’s check the DMV sys.sql_modules to see if it’s inlinable.

/*
* MSSQLTips.com
*/
SELECT SCHEMA_NAME(o.schema_id) as function_schema,
       OBJECT_NAME(m.object_id) as function_name,
       m.is_inlineable,
       o.type
FROM sys.sql_modules as m
   INNER JOIN sys.objects as o
        on m.object_id = o.object_id
WHERE o.type = 'FN';
GO

Results:

query results

The results show that inlining isn’t possible. One way to make it inlinable is to pass in the literal results of GETUTCDATE into the UDF. For our example, it’s a bit silly, but imagine the function containing a few dozen lines of code and GETUTCDATE being a small part of it.

Now let’s look at a function that’s inlinable. Be sure to include the schema name, so SQL doesn’t throw an error message.

/*
* MSSQLTips.com
*/
CREATE OR ALTER FUNCTION dbo.DaysSinceDate
(
    @InputDate DATETIME2,
    @InputDate2 DATETIME2
)
RETURNS INT
AS
BEGIN
    RETURN DATEDIFF(DAY, @InputDate, @InputDate2);
END
GO

This function isn’t the best, but it works for our example. Let’s run the statement below.

/*
* MSSQLTips.com
*/
SELECT TOP 1000000
    dbo.DaysSinceDate(RandomDate, GETUTCDATE()) AS DaysSince
FROM dbo.BigTable;
GO

After running the query, SQL returns results without an error message, showing the function is now inlinable. We can even verify our DMV using the query above.

Query Results

Performance Test

For this article, I’m testing a simple UDF and using an addition calculation to see if there’s any performance difference in execution time between the two methods—inline code or a UDF.

The code below creates the function, and the other two queries are what I used to test.

/*
* MSSQLTips.com
*/
CREATE OR ALTER FUNCTION dbo.GetSecretCalculation
(
    @randomNumber1 INT,
    @randomNumber2 INT
)
RETURNS INT
AS
BEGIN
    RETURN @randomNumber1 + @randomNumber2;
END;
GO

These are the two queries I ran five times each (actually six times to ensure the data was cached). Notice that I’m using the label option, which makes it easier to find in the next script.

/*
* MSSQLTips.com
*/
SELECT TOP 1000000
    dbo.GetSecretCalculation(RandomNumber1, RandomNumber2) AS RandomNumber,
    Id
FROM dbo.BigTable
OPTION (LABEL = 'UDF');
SELECT TOP 1000000
    (RandomNumber1 + RandomNumber2) AS RandomNumber,
    Id
FROM dbo.BigTable
OPTION (LABEL = 'Inline');

The DMV, which captures the execution history, is queryinsights.exec_requests_history. This is the script I used to record the performance of the five runs.

/*
* MSSQLTips.com
*/
;WITH TopQueries
AS (SELECT TOP 50
        r.total_elapsed_time_ms,
        r.data_scanned_memory_mb,
        r.data_scanned_disk_mb,
        r.data_scanned_remote_storage_mb,
        r.command,
        r.start_time,
        r.label
    FROM queryinsights.exec_requests_history AS r
    WHERE r.start_time >= DATEADD(HOUR, -1, GETUTCDATE())
   )
SELECT t.total_elapsed_time_ms,
       AVG(t.total_elapsed_time_ms) OVER () AS avg_elapsed_time,
       SUM(t.total_elapsed_time_ms) OVER () AS total_elapsed_time,
       t.data_scanned_memory_mb,
       t.data_scanned_disk_mb,
       t.data_scanned_remote_storage_mb,
       t.command,
       t.start_time,
       t.label
FROM TopQueries AS t
ORDER BY t.start_time DESC;
GO

Below are the results from five executions of each query.

MethodExecution CountAverage Time in Milliseconds
Inline5350
UFD5317

Performance Summary

Although the UDF slightly outdid the inline code, I consider the results a tie and attribute the differences to noise. The optimizer used the same execution plan for both queries. But how do complex UDFs perform compared to inline code? Stay tuned for Part 2 of this series.

Key Points

  • If your team prefers scalar UDFs, keep using them. Lots of articles criticize them, and before SQL Server 2019, performance could be poor, especially when user tables were involved. But you can’t satisfy everybody online.
  • To get the most out of your scalar UDFs, ensure they’re inlinable. If in doubt, check the DMV script above. I don’t write non-inlinable ones.
  • I like building scalar UDFs with a bottom-up mindset. I avoid planning every complex code piece as a UDF, but repeating code usually signals that creating a function could be a good idea. As with everything in SQL, it depends.

Next Steps

Leave a Reply

Your email address will not be published. Required fields are marked *