Explore Natively Compiled SQL Server Stored Procedure Execution Plans
By: Mehdi Ghapanvari | Updated: 2023-10-09 | Comments | Related: More > Performance Tuning
Problem
This article aims to elucidate the disparity between native compilation and traditional procedures regarding execution plans. I will refrain from stating which option is faster or superior since it ultimately depends on your specific scenario. In a specific scenario, I replaced a natively compiled stored procedure with a traditional stored procedure to improve performance and achieved the desired outcome.
Solution
T-SQL is a CPU-intensive programming language that operates in an interpreted manner. Even a simple T-SQL statement requires the execution of thousands of CPU instructions. SQL Server can compile stored procedures natively for interacting with memory-optimized tables. This process involves optimizing and transforming T-SQL statements into Visual C code, which creates a DLL. Natively compiled stored procedures are known for their improved performance compared to traditional stored procedures. Native compilation enables quicker data access and more efficient execution of queries compared to interpreting (traditional) T-SQL.
However, there are cases where they may experience slower execution times. The only supported join type in natively compiled stored procedures is the nested loop. Natively compiled code does not support parallelism and always has serial execution plans. One of the main features of OLTP systems is their small size. Initially, it may seem more efficient to utilize a natively compiled stored procedure that employs a nested loop join and executes sequentially. However, in today's context, there are OLTP systems that can handle a volume of more than 50 million records every day. In an OLTP system, we typically do not execute complex queries. But sometimes, running a complex query on an OLTP system is unavoidable. In some situations, running queries in parallel mode can speed up the process of extracting data from an OLTP system.
In this article, I will not compare the execution times of two stored procedures. I will be comparing the actual execution plans of the two procedures. In my experience, the Natively Compiled Stored Procedure was slow, and I replaced it with a traditional stored procedure for better performance.
Set Up Test Environment
I will use the AdventureWorks database and add an in-memory optimized filegroup.
Use master GO Alter Database AdventureWorks2019 ADD Filegroup FG_Inmom Contains Memory_Optimized_Data Alter Database AdventureWorks2019 ADD File (Name = F_Inmom, FileName = N'D:\Data\F_Inmom') To FileGroup FG_Inmom
GO
The following command alters the database compatibility level to 160 (SQL Server 2022):
Alter Database AdventureWorks2019 Set Compatibility_Level = 160 GO
I will create two memory-optimized tables and populate them with a set of data:
Drop Table If Exists [dbo].[SalesOrderHeader_InMemory] Drop Table If Exists [dbo].[SalesOrderDetail_InMemory] GO CREATE TABLE [dbo].[SalesOrderHeader_InMemory] ( [SalesOrderID] [int] NOT NULL, [CustomerID] [int] NOT NULL, [TaxAmt] [money] NOT NULL, [OrderDate] [datetime] NOT NULL, [DueDate] [datetime] NOT NULL, [Freight] [money] NOT NULL, [SubTotal] [money] NOT NULL, CONSTRAINT [PK_SalesOrderHeader_InMemory] PRIMARY KEY NonClustered ( [SalesOrderID] ASC ) ) With (Memory_optimized = On, Durability = Schema_and_data) GO CREATE TABLE [dbo].[SalesOrderDetail_InMemory] ( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [ProductID] [int] NOT NULL, [OrderQty] [smallint] NOT NULL, CONSTRAINT [PK_SalesOrderDetail_InMemory] PRIMARY KEY NONCLUSTERED ( [SalesOrderID] ASC, [SalesOrderDetailID] ASC ) ) WITH (Memory_optimized = On, Durability = Schema_and_data) GO Insert Into [dbo].[SalesOrderHeader_InMemory] Select [SalesOrderID], [CustomerID], [TaxAmt], [OrderDate], [DueDate], [Freight], [SubTotal] From sales.SalesOrderHeader GO Insert Into [dbo].[SalesOrderDetail_InMemory] Select [SalesOrderID], [SalesOrderDetailID], [UnitPrice], [ProductID], [OrderQty] From sales.SalesOrderDetail GO
Now, I aim to create two stored procedures: a Natively Compiled and a traditional one.
CREATE Or Alter PROCEDURE USP_NativeCompile WITH NATIVE_COMPILATION, SCHEMABINDING AS BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english' ) Select soh.SalesOrderID, soh.CustomerID, soh.OrderDate, soh.DueDate, soh.Freight, sod.UnitPrice, soh.SubTotal, soh.TaxAmt From dbo.SalesOrderHeader_InMemory soh Inner Join dbo.SalesOrderDetail_InMemory sod On sod.SalesOrderID = soh.SalesOrderID Order By UnitPrice Desc END GO Create Or Alter Procedure USP_Regular AS Select soh.SalesOrderID, soh.CustomerID, soh.OrderDate, soh.DueDate, soh.Freight, sod.UnitPrice, soh.SubTotal, soh.TaxAmt From SalesOrderHeader_InMemory soh Inner Join SalesOrderDetail_InMemory sod On sod.SalesOrderID = soh.SalesOrderID Order By UnitPrice Desc GO
To view the actual execution plan, press Ctrl + M and then execute the following procedure:
Exec USP_NativeCompile GO
Pay attention to the following image:

As illustrated in the above image, the natively compiled stored procedure was executed in serial mode. As a result of incorporating an ORDER BY clause in the query, the execution plan includes a sort operator. The sort operator is expensive, and running the query in parallel may be faster.
The image below (zoomed in section of above image) illustrates that SQL Server utilized a nested loop join operator:

A nested loop join is employed in the execution plan, where each row of the outer table is sequentially matched with the relevant rows in the inner table. The performance of a nested loop join is typically efficient for small to moderate-sized tables or when the join condition is highly selective. However, it may become less optimal for larger tables or less selective join conditions.
Next, I will execute the traditional procedure to fetch data from the memory-optimized tables:
Exec USP_Regular GO
Take note of the parallelism operator highlighted in the execution plan:

In the following image, you can see that SQL Server employed eight logical CPU cores to execute the procedure:

In parallel query execution, a single query is divided into multiple smaller tasks, and each task is assigned to a separate processor core or thread for execution. The results from these parallel tasks are then combined to produce the final result set.
The image below (part of the above execution plan) illustrates the utilization of a hash match join operator by SQL Server:

During a hash match join, SQL Server generates a hash table from the smaller input or table, which it uses to probe and match rows from the larger input or table.
Summary
The natively compiled stored procedure enables quicker data access and more efficient execution of queries compared to interpreting (traditional) T-SQL. However, there are cases where they may experience slower execution times.
In this article, we compared actual execution plans for a natively compiled and a traditional procedure. This article aims to encourage an informed decision in readers when selecting between traditional and natively compiled procedures. In conclusion, any such choice should be preceded by testing and measurement.
Next Steps
- SQL Server In Memory OLTP Tips
- SQL Server Stored Procedure Native Compilation Advisor
- Migrate to Natively Compiled SQL Server Stored Procedures for Hekaton
About the author

View all my tips
Article Last Updated: 2023-10-09