How To Retrieve the Last Non-NULL Value in SQL Server

By:   |   Updated: 2022-09-14   |   Comments (5)   |   Related: > TSQL


Problem

Sometimes there are gaps in the result set of my SQL query. For example, a specific state was not known for a period, and thus the query returns NULL for that period. In that case, I would like to return the previous known value. Is this possible in SQL?

Solution

Suppose we have the following sample data:

sample data

We have a couple of months worth of timesheet data for two employees. We retrieved the contract type for each employee. However, employee A has left the company in March, but in the contract type lookup table, the contact type was only listed until the end of February. This means NULL is returned for March, which ideally should have been the value "Permanent". Employee B on the other hand, has switched from a temporary contract to a permanent position, but this change only went in effect in May. For March and April, NULL is returned instead of "Temporary".

The goal of the solution is to return the following result set:

desired result set

The "last known contract type" column returns the current contract type. If it is NULL, the last known contract type is fetched. It keeps going back in time until a non-NULL value is found. This solution can be created in SQL using window functions. In this tip, we propose two solutions. The first solution will work on older versions of SQL Server (SQL Server 2012 – 2019), while the second one will work on SQL Server 2022 or Azure SQL Database.

At the time of writing, SQL Server 2022 is still in preview (currently CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.

Older Versions of SQL Server

This solution has been proposed by Itzik Ben-Gan in this article: The Last non NULL Puzzle. The solution uses multiple steps and the LAG function to get the desired result.

Let’s generate the sample data and store it into a temporary table:

DROP TABLE IF EXISTS #SampleData;
 
WITH CTE_SampleData AS
(
    SELECT DateKey = 20210101, EmployeeCode = 'A', ContractType = 'Permanent', DaysWorked = 20
    UNION ALL
    SELECT DateKey = 20210201, EmployeeCode = 'A', ContractType = 'Permanent', DaysWorked = 18
    UNION ALL
    SELECT DateKey = 20210301, EmployeeCode = 'A', ContractType = NULL, DaysWorked = 1
    UNION ALL
    SELECT DateKey = 20210101, EmployeeCode = 'B', ContractType = 'Temporary', DaysWorked = 20
    UNION ALL
    SELECT DateKey = 20210201, EmployeeCode = 'B', ContractType = 'Temporary', DaysWorked = 18
    UNION ALL
    SELECT DateKey = 20210301, EmployeeCode = 'B', ContractType = NULL, DaysWorked = 0
    UNION ALL
    SELECT DateKey = 20210401, EmployeeCode = 'B', ContractType = NULL, DaysWorked = 0
    UNION ALL
    SELECT DateKey = 20210501, EmployeeCode = 'B', ContractType = 'Permanent', DaysWorked = 19
)
SELECT *
INTO #SampleData
FROM CTE_SampleData;

The first step is to group the rows with NULL values together with the last non-NULL value. This can be done by using a MAX window function:

WITH cte_grp AS
(
    SELECT
         *
        ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
    FROM #SampleData
)
SELECT * FROM cte_grp

The following result is returned:

grouped result set

To make this work, you need a column with unique and incrementing values per employee. This can be an identity column, but in this example the dates work as well. As you can see, for employee A, 20210201 is returned as the month with the last non-NULL value in row 3, while for employee B, 20210201 is returned as well for both rows 6 and 7.

In the next step, we can use the MAX window function again to retrieve the effective last known non-NULL value within each group, giving us the desired end result.

WITH cte_grp AS
(
    SELECT
         *
        ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
    FROM #SampleData
)
SELECT
     *
    ,LastKnownContractType = MAX(ContractType) OVER (PARTITION BY EmployeeCode, grp ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
FROM cte_grp
ORDER BY EmployeeCode, DateKey
desired end result with legacy window functions

SQL Server 2022 (and Newer) and Azure SQL Database

In the ANSI SQL standard, there’s an optional IGNORE NULLS option, which tells the database engine to ignore null values while executing the function. This would be ideal for our use case here. However, in older versions of SQL Server, this feature was not implemented. The opposite of IGNORE NULLS is RESPECT NULLS, which you could say has been the default since window functions were introduced.

In the preview of SQL Server 2022, and a bit earlier in Azure SQL DB, this option has been implemented. Unfortunately only in the FIRST_VALUE and LAST_VALUE functions. Hopefully, it gets implemented in other functions as well, such as LAG and LEAD.

Fortunately, we can solve our use case with the LAST_VALUE function. The query is shorter than the previous one:

SELECT
     *
    ,LastKnownContractType  = LAST_VALUE(ContractType) IGNORE NULLS OVER (PARTITION BY EmployeeCode ORDER BY DateKey)
FROM #SampleData
ORDER BY EmployeeCode, DateKey

The result:

result with IGNORE NULLS

As you can see, the SQL statement is shorter and perhaps easier to understand.

Next Steps
  • Stay tuned for more SQL Server 2022 tips on the MSSQLTips website!
  • If you want to try it out yourself, you can create a pre-configured virtual machine in Azure. You can create a free trial here. More info about the SQL Server 2022 release can be found here. Or you can try it out in Azure SQL Database as well.
  • You can find more SQL Server 2022 tips in this overview.
  • If you want to learn more about window functions in T-SQL, check out this tutorial.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-09-14

Comments For This Article




Wednesday, January 25, 2023 - 2:38:53 PM - Koen Verbeeck Back To Top (90855)
Hi Jeff,
I did some performance tests. I blew up the sample data to about 16 million rows.
The SQL statement using LAST_VALUE is a bit faster (not that much) than the statement using the "older" window functions.
This is mainly caused by the fact there's only one sort operator needed. In the worktable, there are 76622 read ahead reads compared with 161647 for the slower statement. Query cost is 36%, the other 64%.

Regards,
Koen

Monday, January 23, 2023 - 1:02:47 PM - Koen Verbeeck Back To Top (90845)
Hi Jeff,

I do not :) I wrote this article more as a functional explanation of the new feature.
But it's an interesting questions for sure. I'll see if I can make some time to whip up some test queries to compare performance.

Regards,
Koen

Sunday, January 15, 2023 - 8:39:31 PM - Jeff Moden Back To Top (90828)
Do you have any performance metrics for LAST_VALUE() and FIRST_VALUE()?

Friday, September 16, 2022 - 3:42:07 AM - Koen Verbeeck Back To Top (90490)
Hi,

as a general rule, I try to avoid correlated subqueries. They tend to be less scalable, as in a worst case scenario you end up with one subquery for each row. I tested the queries from this tip on a larger sample data set, and typically the logical reads for a solution using window functions were much lower than the query using a correlated subquery.

Before SQL Server 2012, the correlated subquery was your only option though, but I hope that most of us are working on a more recent version of SQL Server :)

Regards,
Koen

Thursday, September 15, 2022 - 2:32:31 AM - K Back To Top (90482)
for older version is posssible:

SELECT *, LastKnownContractType = (SELECT TOP 1 ContractType FROM #SampleData WHERE EmployeeCode = t1.EmployeeCode AND ContractType IS NOT NULL AND DateKey <= t1.DateKey ORDER BY DateKey DESC)
FROM #SampleData t1














get free sql tips
agree to terms