How To Retrieve the Last Non-NULL Value in SQL Server
Sometimes there are gaps in the result set of my SQL query. For example, a specific state was not known for a period, and thus the query returns NULL for that period. In that case, I would like to return the previous known value. Is this possible in SQL?
Suppose we have the following sample data:
We have a couple of months worth of timesheet data for two employees. We retrieved the contract type for each employee. However, employee A has left the company in March, but in the contract type lookup table, the contact type was only listed until the end of February. This means NULL is returned for March, which ideally should have been the value "Permanent". Employee B on the other hand, has switched from a temporary contract to a permanent position, but this change only went in effect in May. For March and April, NULL is returned instead of "Temporary".
The goal of the solution is to return the following result set:
The "last known contract type" column returns the current contract type. If it is NULL, the last known contract type is fetched. It keeps going back in time until a non-NULL value is found. This solution can be created in SQL using window functions. In this tip, we propose two solutions. The first solution will work on older versions of SQL Server (SQL Server 2012 – 2019), while the second one will work on SQL Server 2022 or Azure SQL Database.
At the time of writing, SQL Server 2022 is still in preview (currently CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.
Older Versions of SQL Server
This solution has been proposed by Itzik Ben-Gan in this article: The Last non NULL Puzzle. The solution uses multiple steps and the LAG function to get the desired result.
Let’s generate the sample data and store it into a temporary table:
DROP TABLE IF EXISTS #SampleData; WITH CTE_SampleData AS ( SELECT DateKey = 20210101, EmployeeCode = 'A', ContractType = 'Permanent', DaysWorked = 20 UNION ALL SELECT DateKey = 20210201, EmployeeCode = 'A', ContractType = 'Permanent', DaysWorked = 18 UNION ALL SELECT DateKey = 20210301, EmployeeCode = 'A', ContractType = NULL, DaysWorked = 1 UNION ALL SELECT DateKey = 20210101, EmployeeCode = 'B', ContractType = 'Temporary', DaysWorked = 20 UNION ALL SELECT DateKey = 20210201, EmployeeCode = 'B', ContractType = 'Temporary', DaysWorked = 18 UNION ALL SELECT DateKey = 20210301, EmployeeCode = 'B', ContractType = NULL, DaysWorked = 0 UNION ALL SELECT DateKey = 20210401, EmployeeCode = 'B', ContractType = NULL, DaysWorked = 0 UNION ALL SELECT DateKey = 20210501, EmployeeCode = 'B', ContractType = 'Permanent', DaysWorked = 19 ) SELECT * INTO #SampleData FROM CTE_SampleData;
The first step is to group the rows with NULL values together with the last non-NULL value. This can be done by using a MAX window function:
WITH cte_grp AS ( SELECT * ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING) FROM #SampleData ) SELECT * FROM cte_grp
The following result is returned:
To make this work, you need a column with unique and incrementing values per employee. This can be an identity column, but in this example the dates work as well. As you can see, for employee A, 20210201 is returned as the month with the last non-NULL value in row 3, while for employee B, 20210201 is returned as well for both rows 6 and 7.
In the next step, we can use the MAX window function again to retrieve the effective last known non-NULL value within each group, giving us the desired end result.
WITH cte_grp AS ( SELECT * ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING) FROM #SampleData ) SELECT * ,LastKnownContractType = MAX(ContractType) OVER (PARTITION BY EmployeeCode, grp ORDER BY DateKey ROWS UNBOUNDED PRECEDING) FROM cte_grp ORDER BY EmployeeCode, DateKey
SQL Server 2022 (and Newer) and Azure SQL Database
In the ANSI SQL standard, there’s an optional IGNORE NULLS option, which tells the database engine to ignore null values while executing the function. This would be ideal for our use case here. However, in older versions of SQL Server, this feature was not implemented. The opposite of IGNORE NULLS is RESPECT NULLS, which you could say has been the default since window functions were introduced.
In the preview of SQL Server 2022, and a bit earlier in Azure SQL DB, this option has been implemented. Unfortunately only in the FIRST_VALUE and LAST_VALUE functions. Hopefully, it gets implemented in other functions as well, such as LAG and LEAD.
Fortunately, we can solve our use case with the LAST_VALUE function. The query is shorter than the previous one:
SELECT * ,LastKnownContractType = LAST_VALUE(ContractType) IGNORE NULLS OVER (PARTITION BY EmployeeCode ORDER BY DateKey) FROM #SampleData ORDER BY EmployeeCode, DateKey
As you can see, the SQL statement is shorter and perhaps easier to understand.
- Stay tuned for more SQL Server 2022 tips on the MSSQLTips website!
- If you want to try it out yourself, you can create a pre-configured virtual machine in Azure. You can create a free trial here. More info about the SQL Server 2022 release can be found here. Or you can try it out in Azure SQL Database as well.
- You can find more SQL Server 2022 tips in this overview.
- If you want to learn more about window functions in T-SQL, check out this tutorial.
About the author
View all my tips
Article Last Updated: 2022-09-14