How to Use SQL Server Coalesce to Work with NULL Values


By:   |   Updated: 2020-06-25   |   Comments (3)   |   Related: More > T-SQL


Problem

Whenever you are using T-SQL to develop queries, you will encounter situations where you have to deal with NULL values. No matter how hard you try, you cannot eliminate NULLs. You can follow best practices when developing your SQL Server database schema, but you still cannot eliminate all NULL values. The simple fact is that you have to allow NULL for some columns and some queries will return NULL values.

The issue with NULL values is that not planning for them being included in your query results will lead to problems. How can we deal with detecting NULL values and replacing them with a non-NULL value?

Solution

COALESCE is one of the tools you have in SQL Server to work with NULL values. It may not be the first one you think of, but it can be a very good choice. In this tip I will provide examples of how you can use COALESCE to peacefully coexist with NULL values.

Before we dig in to the details on COALESCE, let's first discuss a few things about NULL in SQL Server and the approach taken in this tip.

SQL Server "SET" Options for NULL

There are a couple of options in SQL Server that you set at the database level to determine behavior related to NULL; e.g.:

SET ANSI_NULLS {ON | OFF}

ANSI_NULLS should be set to ON which is the ISO compliant behavior. When this is the case, a SELECT statement that uses WHERE columnname = NULL or columnname <> NULL will always return zero rows even if there are rows with a non-NULL value. You cannot test for equals or not equals NULL. You must test for columnname IS NULL or columnname IS NOT NULL. Sometime in the future ANSI_NULLS will only allow ON and setting it to OFF will be an error.

SET ANSI_PADDING {ON | OFF}

ANSI_PADDING should be set to ON. When set to ON the behavior is as follows:

  • char columns are padded with trailing blanks
  • binary columns are padded with trailing zeros
  • varchar columns do not trim trailing blanks
  • varbinary columns do not trim trailing zeros
SET CONCAT_NULL_YIELDS_NULL {ON | OFF}

CONCAT_NULL_YIELDS_NULL should be set to ON. This causes concatenating anything with a NULL value to result in a NULL value. Sometime in the future CONCAT_NULL_YIELDS_NULL will only allow ON and setting it to OFF will be an error.

SET ANSI_WARNINGS {ON | OFF}

SET ANSI_WARNINGS specifies ISO standard behavior for several error conditions.

As a general rule, the SQL Server Native ODBC driver and the SQL Server Native client OLE DB provider set the above options to ON. More importantly, the above options are required to be ON in order to add an index on a computed column or a view (also requires ARITHABORT=ON, QUOTED_IDENTIFIER=ON and NUMERIC_ROUNDABORT=OFF).

I will assume the above settings are ON for the demo code in this tip.

NULL Values Are Unavoidable

The column definition in the CREATE TABLE statement allows us to specify a column as NULL or NOT NULL.  If we specify NULL in the column definition, then the column can have a value of NULL. When we insert a row, we are not required to specify a value for any column that allows NULL or we can assign NULL as the column value. Alternatively, we can specify NOT NULL in the column definition, and we are required to assign a non-NULL value to the column on INSERT and we cannot assign a NULL value on UPDATE.

However, even if we specify columns in tables as NOT NULL, there will still be instances where we have to deal with NULL values when we write T-SQL code.

In the sections that follow, I will walk through examples where we have to consider NULL values in order to come up with the correct result and show how we can use COALESCE to transform NULL values into non-NULL values.

Sample Schema

I will use the following sample schema in the demo code for each section that follows:

Sample schema

The overall idea of this schema is a really small slice of a data mart with customer data. I will use sample queries from the tables in this schema to demonstrate the various scenarios for using COALESCE.

SQL Server Coalesce Definition

Our starting point in this tip is to provide a precise definition of SQL COALESCE.

What is it?

When you view COALESCE in SQL Docs, you will find it under Transact-SQL (T-SQL) Reference / Language elements / Expressions. Generally speaking, you use the COALESCE expression in the column list of a SELECT statement, although its usage is not limited to the SELECT statement. COALESCE itself takes as arguments a list of 1 to N expressions and returns the value of the first expression that is not NULL.

Why use it?

COALESCE provides a simple way to evaluate multiple expressions and determine the first non-null expression based simply on the order of the expression list. You can specify your own value to use in the event all of your expression parameters evaluate to NULL. There are situations where we require a non-NULL value and COALESCE provides a way for use to assure that we get one.

SQL Server versions supported

2005, 2008, 2008R2, 2012, 2104, 2016, 2017, 2019, Azure

SQL Server Coalesce Usage

The following is from COALESCE in SQL Docs:

Syntax

COALESCE ( expression [ ,...n ] )

Arguments

expression is an expression of any type

The simple example uses COALESCE to return an alternative value for a column that IS NULL. The following T-SQL query returns the [Tier] for a customer or NONE if the customer [Tier] IS NULL:

SELECT [Name], COALESCE([Tier], 'NONE') AS [Tier]
FROM [dbo].[Customer]

The following are the results:

COALESCE on Tier query results

Use Coalesce for String Concatenation

Simple string concatenation can produce unexpected results when one or more of the columns to be concatenated have a NULL value. In the example below, [FirstName] and [LastName] are defined as NOT NULL; [MiddleName] is defined as NULL (i.e. allows NULL).

SELECT [FirstName] + ' ' + [MiddleName] + ' ' + [LastName] AS [FullName]
FROM [dbo].[Contact];

The result from the above query is:

Concatenate with NULL results

This is not what we want. To investigate, here are the results showing the individual columns:

Customer name column results

The issue is caused by the [MiddleName] column with the NULL values and the assumption that the database setting for CONCAT_NULL_YIELDS_NULL is ON. We can use COALESCE to get the correct result in the query below:

SELECT [FirstName] + ' ' + COALESCE([MiddleName], '') + ' ' + [LastName]
FROM [dbo].[Contact];

The result from the above query is:

Concatenate results on customer name columns using COALESCE

If [MiddleName] is NULL COALESCE returns an empty string; otherwise it returns the value of [MiddleName].

Compare SQL Server Coalesce to Case

Based on the usage of COALESCE, you can accomplish the same thing with the CASE statement. In fact COALESCE on SQL Docs notes that the query optimizer actually rewrites COALESCE as a CASE statement. This has an important side-effect which I will discuss after we walk through the sample code below.

We have a customer table that tracks the last date for several events that are of interest to the business users. Various marketing campaigns will select customers based on how long it's been since a particular event occurred.

Here are the customers and their last date columns for the various events:

Customer last activity date query results

The following T-SQL query uses COALESCE to determine the Last Activity Date for each customer:

SELECT 
   [Name]
  ,COALESCE([LastOrderDate]
  ,[LastWebSearchDate]
  ,[LastInquiryDate]
  ,[LastPaymentDate]
  ,[Created]) AS [LastActivityDate]
FROM [dbo].[Customer]

The results from the query are:

Customer last activity date results using COALESCE

Remember that COALESCE returns the first non-NULL value in the argument list. All of the last event date columns allow NULL because when a customer is created and for a period of time after creation, these events have not actually happened. The [Created] column does not allow NULL so it is the last argument. If none of the other events have happened, the [LastActivityDate] returned will be [Created]. By using COALESCE I have specified the order of the columns to check for the first non-NULL value.

Here is the equivalent T-SQL query using CASE:

SELECT 
   [Name]
  ,CASE WHEN [LastOrderDate]     IS NOT NULL THEN [LastOrderDate]
        WHEN [LastWebSearchDate] IS NOT NULL THEN [LastWebSearchDate]
        WHEN [LastInquiryDate]   IS NOT NULL THEN [LastInquiryDate]
        WHEN [LastPaymentDate]   IS NOT NULL THEN [LastPaymentDate]
        ELSE [Created] 
   END AS [LastActivityDate]
FROM [dbo].[Customer];

The results are the same as using COALESCE:

Customer last activity date query results using CASE

I prefer COALESCE over CASE because it's simpler and requires less code. However, CASE allows for implementing more logic for the conditions. Either way you get the same result.

CASE evaluates each expression in the WHEN clause and returns the result expression of the first expression that is TRUE, else it returns the ELSE expression. The side-effect with CASE that I alluded to earlier is that the expression in the CASE that returns TRUE will get evaluated a second time when it is returned. If the expression is a query, under certain circumstances it could return a different result when evaluated the second time.

Compare SQL Server Coalesce and ISNULL

When you first figure out that you need to deal with NULL values, you will likely find the ISNULL function. The definition is simple: Replaces NULL with the specified replacement value.

SELECT [Name], ISNULL([Tier], 'NONE') AS [Tier]
FROM dbo.Customer

The following are the query results (same as using COALESCE earlier):

Query results using ISNULL

In the earlier tip Deciding between COALESCE and ISNULL in SQL Server, Aaron Bertrand provides a very detailed analysis that will help you to decide between ISNULL and COALESCE. I will highlight what I consider to be the most important points from Aaron's tip which will help you to decide which one to use based on particular situations.

The Obvious

ISNULL accepts two arguments: the expression to check for NULL and the expression to return if the first expression is NULL. COALESCE accepts multiple arguments and returns the first one that is not NULL or NULL if every expression evaluates to NULL.

Data Type Precedence

Data Type Precedence comes in to play when combining expressions of different data types; the data type with the lower precedence gets converted to the data type with the higher precedence. ISNULL uses the data type of the first expression while COALESCE uses data type precedence. There is less surprise with ISNULL. To highlight this important difference, Aaron uses this example:

DECLARE @datetime DATETIME
SELECT COALESCE (@datetime, 0);

You might think that since the @datetime variable is clearly NULL, the SELECT would return the integer value of zero. However, because COALESCE uses data type precedence and DATETIME has a higher precedence than INTEGER, the return value is 1900-01-01 00:00:00.000.

Other Subtle Points

Here are some additional points that Aaron made:

  • The performance difference between using ISNULL versus COALESCE is negligible in most cases
  • COALESCE is in the ANSI SQL standard
  • ISNULL is not consistent across Microsoft products; e.g. ISNULL in Access is a function that returns TRUE if the argument is NULL and FALSE otherwise

SQL Coalesce with Computed Columns

A computed column is a column in a table where the column value is determined as the result of an expression. The expression can include other columns in the table as well as logic. By default, a computed column is not stored in the table; the value is determined when you use it; e.g. include it in a SELECT statement. You can override the default behavior and store the value in the table by adding PERSISTED to the definition. If you wanted to index the column, you would add PERSISTED.

I like to use computed columns to define the value of a column based on some business logic that I want to make automatically available to the business users for their queries and reports. This avoids the problem of users having to specify the correct expression in their queries and will hopefully discourage them from coming up with their own definitions which aren't always the same. As the value of the underlying columns used in the computed column change, the value of the computed column is changed automatically.

In a data warehousing scenario, the business users typically spend a lot of time analyzing customers.  As an example, I will implement a computed column named Status with the following logic to determine the value:

  • If the days since the last order is NULL, the Status is NEW
  • If days since the last order is more than 180 days, the Status is CLOSED
  • If days since the last order is more 90 days, the Status is INACTIVE
  • If none of the above is true, the Status is ACTIVE

The first thing I will do is add a column for the DaysSinceLastOrder in order to simplify the logic. The following T-SQL can be used to conditionally add a column to an existing table (i.e. add the column if it does not exist, otherwise do nothing):

IF NOT EXISTS (SELECT 1
               FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_SCHEMA = 'dbo'
                 AND TABLE_NAME = 'Customer'
                 AND COLUMN_NAME = 'DaysSinceLastOrder')

ALTER TABLE [dbo].[Customer] ADD [DaysSinceLastOrder] INT;

At some point during our nightly ETL process we recalculate the days since the last order. Here is a simple UPDATE statement to do this:

UPDATE [dbo].[Customer]
SET [DaysSinceLastOrder] = DATEDIFF(Day, [LastOrderDate], GETDATE());

Now the Customer table has the following values for DaysSinceLastOrder:

Query results showing days since last order

Now I will add a computed column to the Customer table to implement the Status:

IF NOT EXISTS (SELECT 1 
               FROM INFORMATION_SCHEMA.COLUMNS 
               WHERE TABLE_SCHEMA = 'dbo' 
                 AND TABLE_NAME = 'Customer'
                 AND COLUMN_NAME = 'Status')

ALTER TABLE [dbo].[Customer]
ADD [Status] AS CASE 
   WHEN COALESCE ([DaysSinceLastOrder], -1) = -1 THEN 'NEW' -- customer has not placed an order
   WHEN [DaysSinceLastOrder] > 180 THEN 'CLOSED'
   WHEN [DaysSinceLastOrder] > 90 THEN 'INACTIVE'
   ELSE 'ACTIVE'
END;			

The Status computed column first uses COALESCE on the DaysSinceLastOrder column to return -1 if the column is NULL. Remember that if a customer has not yet placed an order, the DaysSinceLastOrder will be NULL.

Adding a DaysSinceLastOrder column and a Status computed column are examples of the extra steps I like to take to make life easy for the business users and ultimately myself.  I try to provide the results of calculations and also eliminate the need for the business users to have to deal with NULL values. I can't eliminate every case where NULL will be returned but I can try.

Use Coalesce to Pivot Data

Sometimes you want to take multiple rows of data and pivot them into a delimited list contained in a single column. You can use COALESCE to make this easy. This was demonstrated in the tip The Many Uses of COALESCE. I will use an example of states assigned to regions. The following T-SQL can be used to setup some sample data:

CREATE TABLE [dbo].[Region] (
   [Region] VARCHAR(20) NOT NULL
  ,[State] CHAR(2) NOT NULL)

INSERT [dbo].[Region] ([Region], [State])
VALUES 
  ('RED', 'MD')
, ('RED', 'DE')
, ('WHITE', 'NJ')
, ('WHITE', 'NY')
, ('BLUE', 'VA')
, ('BLUE', 'WV')
, ('BLUE', 'NC');

Use the following T-SQL to select the states for a region and put them in a single column delimited by semi-colons:

DECLARE @REGION_LIST VARCHAR(MAX);

SELECT @REGION_LIST = COALESCE(@REGION_LIST, '') + [State] + '; '
FROM [dbo].[Region]
WHERE [Region] = 'BLUE';

SELECT @REGION_LIST AS [RegionList];

The following are the results from the query:

Pivot with COALESCE query results

Note the query concatenates the state from each row returned. It uses COALESCE to avoid concatenating the state to the @REGION_LIST variable when it is NULL which happens with the first row of the results.

Use Coalesce in an UPDATE

For ease of querying the last activity dates for a customer, the last activity date columns are available in the Customer table. The following query shows the customer and the last activity dates:

SELECT 
   [Name]    
  ,[LastOrderDate]  
  ,[LastWebSearchDate] 
  ,[LastInquiryDate] 
  ,[LastPaymentDate] 
FROM [dbo].[Customer];

The query returns the following results:

Customer last activity dates all set to NULL

As you can see all of the last activity dates are NULL.

In our nightly ETL process we want to update these dates based on the latest transactions. The ETL process drops and recreates the CustomerLastActivity table with a single row for the Customer, activity type and the latest activity date as shown below:

Customer last activity dates

We want to pivot the data into a shape that we can use to easily update the Customer table. The following are the query results with the data in the shape that we want:

Customer last activity dates after pivot and ready for update

Here is the T-SQL statement that we can use to pivot the data and perform the UPDATE of the Customer table:

;WITH CTE_ACTIVITY AS (
SELECT
   [CustomerKey]    
  ,[ORDER] AS [LastOrderDate]
  ,[SEARCH] AS [LastWebSearchDate]
  ,[INQUIRY] AS [LastInquiryDate]
  ,[PAYMENT] AS [LastPaymentDate]
FROM (
  SELECT 
     [CustomerKey]
    ,[ActivityType]
    ,[ActivityDate]
  FROM [dbo].[CustomerLastActivity]
 ) a
 PIVOT (
  MAX([ActivityDate])
  FOR [ActivityType] IN (
     [ORDER]
   , [SEARCH]
   , [INQUIRY]
   , [PAYMENT]
  ) 
 ) pvt
)

UPDATE c
SET[LastOrderDate] =    COALESCE(a.[LastOrderDate], c.[LastOrderDate])
, [LastWebSearchDate] = COALESCE(a.[LastWebSearchDate], c.[LastWebSearchDate])
, [LastInquiryDate] =   COALESCE(a.[LastInquiryDate], c.[LastInquiryDate])
, [LastPaymentDate] =   COALESCE(a.[LastPaymentDate], c.[LastPaymentDate])
FROM [dbo].[Customer] c
JOIN [CTE_ACTIVITY] a
	ON a.[CustomerKey] = c.[CustomerKey];

The following are the main points for the above T-SQL statement:

  • The SELECT statement inside the CTE_ACTIVITY common table expression produces the results above where we have the CustomerKey and the four last activity dates
  • The UPDATE statement joins the Customer table and the CTE_ACTIVITY and updates the last activity dates in the Customer table using the CTE_ACTIVITY results
  • The CTE_ACTIVITY results may have NULL values for one or more of the last activity dates
  • I use COALESCE to check the last activity date in the CTE_ACTIVITY results and update the column in the Customer table if the value in the CTE_RESULTS is not NULL; otherwise I update the value in the Customer table to its current value

After executing the above T-SQL the Customer table rows have the following values for the last activity dates:

Customer last activity date query results after update

Use Coalesce with Incremental Update

Incremental update is a classic pattern where you want to update data in a target table with the data from a source table that has changed. SQL Server provides some built-in capabilities to perform this task such as Change Tracking, Change Data Capture and even Replication. There are scenarios where you are not able to leverage these built-in solutions and you have to roll your own. It may be that the source database is a database snapshot or the source database is used by a third-party application and the vendor will not provide any support if you're using one of the built-in change mechanisms.

In the custom incremental update, you keep track of the maximum last modified date that you loaded in the target from the source; e.g. store it in a table. Each time you do the incremental update, you retrieve the rows from the source where the last modified date is greater than what you last loaded. Another approach that may work is to query the maximum last modified date in the target table and retrieve the rows in the source table where the last modified date is greater that what's in the target table. This sounds simple and straight-forward, so what's the issue?

The issue is one that doesn't happen often, but when it does and you're not looking for it, it's a huge problem. The pseudo-code for the incremental update starts with something like this:

DECLARE @LAST_MODIFIED DATETIME;

SELECT @LAST_MODIFIED = MAX(LastModified)
FROM <target_tablename>;

INSERT <target_tablename> (<columnlist>)
SELECT <column_list>
FROM <source_tablename>
WHERE LastModified > @LAST_MODIFIED;

This is extremely simplified but it illustrates the point. You retrieve the appropriate value for the @LAST_MODIFIED variable from the <target_table> and you INSERT the rows modified in the source into the target since the last time you ran this.

What happens if the <target_tablename> is empty? The answer is that @LAST_MODIFIED is NULL and you do not select any rows from <source_tablename> because there is no row where LastModified > NULL is true. Now it's not likely that the <target_tablename> will be empty but it probably will be at least once.

The simple solution is you have to check if @LAST_MODIFIED IS NULL and do something. Let's take a really simple case. You don't want to retrieve any data from <source_tablename> that is more than 5 years old. Add the following T-SQL using COALESCE after you get the value of @LAST_MODIFIED:

SET @LAST_MODIFIED = COALESCE(@LAST_MODIFIED, DATEADD(year, -5, GETDATE()))
Next Steps


Last Updated: 2020-06-25


get scripts

next tip button



About the author
MSSQLTips author Ray Barley Ray Barley is a Principal Architect at IT Resource Partners and a MSSQLTips.com BI Expert.

View all my tips





Comments For This Article




Thursday, June 25, 2020 - 2:17:57 PM - Gerald W Britton Back To Top (86045)

CONCAT might be a better choice for your example building up a customer name string


Thursday, June 25, 2020 - 1:11:10 PM - Ray Barley Back To Top (86044)

COALESCE isn't necessary but it's the topic for the tip.


Thursday, June 25, 2020 - 4:57:59 AM - Andy Back To Top (86040)
CASE 
   WHEN COALESCE ([DaysSinceLastOrder], -1) = -1 THEN 'NEW'
   WHEN [DaysSinceLastOrder] > 180 THEN 'CLOSED'
   WHEN [DaysSinceLastOrder] > 90 THEN 'INACTIVE'
   ELSE 'ACTIVE'
END;

Why is COALESCE necessary here? Why not just use WHEN [DaysSinceLastOrder] IS NULL THEN 'NEW'



download





Recommended Reading

SQL Server Cursor Example

Using MERGE in SQL Server to insert, update and delete at the same time

Rolling up multiple rows into a single row and column for SQL Server data

How to use @@ROWCOUNT in SQL Server

SQL Server Loop through Table Rows without Cursor














get free sql tips
agree to terms