Some Tricky Situations When Working with SQL Server NULLs

By: Sergey Gigoyan | Comments | Related: > Debugging

Problem

Any SQL Server database developer knows that NULL does not mean 0 or an empty string, it means that the value is unknown. The result of a straightforward comparison with NULL is unknown, so special instructions are required (IS NULL, COALESCE, NULLIF, etc.) to get the desired correct result in our T-SQL code. However, there are other (less popular) subtleties related to NULL, which must be understood to write bug free code. In this tip we will discuss some of them.

Solution

We will discuss some tricky points related to NULLs in SQL Server T-SQL code where we should be more careful to avoid mistakes.

NULLs in T-SQL with a NOT IN Clause

Let's start with the following example. We have two tables and a query which uses these tables. The second table contains NULLs and it is used in the subquery with a "NOT IN" clause. What will be the result? Some may mistakenly think that the result will be {1,5,3}, because the second table misses only these values.

--Creating the first temporary table for testing
IF(OBJECT_ID('tempdb..#TestTableA') IS NOT NULL)
	DROP TABLE #TestTableA

CREATE TABLE #TestTableA
(
   ID INT
)

--Creating the second temporary table for testing
IF(OBJECT_ID('tempdb..#TestTableB') IS NOT NULL)
	DROP TABLE #TestTableB

CREATE TABLE #TestTableB
(
   ID INT
)

--Inserting data 
INSERT INTO #TestTableA(ID) VALUES (1),(2),(5),(9),(3)
INSERT INTO #TestTableB(ID) VALUES (2),(7),(NULL),(9)

--Checking the result of the query
SELECT ID
FROM #TestTableA
WHERE ID NOT IN (SELECT ID FROM #TestTableB)

However, we will receive an empty result set. The rationale is the following - as we have a NULL (unknown) value in the second table, we can't be sure that values from the first table are not contained in the second, so SQL Server produces an empty result set.

So, be careful when using NOT IN in a subquery that returns data from a nullable column.

How to work with NULLs and Aggregate Functions

Now let's understand how NULL values affect the result of aggregate functions. Let's start with the COUNT() function. Is there any difference between COUNT(*) and COUNT('column_name')? When 'coulumn_name' is not a nullable column, then both return the count of table rows, but when the 'column_name' is nullable and contains NULLs, then NULLs are ignored and COUNT('column_name') returns the count of the rows whose 'column_name' field's value is not NULL.

--Creating temporary table for testing
IF(OBJECT_ID('tempdb..#TestTable') IS NOT NULL)
	DROP TABLE #TestTable
GO

CREATE TABLE #TestTable ( ID INT )

--Inserting data for testing 
INSERT INTO #TestTable(ID) VALUES (1),(2),(NULL),(4)

SELECT COUNT(ID) AS 'The result of Count(ID)' FROM #TestTable

SELECT COUNT(*) AS 'The result of Count(*)' FROM #TestTable 

TRUNCATE TABLE #TestTable

--Inserting only NULLs 
INSERT INTO #TestTable(ID) VALUES (NULL),(NULL),(NULL),(NULL)

SELECT COUNT(ID) AS 'The result of Count(ID) when ID column contains only NULL values' FROM #TestTable

SELECT COUNT(*) AS 'The result of Count(*) when ID column contains only NULL values' FROM #TestTable 

--Min , Max, AVG
TRUNCATE TABLE #TestTable

INSERT INTO #TestTable(ID) VALUES (1),(2),(NULL),(4),(5)

SELECT AVG(ID) AS Average FROM #TestTable

The code above illustrates this behavior. As we can see the NULLs are ignored. Moreover, when a column contains only NULL values, this behavior is not changed: COUNT(*) returns count of all rows and COUNT(ID) returns 0. Aggregate functions MIN(), MAX(), AVG() ignore NULLs as well.

Assigning values to variables and NULLs

Is there any difference between using SET and SELECT when assigning a value from a query to a variable in T-SQL? The following example answers this question:

DECLARE @var INT= 0

--Creating temporary table for testing
IF(OBJECT_ID('tempdb..#TestTable') IS NOT NULL)
	DROP TABLE #TestTable

CREATE TABLE #TestTable ( ID INT )

--Inserting data for testing 
INSERT INTO #TestTable(ID) VALUES (1),(2),(4)

--Using SELECT
SELECT @var=ID FROM #TestTable WHERE ID=3

SELECT @var AS 'Variable value after SELECT'

--Using SET
SET @var = ( SELECT ID FROM #TestTable WHERE ID=3 )

SELECT @var AS 'Variable value after SET'

The result shows that when the query returns no result, SET assigns NULL to the variable, but SELECT does not assign any value, so the value of the variable remains the same - @var=0. One more thing about initialization of variables: when a variable has been declared and has not been initialized, it's value is NULL, and any operation with this variable makes the result NULL:

DECLARE @var1 INT, @var2 INT=7

SET @var2 = @var1 + @var2

SELECT @var2 As Variable2

NULLs and GROUP BY Statements in T-SQL

When grouping with a column in a GROUP BY statement that contains NULLs, they will be put into one group in your result set:

IF(OBJECT_ID('tempdb..#TestTable') IS NOT NULL)
	DROP TABLE #TestTable
GO 

CREATE TABLE #TestTable ( ID INT, Value INT )

INSERT INTO #TestTable(ID, Value) VALUES (1, 10),(2, 20),(NULL, 70),(NULL, 90),(4, 50),(5, 60)

SELECT ID, AVG(Value) AS Average, MAX(Value) As Maximum 
FROM #TestTable 
GROUP BY ID

As we can see all aggregations for "NULL" group are counted in the same way as for other groups:

NULLs and < > (!=) comparison in SQL Server

Finally let's answer this question: What will be the result when we are checking the 'non-equality' and the value of one of the operands (or both) is NULL? Often developers mistakenly think that the result will be 'true', because NULL is not equal to any value. However the result is 'false', because the comparison with an unknown value always returns false (but this behavior changes when SET ANSI_NULLS is OFF). Let's see an example below:

DECLARE @a INT=1,
        @b INT

SELECT @a AS a, @b AS b

IF(@a <> @b)
	SELECT 1 AS result
ELSE
	SELECT 0 AS result

The result is 0 which means that the result of the comparison is 'false':

Conclusion

As we see, working with NULLs sometimes can be confusing. Therefore it's crucial for the developer to be aware of these subtleties.

Next Steps

Check out these additional resources:

About the author

Sergey Gigoyan is a database professional with more than 10 years of experience, with a focus on database design, development, performance tuning, optimization, high availability, BI and DW design.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips