solving sql server problems for millions of dbas and developers since 2006



SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Tip Categories SQL Server Tutorials SQL Server Webcasts SQL Server Whitepapers SQL Server Tools SQL Server Questions and Answers MSSQLTips Authors About MSSQLTips SQL Server User Groups MSSLQTips Giveaways MSSQLTips Advertising Options

MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Twitter Page MSSQLTips Google+ Page








Join SQL Server tables where columns include NULL values

By: | Read Comments (4) | Print

Greg is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

Related Tips: 1 | 2 | 3 | 4 | More

Problem
When building database tables you are faced with the decision of whether to allow NULL values or to not allow NULL values in your columns.  By default SQL Server sets the column value to allow NULL values when creating new tables, unless other options are set.  This is not necessarily a bad thing, but dealing with NULL values especially when joining tables can become a challenge.  Let's take a look at this issue and how this can be resolved.

Solution
In most cases your columns in your tables probably allow NULL values unless you consciously changed them when you created a table or changed the default setting for your database by using the SET ANSI_NULL_DEFAULT OFF and SET ANSI_NULL_DFLT_ON OFF option.  These settings change the behavior of how a table is created if you do not specify NULL or NOT NULL when creating a table.

The following examples show you how a table would get created differently with each of these options, but by default SQL Server sets columns to allow nulls.

Example 1: When these two options are OFF the default nullable value is set to no, meaning no null values are acceptable.

ALTER DATABASE AdventureWorks SET ANSI_NULL_DEFAULT OFF;
GO
SET ANSI_NULL_DFLT_ON OFF;
GO
CREATE TABLE Table1 (a TINYINT);
GO
sp_help Table1
GO

The screen shot below shows that the column does not allow NULLs.


Example 2: When these two options are ON the default nullable value is set to yes, meaning null values are acceptable.

ALTER DATABASE AdventureWorks SET ANSI_NULL_DEFAULT ON;
GO
SET ANSI_NULL_DFLT_ON ON;
GO
CREATE TABLE Table2 (a TINYINT);
GO
sp_help Table2
GO

The screen shot below shows that the this column does allow NULLs.


So now that we have that covered let's get down to the issue at hand.  So based on how tables and columns are setup and how data is stored you may run into an issue where you have data stored in tables that have NULL values and you need to join on these values.  Sounds pretty easy, but let's see what happens when this occurs.

Setup Test

First let's create two tables and load some sample data.  (This is not a very practical example, but it does help illustrate the issue.)

CREATE TABLE [dbo].[CarModels](
[Make] [varchar](50) NULL,
[Model] [varchar](50) NULL,
[Trim] [varchar](50) NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Orders](
[Make] [varchar](50) NULL,
[Model] [varchar](50) NULL,
[Trim] [varchar](50) NULL
) ON [PRIMARY]
GO

INSERT INTO dbo.Orders VALUES ('Nissan','Altima','2-door 2.5 S Coupe')
INSERT INTO dbo.Orders VALUES ('Nissan','Altima','4-door 3.5 SE Sedan')
INSERT INTO dbo.Orders VALUES ('Nissan','Altima','')
INSERT INTO dbo.Orders VALUES ('Nissan','Altima',NULL)

INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','2-door 2.5 S Coupe')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','2-door 3.5 SE Coupe')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','4-door 2.5 S Sedan')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','4-door 3.5 SE Sedan')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','4-door 3.5 SL Sedan')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima','4-door HYBRID Sedan')
INSERT INTO dbo.CarModels VALUES ('Nissan','Altima',NULL)


Selecting data

The first thing we will do is show the data in these two tables.  The idea here is that we want to join the tables based on Make, Model and Trim.  For most of the records there is some value, but there are a few records where the Trim value is NULL.

SELECT *
FROM dbo.Orders a

SELECT *
FROM dbo.CarModels b

The first query does a straight join of these tables based on all three columns.

SELECT *
FROM dbo.Orders a
INNER JOIN dbo.CarModels b
ON a.Make = b.Make
AND a.Model = b.Model
AND a.Trim = b.Trim

The result of the above query matches only three of the four records from the Orders table. The records that have a NULL value for Trim do not show up in the resultset.

This next example introduces the ISNULL function in the join clause.  The ISNULL function takes two parameters, the first is the value to interrogate to see if it is NULL and it if is NULL the second parameter specifies what the value should be converted to.  So in this example if any value is NULL it will be converted to a space ''.

SELECT *
FROM dbo.Orders a
INNER JOIN dbo.CarModels b
ON a.Make = b.Make
AND a.Model = b.Model
AND isnull(a.Trim,'') = isnull(b.Trim,'')

In the results below we can see that we now have more rows, but one of the issues we are facing is that there are also values of space in the table and therefore we are picking up additional joins that we do not want.  The rows that should not be included are highlighted below.

To take this a step further we are using the ISNULL function again, but this time we are converting the NULL values to '999999'.  This is a value that we know does not exist in our table and therefore will not cause any unwanted joins.

SELECT *
FROM dbo.Orders a
INNER JOIN dbo.CarModels b
ON a.Make = b.Make
AND a.Model = b.Model
AND isnull(a.Trim,'999999') = isnull(b.Trim,'999999')

Here is our final result with the four rows we are expecting.


Summary

As we have seen from the above examples joining NULL values does not work.  Even though you have two NULL values SQL Server does not treat these as the same value.  Internally a value of NULL is an unknown value and therefore SQL Server does not equate an unknown value being equal to another unknown value.  Another design decision is to not allow NULL values and therefore you will not run into these issues.

Next Steps

  • When dealing with NULL values don't forget about the ISNULL function
  • Be aware when creating tables whether you want to allow NULL values or not.  By allowing NULL values you introduce other issues that you may need to face at a later time.
  • Although the ISNULL function is handy, if you are doing large joins having this function in the join clause will slow down the query. A lot of this will also depend on the indexes you have setup and how the indexes are used for the join.  In general it is not a good idea to use functions in your joins or on the left side of your WHERE clause, because SQL Server needs to interrogate each value and therefore may negate the use of the index.   Although in some cases there are no other options, so you need to do what you need to do.


Related Tips: 1 | 2 | 3 | 4 | More | Become a paid author


Last Update: 3/4/2008

Share: Share 






Comments and Feedback:

Wednesday, March 05, 2008 - 8:45:05 AM - jaybutler Read The Tip

The final version of the query is a performance problem waiting to happen. Using 'expression operator expression' constructs will not allow SQL Server's optimizer to use and indexes on the Trim column. The condition has to be restated as 'column operator column' or 'column operator expression.

That would turn your:

...
AND a.Model = b.Model
AND isnull(a.Trim,'999999') = isnull(b.Trim,'999999')

into something like this:

...
AND a.Model = b.Model
AND ( a.Trim = b.Trim OR ( a.Trim IS NULL AND b.Trim IS NULL ) )

This also avoids the potential problem of choosing a value you thought to be out of range that eventually is used.

Although the potential performance problem was mentioned in the notes at the end of the column, the article itself should have ended with the best SQL rather than the most convenient.


Friday, April 04, 2008 - 7:46:18 AM - admin Read The Tip

jaybutler,

Thank you for the feedback and the alternative approach.

Thank you,
The MSSQLTips.com Team


Thursday, October 14, 2010 - 8:39:35 AM - Joe Celko Read The Tip
I also think this is a bad idea.  What we need is the AQNSI/ISO Standard SQL:2003  "IS DISTINCT FROM" comparison operator  instead of compiler modifications on the fly.


Thursday, October 14, 2010 - 9:05:37 AM - Greg Robidoux Read The Tip
I think overall, joining on NULL values is a bad idea.  The better option is to not allow NULL values and this wouldn't be an issue in the first place.  Life would be much easier if we never took shortcuts.



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
"Amazing, Amazing, Amazing! SQL doctor is truly one of the most powerful tools I have seen."

Time-strapped database professionals use SQL Monitor to look after their servers. Try it online.

SQL Servers keeping you up at night? Contact expert SQL Server consultants for a Health Check.

Free Trial: Get Proactive Insight with Spotlight® for SQL Server Enterprise.

Join the over million SQL Server Professionals who get their issues resolved daily.

Learn SQL Server 2012, Performance Tuning, Development, Administration, Replication and more - free webcasts


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com