Using bit columns with NULLs when three options exist in SQL Server



Problem

Surveys frequently ask yes/no questions, which map naturally to a bit data type. What are the effects of allowing this column to be NULL in a SQL Server table?  Check out this tip to learn more.

Solution

The bit's ability to store a NULL gives it a fundamental third value that needs no additional storage space. Using NULL to denote that the survey question was read but not answered is logical. Like 1 and 0, the NULL value must be used consistently. The existence of a record in the survey results differentiates "I saw the question and decided to skip it" from "the system never asked me that question".
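
For illustration, the three states might be recorded like this (the insert column list is simplified to the two columns discussed here; the full response table appears in the data model below):

insert into response (resp_ques_id, resp_response) values (101, 1)    -- answered yes
insert into response (resp_ques_id, resp_response) values (102, 0)    -- answered no
insert into response (resp_ques_id, resp_response) values (103, NULL) -- saw the question, skipped it
-- no row at all means the system never asked that question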

Proper Data Retrieval with NULL Columns

NULL is a funny animal in the SQL kingdom; it requires different syntax in the WHERE clause. Below are two examples.

NULL is not equal to anything: not '', and not even another NULL. Because the comparison below never evaluates to true, this query always returns a count of 0.

select count(question1) as count_na
from survey_answers
where question1 = NULL

The correct syntax is:

select count(question1) as count_na
from survey_answers
where question1 IS NULL

As an alternative, the ISNULL(<column>, value) function supplies the value of the second parameter when the column in the first parameter is NULL. Keep in mind that the replacement value is converted to the column's data type. If using ISNULL with a bit and needing a third non-NULL value, use something like ISNULL(CAST(<column> as tinyint), 2). The choice of the value (2 here) and the data type (tinyint here) can be changed to best suit your situation. ISNULL is also handy for sorting NULLs where they read best: supplying -1 sorts rows with NULLs first, supplying the highest value + 1 sorts them last, and supplying a value that falls between the other rows' values places them in the middle. Outputting a message like "None found" is a common use too.
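
For example, this sketch against the response table from the data model below reports 1, 0 or 2 for yes, no and not answered (answer_code is just an illustrative alias), and the ORDER BY pushes the unanswered rows to the end:

select resp_ques_id,
 isnull(cast(resp_response as tinyint), 2) as answer_code
from response
order by answer_code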

NULL Storage Space Considerations in SQL Server

A bit ties tinyint and char(1) in using one byte. Why a whole byte when a bit is an eighth of one? If a table has a single bit column, SQL Server allocates an entire byte for it. As more bit columns are defined, they share the remaining bits of that byte, up to eight bit columns in total. A ninth bit column starts a second byte, and a seventeenth uses the first bit of a third byte.
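
As an illustration, this hypothetical table (the table and column names are made up for the example) stores its first eight bit columns in a single byte of row storage, and the ninth starts a second byte:

create table bit_packing_demo
(
 demo_id int identity(1,1) primary key,
 flag1 bit, flag2 bit, flag3 bit, flag4 bit,
 flag5 bit, flag6 bit, flag7 bit, flag8 bit, -- these eight share one byte
 flag9 bit                                   -- a second byte starts here
)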

Aggregating Data

Here is a sample data model for our survey application:

Sample Survey ERD

Using the schema above, here is a query to count the responses by question:

select coalesce(yesses.resp_ques_id, nos.resp_ques_id, nas.resp_ques_id) ques_id, 
 isnull(yesses.yes, 0) yes, 
 isnull(nos.no, 0) [no], 
 isnull(nas.na, 0) na
from
 (select resp_ques_id, count(1) as yes 
  from response 
  where resp_response = 1 
  group by resp_ques_id
 ) yesses 
 
 left outer join
 (select resp_ques_id, count(1) as no 
  from response 
  where resp_response = 0 
  group by resp_ques_id
 ) nos on yesses.resp_ques_id = nos.resp_ques_id 
 
 left outer join
 (select resp_ques_id, count(1) as na 
  from response 
  where resp_response is null 
  group by resp_ques_id
 ) nas on nas.resp_ques_id = yesses.resp_ques_id

Here is the sample output:

Survey Aggregate Query Output

This query will scale to any number of questions in a survey. To filter to a single survey, add a join from response to question in each subquery along with an AND ques_surv_id = x predicate, as sketched below.
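
Here is a sketch of the yes-count subquery with that filter applied; the question table's key column name (ques_id) and the survey id value of 1 are assumptions based on the naming convention above, and the same join and predicate would be repeated in the nos and nas subqueries:

select r.resp_ques_id, count(1) as yes
from response r
inner join question q on q.ques_id = r.resp_ques_id
where r.resp_response = 1
and q.ques_surv_id = 1
group by r.resp_ques_id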

Next Steps
  • For any situation with three possible values, the bit datatype can be utilized by leveraging NULL as a logical value.
  • If "not answered" and "not applicable" need stored along with "yes" and "no", a char or tinyint becomes necessary.
  • Detecting NULLs requires a special syntax in the WHERE clause: IS [NOT] NULL.
  • ISNULL() can be used to make output more readable by replacing a NULL value with a logical business term.
  • In a high-volume scenario, having multiple bits in place of char or tinyint will make a difference.
  • Counting bit fields' values utilizes three sub queries, one for each value.
  • Become familiar with NULL in T-SQL. Review IS [NOT] NULL and ISNULL().
  • Decide if using bit fields in your system would save disk space while maintaining readability.





About the author
Brent Shaub has been creating, renovating and administering databases since 1998.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Tuesday, February 21, 2012 - 11:19:56 AM - Alex B

From MSDN: In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

So back to ISNULL

with text types

ISNULL(<column>, 'A') != ISNULL(<column>, 'B')

or with numeric types or the bit type (no need to cast to tinyint)

ISNULL(<column>, 1) != ISNULL(<column>, 0)


Friday, January 6, 2012 - 9:25:32 AM - ScottPletcher

As to disk space requirements for making a column NULLable, I thought SQL Server now reserved the NULL bit for every column, regardless of whether the column was (initially) specified as NULLable or not.  Maybe they figured that since it only cost one bit per column, it was more effective overall to assign the bit so that making the column NULLable later did not require adjusting every data page to add the NULL bit for that column.


Friday, January 6, 2012 - 9:13:46 AM - GrumpyOldDBA

Sometimes I really worry! ANSI NULLs are required for a number of features to work, indexed views, filtered indexes, partitioning  and so on. The whole concept of nullable bits is contrary to every aspect of data typing and integrity, sorry but I absolutely stand against using either nullable bits or using nulls to indicate a value. To further clarify nullable columns create wider indexes ( when indexed ) a very good reason to make sure that nulls are never applied to columns which are indexed. 


Friday, January 6, 2012 - 7:18:19 AM - pipTheGeek

@Brent, it isn't that there would be no space saving; using the nullable bit field uses DOUBLE the space of a non-nullable tinyint because you have a byte allocated to store the null bitmap AS WELL AS the byte allocated to store the actual bit value.

*Unless you are using sparse columns, but they would not normally be used for only a few nullable columns.


Friday, January 6, 2012 - 3:55:44 AM - Brent Shaub

Kaki, your grammar and English read like a native, so great job on that.

Pip, yes, I later learnt, as you've stated, that multiple bit fields for a record make use of the byte allocated by the first bit.  In the example provided with just one, there would indeed be no space saving for that lone bit compared to a tinyint.

Scott Pletcher, (thanks for the last name to separate from the other Scott), I agree it would be something unexpected for other developers to learn to use NULL as a legit value.

While I don't mind standing out in a crowd, I have come to believe that using a tinyint is a better solution.  I appreciate everyone's contributions and backing up the underlying reasoning.

Alex, had to remind myself of SET ANSI_NULLS OFF.  The one great thing about a query after SET ANSI_NULLS OFF is that

WHERE ? IN (1, 2, 3, NULL).

will work as if NULL is just another value.  Thanks for that.


Thursday, January 5, 2012 - 7:09:42 PM - Alex Fekken

I agree with the previous posters that it is wrong to use NULL as representing a meaningful value. In my opinion NULL is a state ("don't know if there is a value or whether a value would even be meaningful") rather than a value, and this is consistent with the way NULLs are treated in expressions.

Fortunately nobody (so far) is suggesting to switch ANSI_NULLS OFF...


Thursday, January 5, 2012 - 9:44:57 AM - ScottPletcher

An interesting article with some interesting points. 

I, too, however, would not want to assign a specific meaning, beyond "unknown", to NULL; and would also rather expand the column from a bit to, say, a tinyint.  I think it would be a big difference from what people are used to, so it would take designers and developers some time to get used to, to say the least.


Thursday, January 5, 2012 - 5:39:54 AM - pipTheGeek

You say in the post "The bit's ability to store a NULL gives it a fundamental third value needing no additional storage space.".  This isn't true.  When the column is nullable it uses a bit from the "NULL bitmap" for the row. (Unless the table is sparse)

Your example schema where resp_response is the only nullable column in the table, means that one byte is used for the value of this field, another byte is used for NULL / NOT NULL (the nullable bitmap is allocated in whole bytes, so between 1 and 8 nullable fields uses one byte, 9 nullable fields would use 2 bytes etc).  You would save one byte of storage per record if you changed from a nullable bit to a non nullable tinyint.

So saving disk space is unlikely to happen if using nullable bit columns.


Thursday, January 5, 2012 - 3:23:27 AM - Kaki

Hi Scott!

As I see it there are two problems with your design. First you allow NULL to implicitly have a value. You specify the value in your text as "maybe skipped". Also in surveys there is an option like "I don't want to answer this question".

My experience is that if you find yourself in the situation where NULL has a meaning you're doing something wrong. It can be poor design, bad understanding of requirements or anything else. In this case I would probably have a lookup table with values "1:Yes, 2:No, 3:Skipped, 4:Don't want to answer". This fast solution would also make it easy to insert the answer "Maybe" to the question "Do you want to whatever?".

When I see a NULL I often read it as "There is absolutely no way to make any decisions on this value.", although it's not an absolute rule. For example, if the question is "when was this invoice sent?" and the Sent column is NULL, the odds that it has not been sent are pretty high (but I would not rely on that).

Other than that it was a good article.

// Kaki

PS. English is not my first language. Sorry for any grammar errors...


Tuesday, December 20, 2011 - 12:27:19 AM - ken ambrose

re: I'm most interested in the last statement about regretting NULLs.  Certainly many fields in tables will be NULL when their values are unknown.  Are you saying not to have any NULL fields, or that by using NULLs intentionally, the logic is less clear than using a non-null value?  If the latter, consider the main premise of the article being about saving disk space by using the three possible values of a bit field.  I know a man who saved space in a bowling league Access database by storing scores (range 0-300) in a byte.  How?  By adding 45 when displaying and subtracting 45 when entering.  That saved a byte per score.  Over three games per person per week for the nine months of just for one league, that added up.  If disk space is not an issue, then use tinyint.

I am saying not to have any null fields. But you will say, sometimes some attribute may be unknown but we want to create a record of what we do know.  Fine, use a value to indicate the data is currently unknown. Or even better, move nullable columns into relations.  When you have the data for a column, add it to the relation.  When no data, don't create a row in the relation.  Voilà, no nulls needed.  Ever.

It might be worth thinking about why it is so tempting to try to use nulls to represent some consistent meaning in databases. Note that I said "tempting", I did not say I think it is a good idea.

We actually do not have a true RDBMS implementation from any current vendor - if I understand Codd.  And that fact is the main reason nulls are such a problem using the current RDBMS available.  You see, RDBMS were supposed to be able to support "domains". Domains are definitions for allowable values, along with the operations allowable on those values.

If current vendors supported this feature, we could make perfect sense of a domain defined as "all valid dates in the Gregorian calendar, or the characters n/a". Allow date subtraction and addition on all the valid dates, and simply return "n/a" if any reference is made to a column containing "n/a".

 

But as we all know, domains are not really supported in current implementations.  Only pieces of the definition.  Hence these recurring hassles trying to use them to represent business facts in an efficient and reliable manner...

 


Monday, December 19, 2011 - 2:35:41 PM - Jeremy Kadlec

Scott C.,

Thank you for the additional feedback.

Thank you,
Jeremy Kadlec


Monday, December 19, 2011 - 8:18:34 AM - Scott C

Let me try a fuller explanation.  The example query showed a literal NULL as part of the NOT IN () list to illustrate the issue; hopefully that is not something any sane reader of this forum would actually do.  So the problem I was trying to illustrate was using NOT IN with a subquery that might include a NULL.  The nullability of the search field is beside the point.  I posted what I thought was an absurd example; I never considered someone would try to fix it.

I didn't say your version with the ISNULL function doesn't work, but I was trying to point out that it is not a desirable change.  Your version works not because of the ISNULL, but because you also removed the NULL value from the list.  The list was the point of the example, not the search field.  If a query has a filter clause of WHERE xyz NOT IN (list), and the list (whether literal constants or a subquery) contains a NULL, the query will return no records.  Changing the search field to ISNULL(xyz, 'abc') (or any other expression) will have no effect on the results, but will probably cause the query optimizer to ignore any index on xyz and do a table scan.  If the list has no nulls, then the WHERE xyz NOT IN (list) filter will exclude any records where xyz is NULL without the help of ISNULL.  So in both cases the ISNULL function adds nothing.

The only reason to use ISNULL would be if you wanted to turn NULL values into something that would satisfy NOT IN, so the records with xyz = NULL would be included in the results.  This could also be written WHERE xyz IS NULL or xyz NOT IN (list).  But that was not the original example query.


Wednesday, December 14, 2011 - 7:03:03 PM - Brent Shaub

Hi Scott,

I disagree that the query with ISNULL() does not work.  It does.  When executed, the resultset will be all tables.  Whether or not it's indexable is not my area of speciality, and I'll dig into it more.  It's useful to know.

Based on the example in your latest post, I reckon the first query's IN clause started down a different path as the latter mentions a subquery rather than explicit values.  It would be the subquery, then, that needs the ISNULL to return any static value to avoid the pitfall you've mentioned here.

Another option is AND field IS NOT NULL (or OR field IS NOT NULL) along with the IN depending on the query's logic.

These details aside, thanks for pointing out that NULL in the resultset of an IN can provide unexpected results.


Tuesday, December 13, 2011 - 8:21:54 AM - Scott C

In your proposed change, the ISNULL function is pointless and will only result in making the WHERE clause non-SARGable (i.e. unable to use indexes).  "WHERE NULL NOT IN (subquery)" will never be true, whether the subquery includes NULL or not, so there is no reason to add functions to handle it.  The obvious way to correct the query is to remove the NULL literal from the IN list.

The point of my example was to watch out for "WHERE ? NOT IN (SELECT ColumnWithNulls FROM Anywhere)".  If the subquery includes at least one NULL value, the WHERE will exclude all records from any query.  It doesn't matter what the ? represents.

When using "WHERE NOT IN (subquery)" the subquery should include "WHERE x IS NOT NULL" unless you're absolutely certain that the subquery will never include a NULL,


Monday, December 12, 2011 - 7:43:12 PM - Brent Shaub
SELECT * FROM sys.tables WHERE name NOT IN ('Bippety', 'Boppety', 'Boo', NULL)

This is a human-readable expression that will result correctly with two small changes:

SELECT * FROM sys.tables WHERE ISNULL(name, 'NULLNAME') NOT IN ('Bippety', 'Boppety', 'Boo', 'NULLNAME')

The example uses sys.tables, where name is a required field; if it is changed to a user table with a NULLable column, it will be easier to confirm that the logic is to exclude certain values as well as those that are NULL.

I appreciate seeing three ways to process the results in one scan.  The added bonus of BITFIELD+0 doing an implicit conversion sure makes for less typing over CAST.

Thanks for your input.  One of the great benefits of this forum is seeing better ways to achieve things.


Monday, December 12, 2011 - 8:48:03 AM - Scott C
It is valuable to know how to use NULLs correctly, although most experts will take issue with using NULL to represent a specific value.  I won't pile on, you can get away with it occasionally but I wouldn't make a habit of it.

Any tutorial on dealing with NULL values is incomplete without mentioning "WHERE value NOT IN (list with NULLs)".  Here is an example:

SELECT * FROM sys.tables WHERE name NOT IN ('Bippety', 'Boppety', 'Boo', NULL)

This query returns 0 records in any database (SQL 2005 or later, assuming SET ANSI_NULLS ON), so either those three magic words represent every possible table name or there is something funny going on with the NULL.

One aspect of your article I do have to complain about is writing a query with three separate table scans to tally the answers.  Here are three different ways to write it with one table scan.  The first and third version use "resp_response+0" to do an implicit CAST to INT.

SELECT  resp_ques_id,
        yes  = SUM(resp_response+0),
        [no] = COUNT(resp_response) - SUM(resp_response+0),
        na   = COUNT(*) - COUNT(resp_response)
FROM #response
GROUP BY resp_ques_id

SELECT  resp_ques_id,
        yes  = SUM(CASE WHEN resp_response = 1 THEN 1 ELSE 0 END),
        [no] = SUM(CASE WHEN resp_response = 0 THEN 1 ELSE 0 END),
        na   = SUM(CASE WHEN resp_response IS NULL THEN 1 ELSE 0 END)
FROM #response
GROUP BY resp_ques_id

SELECT  resp_ques_id, yes = [1], [no] = [0], na = [2]
FROM (
    SELECT  resp_ques_id, resp_response = ISNULL(resp_response+0, 2), answer = 1
    FROM #response
) survey
PIVOT (COUNT(answer) FOR survey.resp_response IN ([0], [1], [2])) results

Sunday, December 11, 2011 - 8:19:49 AM - Brent Shaub

Hi Ken, glad to hear your opinion and points.

1.  I find it easy to include NULLs in report queries using ISNULL(<column>, <value>).  This function enables the NULL results to sort together before or after the other results as deemed useful in the output.

2.  I agree that NULL can mean an error, and your point is taken.  I disagree that it cannot be detected.  The most likely scenario for an error is all values for that row being NULL aside from the identity, although it is technically possible for only that field to be always NULL due to an error with it.  I believe preliminary testing would catch always NULL and be corrected first.  What kinds of errors during a bulk insert / update would cause only one field to be NULL only some of the time?

3.  In the situation mentioned of many systems being interrelated--especially cross-platform and with different DBMSes--it makes sense to not use NULL if they are handled differently by their DBMSes.  I've used Excel, Access and SQL Server 2000, 2005 and 2008 so far and only Excel poses the occasional transfer error.  The added workload for integrating into other DBMSes seems imminent to outweigh the cost of any disk space savings when using a bit for a tinyint.  Your example sounds more applicable in large business environments, and I give you this point for scalability.

4.  I'm most interested in the last statement about regretting NULLs.  Certainly many fields in tables will be NULL when their values are unknown.  Are you saying not to have any NULL fields, or that by using NULLs intentionally, the logic is less clear than using a non-null value?  If the latter, consider the main premise of the article being about saving disk space by using the three possible values of a bit field.  I know a man who saved space in a bowling league Access database by storing scores (range 0-300) in a byte.  How?  By adding 45 when displaying and subtracting 45 when entering.  That saved a byte per score.  Over three games per person per week for the nine months of just for one league, that added up.  If disk space is not an issue, then use tinyint.

I welcome any additional information / viewpoints.  With disk space becoming less of an issue, the point I'm hoping to make is to use just the right amount of space necessary.  I hear what you're saying about system integration and how NULLs can be symptoms of errors, but I believe NULLs will exist in all systems and need to be handled, whether intentional or unknown.

Great points.  Thanks for your feedback.


Wednesday, December 7, 2011 - 9:18:11 PM - Faheem Ahmad

Thanks.

You really put good efforts to explain this new idea.

Really appreciated.

Hats off.


Wednesday, December 7, 2011 - 8:56:02 PM - ken ambrose

Hi,

I have to say that using nulls to persist what is HOPED to represent some consistent meaning is a very poor idea in my opinion, for a number of reasons:

1: What happens if you want to summarize on this data?  What happens to the results reported from a grouping on this data where some of the values are null?  It ain't pretty, and it won't be what 99% of users would expect.

2: When debugging, what does a null mean?  For example, if an insert or update fails unexpectedly, what value will end up in the column?  You guessed it, null.  So now in your db (unless you never ever have a failed update) you won't be able to tell what is the result of a failed update vs. what is an intended null.

3: In my work on a large enterprise data warehouse, populating datamarts supporting about 15 downstream apps, with SSAS, COGNOS, and other reporting tools on top of the data, tons of lines of code on the reporting side just try to avoid errors or unexpected results when there are nulls.  Creating them intentionally will only make that problem worse in a large environment.

4: There is no consistency between different vendors' RDBMSs when it comes to handling nulls, and often there is not full consistency within ONE vendor's RDBMS product.  If you ever have to migrate to a different back end you will regret every single null you have allowed in your data.

I could go on, those are just the first few issues that come to mind from my experience. 














