![]() |
|
|
By: Brent Shaub | Read Comments (19) | Print Brent has been creating, renovating and administering databases since 1998. Related Tips: More |
|
Frequently surveys ask yes/no questions and use a bit datatype. What are the effects of allowing this column to be NULL in a SQL Server table? Check out this tip to learn more.
The bit's ability to store a NULL gives it a fundamental third value needing no additional storage space. Using NULL to denote the survey question was read, but not answered is logical. Like 1 and 0, the value of NULL must be used consistently. The existence of a record in survey results differentiates "I saw the question and decided to skip it" from "the system never asked me that question".
NULL is a funny animal in the SQL kingdom; it uses a different syntax in the WHERE clause for queries. Below are two examples.
NULL is not equal to anything, not '' nor even NULL. This query will always return 0 records.
select count(question1) as count_na from survey_answers where question1 = NULL
The correct syntax is:
select count(question1) as count_na from survey_answers where question1 IS NULL
As an alternative, the ISNULL(<column>, value) function will supply the value of the second parameter if the value of the column in the first parameter is NULL. Keep in mind that the supplied value must be of the same datatype. If using ISNULL with a bit and needing a third non-null value, use something like ISNULL(CAST(<column> as tinyint), 2). The choice of the value i.e. 2 and the data type i.e. tinyint can be changed to best suit your situation. ISNULL is handy for sorting NULLs where they appear best. Supplying -1 or the highest value + 1 will left or right justify rows with NULLs. Supplying a value for ISNULL that's between other columns' values places them in the middle. Outputting a message like "None found" is a common use too.
Bits ties tinyint and char in using one byte. Why a byte when a bit is an eighth of one? If a table has one bit, SQL allocates an entire byte. As other bits are defined, they use the other bits; up to eight bit fields total. When creating a ninth bit, a second byte is used, and creating a seventeenth will use a the first bit in a third byte.
Here is a sample data model for our survey application:
|
Survey ERD |
|---|
![]() |
Using the schema above, here is a query to count the responses by question:
select coalesce(yesses.resp_ques_id, nos.resp_ques_id, nas.resp_ques_id) ques_id, isnull(yesses.yes, 0) yes, isnull(nos.no, 0) [no], isnull(nas.na, 0) na from (select resp_ques_id, count(1) as yes from response where resp_response = 1 group by resp_ques_id ) yesses left outer join (select resp_ques_id, count(1) as no from response where resp_response = 0 group by resp_ques_id ) nos on yesses.resp_ques_id = nos.resp_ques_id left outer join (select resp_ques_id, count(1) as na from response where resp_response is null group by resp_ques_id ) nas on nas.resp_ques_id = nos.resp_ques_id
Here are the sample data:

This query will scale for any number of questions in a survey. To filter by a survey, simply add a join in each sub query from response to question with an AND ques_surv_id = x.
| Share: | Share | Tweet |
|
![]() |
![]() |
Connect with MSSQLTips.com |
| Wednesday, December 07, 2011 - 8:56:02 PM - ken ambrose | Read The Tip |
|
Hi, I have to say that using nulls to persist what is HOPED to represent some consistent meaning is a very poor idea in my opinion, for a number of reasons: 1: What happens if you want to summarize on this data? what happens to the results reported from a grouping on this data where some of the some values are null? It ain't pretty, and it won't be what 99% of users would expect. 2: When debugging, what does a null mean? For example, if an insert or update fails unexpectedly, what value will end up in the column? You guessed it, null. So now in your db (unless you never ever have a failed update) you wont be able to tell what is the result of a failed update vs. what is an intended null. 3:In my work on a large enterprise data warehouse, populating datamarts supporting about 15 downstream apps, with both SSAS, COGNOS, and other reporting tools on top of the data, tons of lines in the code on the reporting side just tries to avoid errors or unexpected results when there are nulls. Creating them intentionally will only make that problem worse in a large environment. 4:There is no consistency between different vendors RDBMS when it comes to handling nulls, and often there is not full consistency in ONE vendor's RDBMS product. If you ever have to migrate to a different back end you will regret every single null you have allowed in your data. I could go on, those are just the first few issues that come to mind from my experience. |
|
| Wednesday, December 07, 2011 - 9:18:11 PM - Faheem Ahmad | Read The Tip |
|
Thanks. You really put good efforts to explain this new idea. Really appreciated. Hands off. |
|
| Sunday, December 11, 2011 - 8:19:49 AM - Brent Shaub | Read The Tip |
|
Hi Ken, glad to hear your opinion and points. 1. I find it easy to include NULLs in report queries using isnull(,). This function enables the NULL results to be together before or after the other results as deemed useful in the output. 2. I agree that NULL can mean an error, and your point is taken. I disagree that it cannot be detected. The most likely scenario for an error is all values for that row being NULL aside from the identity, although it is technically possible for only that field to be always NULL due to an error with it. I believe preliminary testing would catch always NULL and be corrected first. What kinds of errors during a bulk insert / update would cause only one field to be NULL only some of the time? 3. In the situation mentioned of many systems being interrelated--especially cross-platform and with different DBMSes--it makes sense to not use NULL if they are handled differently by their DBMSes. I've used Excel, Access and SQL Server 2000, 2005 and 2008 so far and only Excel poses the occasional transfer error. The added workload for integrating into other DBMSes seems imminent to outweigh the cost of any disk space savings when using a bit for a tinyint. Your example sounds more applicable in large business environments, and I give you this point for scalability. 4. I'm most interested in the last statement about regretting NULLs. Certainly many fields in tables will be NULL when their values are unknown. Are you saying not to have any NULL fields, or that by using NULLs intentionally, the logic is less clear than using a non-null value? If the latter, consider the main premise of the article being about saving disk space by using the three possible values of a bit field. I know a man who saved space in a bowling league Access database by storing scores (range 0-300) in a byte. How? By adding 45 when displaying and subtracting 45 when entering. That saved a byte per score. Over three games per person per week for the nine months of just for one league, that added up. If disk space is not an issue, then use tinyint. I welcome any additional information / viewpoints. With disk space becoming less an issue, the point I'm hoping to make is using just the right amount of space necessary. I hear what you're saying about system integration and how NULLs can be symptoms of errors, but I believe NULLs will exist in all systems and need handled, whether intentional or unknown. Great points. Thanks for your feedback. |
|
| Monday, December 12, 2011 - 8:48:03 AM - Scott C | Read The Tip |
It is valuable to know how to use NULLs correctly, although most experts will take issue with using NULL to represent a specific value. I won't pile on, you can get away with it occasionally but I wouldn't make a habit of it. Any tutorial on dealing with NULL values is incomplete without mentioning "WHERE value NOT IN (list with NULLs)". Here is an example: SELECT * FROM sys.tables WHERE name NOT IN ('Bippety', 'Boppety', 'Boo', NULL)
This query returns 0 records in any database (SQL 2005 or later, assuming SET ANSI_NULLS ON), so either those three magic words represent every possible table name or there is something funny going on with the NULL. One aspect of your article I do have to complain about is writing a query with three separate table scans to tally the answers. Here are three different ways to write it with one table scan. The first and third version use "resp_response+0" to do an implicit CAST to INT. SELECT resp_ques_id, SELECT resp_ques_id, SELECT resp_ques_id, yes = [1], [no] = [0], na = [2] |
|
| Monday, December 12, 2011 - 7:43:12 PM - Brent Shaub | Read The Tip |
SELECT * FROM sys.tables WHERE name NOT IN ('Bippety', 'Boppety', 'Boo', NULL)
This is a human-readable expression that will result correctly with two small changes: SELECT * FROM sys.tables WHERE ISNULL(name, 'NULLNAME') NOT IN ('Bippety', 'Boppety', 'Boo', 'NULLNAME')
The example using sys.tables where name is a required field when changed to a user table with a NULLable column will be easier to confirm the logic is to exclude certain values and those that are NULL. I appreciate seeing three ways to process the results in one scan. The added bonus of BITFIELD+0 doing an implicit conversion sure makes for less typing over CAST. Thanks for your input. One of the great benefits of this forum is seeing better ways to achieve things. |
|
| Tuesday, December 13, 2011 - 8:21:54 AM - Scott C | Read The Tip |
|
In your proposed change, the ISNULL function is pointless and will only result in making the WHERE clause non-SARGable (i.e. unable to use indexes). "WHERE NULL NOT IN (subquery)" will never be true, whether the subquery includes NULL or not, so there is no reason to add functions to handle it. The obvious way to correct the query is to remove the NULL literal from the IN list. The point of my example was to watch out for "WHERE ? NOT IN (SELECT ColumnWithNulls FROM Anywhere)". If the subquery includes at least one NULL value, the WHERE will exclude all records from any query. It doesn't matter what the ? represents. When using "WHERE NOT IN (subquery)" the subquery should include "WHERE x IS NOT NULL" unless you're absolutely certain that the subquery will never include a NULL, |
|
| Wednesday, December 14, 2011 - 7:03:03 PM - Brent Shaub | Read The Tip |
|
Hi Scott, I disagree that the query with ISNULL() does not work. It does. When executed, the resultset will be all tables. Whether or not it's indexable is not my area of speciality, and I'll dig into it more. It's useful to know. Based on the example in your latest post, I reckon the first query's IN clause started down a different path as the latter mentions a subquery rather than explicit values. It would be the subquery, then, that needs the ISNULL to return any static value to avoid the pitfall you've mentioned here. Another option is AND field IS NOT NULL (or OR field IS NOT NULL) along with the IN depending on the query's logic. These details aside, thanks for pointing out that NULL in the resultset of an IN can provide unexpected results. |
|
| Monday, December 19, 2011 - 8:18:34 AM - Scott C | Read The Tip |
|
Let me try a fuller explanation. The example query showed a literal NULL as part of the NOT IN () list to illustrate the issue, hopefully that is not something any sane reader of this forum would actually do. So the problem I was trying to illustrate was using NOT IN with a subquery that might include a NULL. The nullablility of the search field is beside the point. I posted what I thought was an absurd example, I never considered someone would try to fix it. I didn't say your version with the ISNULL function doesn't work, but I was trying to point out that it is not a desirable change. Your version works not because of the ISNULL, but because you also removed the NULL value from the list. The list was the point of the example, not the search field. If a query has a filter clause of WHERE xyz NOT IN (list), and the list (whether literal costants or a subquery) contains a NULL, the query will return no records. Changing the search field to ISNULL(xyz, 'abc') (or any other expression) will have no effect on the results, but will probably cause the query optimizer to ignore any index on xyz and do a table scan. If the list has no nulls, then the WHERE xyz NOT IN (list) filter will exclude any records where xyz is NULL without the help of ISNULL. So in both cases the ISNULL function adds nothing. The only reason to use ISNULL would be if you wanted to turn NULL values into something that would satisfy NOT IN, so the records with xyz = NULL would be included in the results. This could also be written WHERE xyz IS NULL or xyz NOT IN (list). But that was not the original example query. |
|
| Monday, December 19, 2011 - 2:35:41 PM - Jeremy Kadlec | Read The Tip |
|
Scott C., Thank you for the additional feedback. Thank you, |
|
| Tuesday, December 20, 2011 - 12:27:19 AM - ken ambrose | Read The Tip |
|
re: I'm most interested in the last statement about regretting NULLs. Certainly many fields in tables will be NULL when their values are unknown. Are you saying not to have any NULL fields, or that by using NULLs intentionally, the logic is less clear than using a non-null value? If the latter, consider the main premise of the article being about saving disk space by using the three possible values of a bit field. I know a man who saved space in a bowling league Access database by storing scores (range 0-300) in a byte. How? By adding 45 when displaying and subtracting 45 when entering. That saved a byte per score. Over three games per person per week for the nine months of just for one league, that added up. If disk space is not an issue, then use tinyint. I am saying not to have any null fields. But you will say, sometimes some attribute may be unknown but we want to create a record of what we do know. Fine, use a value to indicate the data is currently unknown. Or even better, move nullable columns into relations. When you have the the data for a column, add it to the relation. When no data, don't create a row in the relation. Voila', no nulls needed. Ever. It might be worth thinking about why it is so tempting to try to use nulls to represent some consistent meaning in databases. Note that I said "tempting", I did not say I think it is a good idea. We actually do not have a true RDBMS implementation from any current vendor - if I understand Codd. And that fact is the main reason nulls are such a problem using the current RDBMS available. You see, RDBMS were supposed to be able to support "domains". Domains are definitions for allowable values, along with the operations allowable on those values. If current vendors supported this feature, we could make perfect sense of a domain defined as "all valid dates in the gregorian calendar, or the characters "n/a. Allow date subtraction and addition on all the valid dates, and simply return "n/a" if any reference is made to a column containing "n/a".
But as we all know, domains are not really supported in current implementations. Only pieces of the definition. Hence these recurring hassles trying to use them to represent business facts in an effiicent and reliable manner...
|
|
| Thursday, January 05, 2012 - 3:23:27 AM - Kaki | Read The Tip |
|
Hi Scott! As I see it there are two problems with your design. First you allow NULL to implicity have a value. You specify the value in your text as "maybe skipped". Also in surveys there is an option like "I don't want to answer this question". My experiense is that if you find yourself in the situation where NULL has a meaning you're doing something wrong. It can be poor design, bad understanding of requirements or anyting else. In this case I wold propably have a lookup table with values "1:Yes, 2:No, 3:Skipped, 4:Don't want to answer". This fast solution would also make it easy to insert the answer "Maybe" to the question "Do you want to whatever?". When I see a NULL i often read it as "There is absolutely no way to make any decisions on this value.", although it's not an absolute rule. For example "when was this invoice sent?" and the Sent column is NULL the odds that it has not been sent is pretty high (but i would not rely on that) Other than that it was a good article. // Kaki PS. English is not my first language. Sorry for any grammar errors... |
|
| Thursday, January 05, 2012 - 5:39:54 AM - pipTheGeek | Read The Tip |
|
You say in the post "The bit's ability to store a NULL gives it a fundamental third value needing no additional storage space.". This isn't true. When the column is nullable it uses a bit from the "NULL bitmap" for the row. (Unless the table is sparse) Your example schema where resp_response is the only nullable column in the table, means that one byte is used for the value of this field, another byte is used for NULL / NOT NULL (the nullable bitmap is allocated in whole bytes, so between 1 and 8 nullable fields uses one byte, 9 nullable fields would use 2 bytes etc). You would save one byte of storage per record if you changed from a nullable bit to a non nullable tinyint. So saving disk space is unlikely to happen if using nullable bit columns. |
|
| Thursday, January 05, 2012 - 9:44:57 AM - ScottPletcher | Read The Tip |
|
An interesting article with some interesting points. I, too, however, would not want to assign a specific meaning, beyond "unknown", to NULL; and would also rather expand the column from a bit to, say, a tinyint. I think it would be a big difference from what people are used to, so it would take designers and developers some time to get used to, to say the least. |
|
| Thursday, January 05, 2012 - 7:09:42 PM - Alex Fekken | Read The Tip |
|
I agree with the previous posters that it is wrong to use NULL as representing a meaningfull value. In my opinion NULL is a state ("don't know if there is a value or whether a value would even be meaningfull") rather than a value and this is consistent with the way NULL are treated in expressions. Fortunately nobody (so far) is suggesting to switch ANSI_NULLS OFF... |
|
| Friday, January 06, 2012 - 3:55:44 AM - Brent Shaub | Read The Tip |
|
Kaki, your grammar and English read like a native, so great job on that. Pip, yes, I later learnt, as you've stated, that multiple bit fields for a record make use of the byte allocated by the first bit. In the example provided with just one, there would indeed be no space saving for that lone bit compared to a tinyint. Scott Pletcher, (thanks for the last name to separate from the other Scott), I agree it would be something unexpected for other developers to learn to use NULL as a legit value. While I don't mind standing out in a crowd, I have come to believe that using a tinyint is a better solution. I appreciate everyone's contributions and backing up the underlying reasoning. Alex, had to remind myself of SET ANSI_NULLS OFF. The one great thing about a query after SET ANSI_NULLS OFF is that WHERE ? IN (1, 2, 3, NULL). will work as if NULL is just another value. Thanks for that. |
|
| Friday, January 06, 2012 - 7:18:19 AM - pipTheGeek | Read The Tip |
|
@Brent, it isn't that there would be no space saving, using the nullable bit field uses DOUBLE the space of a non-nullable tinyint becuase you have a byte allocated to store the null bitmap AS WELL AS the byte allocated to store the actual bit value. *Unless you are using sparse columns, but they would not normally be used for only a few nullable columns. |
|
| Friday, January 06, 2012 - 9:13:46 AM - GrumpyOldDBA | Read The Tip |
|
Sometimes I really worry! ANSI NULLs are required for a number of features to work, indexed views, filtered indexes, partitioning and so on. The whole concept of nullable bits is contrary to every aspect of data typing and integrity, sorry but I absolutely stand against using either nullable bits or using nulls to indicate a value. To further clarify nullable columns create wider indexes ( when indexed ) a very good reason to make sure that nulls are never applied to columns which are indexed. |
|
| Friday, January 06, 2012 - 9:25:32 AM - ScottPletcher | Read The Tip |
|
As to disk space requirements for making a column NULLable, I thought SQL Server now reserved the NULL bit for every column, regardless of whether the column was (initially) specified as NULLable or not. Maybe they figured that since it only cost one bit per column, it was more effective overall to assign the bit so that making the column NULLable later did not require adjusting every data page to add the NULL bit for that column. |
|
| Tuesday, February 21, 2012 - 11:19:56 AM - Alex B | Read The Tip |
|
From MSDN: In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. So back to ISNULL with text types ISNULL(,'A')!=ISNULL(,'B') or with numeric types or bit type (no need to cast tinyint) ISNULL(,1)!=ISNULL(,0) |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |