Dealing with a No NULL Requirement for Data Modeling in SQL Server


By:   |   Updated: 2020-01-28   |   Comments (2)   |   Related: More > Database Design


Problem

I am working with a system that doesn’t allow for NULLs in the SQL Server data model. However, I have clear cases where I’m missing information. For instance, I’m dealing with building out our academic system into a new platform and we have columns for final grade in a course as well as several date columns for various reasons. Under our current system, we are allowed to use NULLs, but that’s not allowed in the new system.

How do I handle these types of requirements in this no NULL allowed SQL Server database system?

Solution

The need for NULLs has been argued all the way back to the originator of relational database systems, E.F. Codd, who argued for the inclusion of NULLs. However, if the system you are using doesn’t support NULLs for its data model, here are the two typical options:

  1. Use default values
  2. Break up your entities into further entities

There are pros and cons with each of these approaches.

Use Default Values

Let’s take the case of the final grade. If we have a column for the final grade, let’s set a default value of 0. Here’s a paired down example containing only the 3 relevant columns:

class grade data

This default value works if there’s a final date when that final grade becomes the actual grade. For instance, most academic institutions have a final date where incomplete grades revert to a final status. Anyone working with the data should be aware of what those dates are because they influence what the data means. Reports will have to take those dates into account.

Pros

The main reason to choose this option is we don’t have to have a lot more tables to create and manage. It simplifies access to the data and being able to bring information together.

Cons

The biggest con is that anything working with the data has to know that there is a default value and handle any calculations accordingly. If there are cases where a default value becomes the permanent value after a certain trigger, like a date, anyone working with the data will need to understand what those triggers are.

For instance, during the exam period, instructors for courses with exams scheduled first should be the ones who are able to put a real value for FinalGrade. However, until the final exam is completed and the professor is able to enter in grades, that 0 is meaningless. Any reports or calculations done during the exam period (and even shortly thereafter) should take into account that some of those final grade values are still default values. There’s probably another table which contains the final exam schedule, or better, when grades must be in. That table is critical if someone is trying to do a calculation before the final exam period ends.

But what if there isn’t a defined period? For instance, what if it’s when a student graduates? Let’s look at that example:

class grade data

Default values for dates works well when you can put a date that’s obviously not correct. For instance, someone who has entered in 2018 or 2019 can’t possibly have graduated in 1990. We still have the issues with anything using the data and being aware of what has default values and what those values are. For instance, if a report writer is not aware that 1/1/1990 is a default value and then builds a report on graduating classes, all students who haven’t graduated will appear as the Class of 1990, even if the institution didn’t exist in 1990.

One concern with default values for dates is the dates must be before or after what is actually possible. For instance, if the academic institution has existed since 1890, then default dates with 1990 or even 1900 for GradDate will potentially cause a data integrity issue.

Break Up Entities into Further Entities

To avoid using default values, additional tables would have to be generated. Here’s an example involving the final grades again. If we care about seeing a final grade and not having to perform a calculation of all grades each time we need to report on the final grade, we’ll need another table just for final grade. We might look at the data visually like so:

class grade data

We have three tables from top-to-bottom with only the relevant columns for this example:

  • ClassRegistration
  • ClassGrade
  • ClassFinalGrade

ClassRegistration contains information about student registration for each class. ClassGrade has the information on every grade received for each course. Finally, ClassFinalGrade is where we store the actual final grade value. Since we’re not using default values and we aren’t allowing NULL, if a row that we’d expect to be in the table isn’t there, that means the data is missing.

Note that under ClassRegistration there are 3 students listed. All 3 students have grades in the ClassGrade table but there are only two students listed in ClassFinalGrade. The student identified by StudentNo 42562 doesn’t appear. If we look more closely at ClassGrade, we see that that there was a graded entry for November 15, 2019 but StudentNo 42562 doesn’t have a grade listed. That student has a grade missing. As a result, the professor has not entered a final grade for that student.

Pros

The biggest pro is there is nothing to understand about default values because those aren’t used unless they have some real meaning. For instance, if there is a column in the system to mark whether a student is an undergrad or a graduate student and the institution only offers undergraduate degrees, then a default value marking every student as an undergraduate makes sense.

Since default values are not being used for missing values, there is nothing to pass on with regards to default values to anyone working with the data.

Cons

Everything that could have a missing value has to be broken out in some way. For instance, while we might want a single table to track class registration with a final grade as part of that table, like we did with the default values method, we can’t do this if we want to avoid default values. That’s why there are two separate tables.

Also, anyone working with the data has to understand that if the data isn’t known, there won’t be an entry in the appropriate table. That obviously has an impact if someone is using an [INNER] JOIN but needs to keep information from one or both of the tables on either side of the JOIN.

Next Steps


Last Updated: 2020-01-28


get scripts

next tip button



About the author
MSSQLTips author K. Brian Kelley K. Brian Kelley is a SQL Server author and columnist focusing primarily on SQL Server security.

View all my tips





Comments For This Article




Thursday, February 06, 2020 - 11:06:28 AM - Will Back To Top (84220)

Breaking up the tables is a nicer approach as it adheres to database normalisation.  I agree about having a non-null constraint on the final grade column too.  I would rather communicate the model to anyone that needs to use the model rather than compromise on the data model design with regards to joins etc.  You could produce a view as a compromise that people could use that instead of the underlying tables if you wanted to return something for the non-presence of a final grade ie "awaiting grade".  This may be good for reporting purposes.


Wednesday, January 29, 2020 - 12:03:06 PM - Brahmanand Shukla Back To Top (84036)

If the requirement is clear that there won't be Null values then it's better to have NOT NULL constraint. Even if we know what default value to put it will be good to have NOT NULL constraint and specify default value in the INSERT script. This approach will help to maintain the data quality.

If the requirement is to INSERT first and UPDATE any specific value later on like Student Master and Student Score then it'll be good to have separate entities instead of a single entity with all the attributes. Reason is very simple. First of all having NOT NULL column for this purpose is not good and UPDATE is most costliest operation in SQL Server. Having separate entity also helps in having more data in a Page and if limited columns are requested then there will be lesser logical/physical reads.

However, every solution has some pros and cons.

Like,

Having single table makes read comparatively lighter if almost all the cooumns are requested each time but it'll add overhead during write specially during UPDATE.

Or

Having multiple tables will make read tough and complicated due to JOIN involved sometimes horribly when there is too many records. But write will be definitely lighter.

Making right choice depending upon the use case is an art of Data Modeling.



download





Recommended Reading

Find and Remove Duplicate Rows from a SQL Server Table

Working with SQL Server Extended Properties

Surrogate Key vs Natural Key Differences and When to Use in SQL Server

What is a GUID in SQL Server

SQL Server Database Diagram Tool in Management Studio








get free sql tips
agree to terms


Learn more about SQL Server tools