By: Armando Prato
Overview
Before you dive into creating your tables and writing any code, you should initially dedicate time up front to develop a logical data model that attempts to achieve at least 3rd Normal Form.
Explanation
Logical data modeling is where you will define the business entities which will eventually become tables, it's attributes (columns) and how the entities inter-relate with each other. This initial step involves the process of normalization. Normalization is where you break down your entities and remove redundancy from them, eliminate any repeating data from them, and where eventually each entity has a primary key where the attributes all rely solely on this key. This process is usually performed leveraging data modeling tools which can visualize the model.
There are 3 basic normalization ("normal") forms:
- 1st Normal Form (1NF) (a primary key is identified, attributes reduced to their smallest component, and repeating data is identified and removed)
- 2nd Normal Form (2NF) (non primary key attributes not dependent on the entire primary key are removed)
- 3rd Normal Form (3NF) (remaining non primary key attributes that have dependency on other non primary key attributes (i.e. transitive dependencies) are removed
There are additional normal forms such as Boyce-Codd normal form (BCNF), 4th normal form (4NF), 5th normal form (5NF), etc. I generally find that achieving 3rd normal form is adequate. Your mileage may vary.
Why go through all this effort to normalize? Normalization can help reduce the amount of data in your tables leading to fewer I/Os to retrieve data for queries as well as memory savings in SQL Server's buffer pool. It can also help prevent issues with data modification anomalies where data is updated in one column but perhaps missed in other columns. Lastly, referential integrity, such as foreign key validation, becomes easier to maintain.
Depending on your design, there might be a performance penalty to normalizing to 3NF and, especially, beyond since many JOINs could be required to satisfy a query request. This shouldn't preclude you from initially designing to at least 3NF. If a query issue is later identified, you do have the option of introducing controlled redundancy into your design. In this process, you explicitly introduce redundancy to solve a performance issue. However, don't automatically add denormalization to a design; instead allow a real performance situation to dictate it.
Additional Information
- Database Normalization Basics
- Understanding First Normal Form in SQL Server
- Getting started with SQL Server database diagrams