What is a Relational Database Management System

By: Nai Biao Zhou | Comments | Related: > Database Design

Problem

After being invented by Codd in 1969 (Date, 2019), the relational model, the scientific foundation for database technology, became one of the most widely used database models. Because many database design principles seem simple, people may naturally apply the principles in building a database without taking the proper steps. Without a solid, formal foundation in database design theory, databases can be built with potential problems. For example, these databases may not provide good query performance when databases grow to gigabytes and even terabytes. Ideally, everyone in a database-driven application development team should understand what makes a good database design (Stephens, 2008). Many concepts and technical terms in the design theory make the theory complicated; therefore, impeding people from learning the theory in depth. To remove this impediment, we should explain these concepts using plain language.

Solution

Many IT professionals have experience working with databases. However, not all of them have had a chance to build a database from scratch. Most developers work on projects using existing databases. They modify database tables or create new tables. They then embed database CRUD operations, i.e., Create, Read, Update, and Delete operations, in their programs. They may obtain knowledge about databases through practice, but lack the database design theory. The potential problems in databases may not be drawn to their attention. For example, we may find overloaded columns in legacy databases. An overloaded column containing different types of facts is a source of data quality issues.

Moreover, database modifications by careless developers could add more problems, such as data redundancy. Data redundancy occurs when the same fact is stored in multiple places. This practice can lead to several problems. For example, when a fact needs to be updated, we have to perform database operations on several tables. Besides, a database may give different answers to one question. These potential problems increase the cost of software application maintenance. The problems also can doom applications to fail. These problems emphasize the importance of database design prior to implementation.

A practical method of learning the database design theory is to read books written by credible authors. However, some books look intimidating because we see complex mathematics symbols and abstruse technical jargon. One must have some grasp of the technical terms before reading these books. This article uses plain English to explain these database design terms. The reference section at the end of this article provides a list of books and articles for database design.

This article first briefly reviews the path of the database system evolution. It takes some time to develop these database systems that we use today. We then explore the concepts of database and database management system. The article presents several types of databases and some database management systems. Next, we cover some terms used in relational models: relation, tuple, attribute, domain, schemas, and key. Finally, we briefly introduce SQL, the structured query language. We use SQL commands to communicate with a database without knowing database storage.

A Brief History of Database Systems

Data, representing facts, figures, and ideas, are commonly used in everyday life. To use data effectively, we need to manage data and make data available when and where it is needed. In the seventeenth century, people started to think of processing data automatically by using devices. One of the earliest such devices was produced by Blaise Pascal in the 1640s. As time went on, more and different kinds of devices and data processing techniques appeared. In 1963, Charles W. Bachman developed the Integrated Data Store (IDS). Starting in 1966, IBM, working together with Rockwell and Caterpillar, designed IMS, an information management system.

The IDS and IMS are described as the forerunners of navigational databases (Foote, 2017). These two systems did not support high-level query language (Garcia-Molina, Ullman & Widom, 2008). Database users need to know the physical structure of databases in order to perform database operations. The pioneers in database research continuously improved database systems. Codd wrote a series of papers (Codd 1969; 1970) to propose a novel technique to construct relational databases. Through the relational database systems, users can access a view of data called relations. To design a relational database, Codd developed the normalization process in 1972. Codd borrowed the term "normalization" from President Nixon, who "normalized" relations with China. Codd figured that if Nixon could normalize relations with a country, we should be able to normalize data relations as well (Oppel, 2009).

With relational database management systems (RDBMS), programmers can perform database operations without knowing data storage details. Many companies today use RDBMS because the systems can efficiently store and process structured data. However, as our economy changes, so do the needs of these companies. Nowadays, businesses need to store and analyze a colossal amount of unstructured data. For example, images and documents. The database technologies continue to evolve and NoSQL ("Not only" Structured Query Language) solutions appear. According to McCreary & Kelly (2013), "NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility" (p. 4).

Introduction to Database Systems

When companies recognize that valid, consistent, and accurate data is vital to them, data becomes a valuable asset. They want to find ways to store, index, and retrieve data efficiently. Furthermore, data profoundly impacts everyday operations and decision-making processes; therefore, the number of database users in companies increases. Effective database systems have become a crucial component of information systems.

Database Definition

Datum is a single piece of factual information of interest to us. Data, the plural of datum, is a collection of information. The information may reflect on physical or conceptual objects. We can generally classify data into three categories: transaction data, component data, and classification data. Transaction data can be any facts recorded during business events. For example, transaction data may represent that a customer ordered three laptops on October 25, 2020. Component data reflects the characteristics of an object, such as movie titles. Classification data are the characterization of the transaction data and component data. For example, we use the genre to categorize movies.

After knowing the meaning of data, we define a database as an organized collection of logically related data from which users can retrieve the desired information. A database contains descriptions of its structure. These descriptions are metadata, data about data. Most applications built on databases fall into two categories: online transaction processing (OLTP) systems and online analytical processing (OLAP) systems. Taking a closer look at the database definition, we notice three characteristics of a database: persistent, interrelated, and shared.

First, we store data in permanent storage and we can find them in a reasonable amount of time. Nowadays, we often store data on stable media such as hard disk drives (HDD) and solid-state drives (SSD). Next, the storage must contain data and relationships between these data items. If we use a software application to manage a list of books, the list is not a database because it does not contain any relationships between books. The last characteristic is that a database must be shared. Many users can access a database and perform database operations.

There are two major types of databases: relational and non-relational. Relational databases are by far the most commonly used databases today and thus some IT professionals may not be exposed to other databases. Even though relational databases are handy in many situations, sometimes a different kind of database may make more sense for a specific project. For example, if data is naturally hierarchical, hierarchical databases may be the appropriate choice. Thus, it is worth taking some time to learn about other kinds of databases. The following are several types of databases in use:

Flat file databases: Stores data in permanent files that mostly are in text form;
Hierarchical databases: Arranges data in a tree-like structure;
Object databases: Represents information in the form of objects as used in object-oriented programming;
Relational databases: Contains a set of tables in which data are related;

Database Management Systems

We can use database management systems (DBMSs) as an interface to manage databases. A DBMS is a collection of software components designed to create and maintain databases and control all access to them. We use the DBMS to provide an effective method of performing database operations, troubleshooting database issues, and restricting data access. After years of evolutions, the DBMS can now provide timely access to data, which means users can access data when and where it is needed. Relational Database Management System (RDBMS), which is still popular today, is an advanced version of a DBMS system. Many database vendors claim their DBMSs are relational. Codd defined thirteen criteria, called Codd’s twelve rules (Codd, 1985), which determine whether a DBMS is a relational database management system.

There is a wide range of DBMS software available today. All users, including large-scale enterprises, small businesses, and individual users, can find products that satisfy their needs. While some DBMSs are expensive, we also can find robust open-source DBMSs on the web. According to DB-Engines Ranking, as of October 2020, Table 1 presents the top 10 DBMSs based on popularity. However, it is hard to tell which database management systems (DBMB)s are the most popular. With time, new issues and approaches surface regularly. The DB-Engines site updates the ranking every month.

Rank	DBMS	Primary Database Model	License	Developer
1	Oracle	RDBMS	Commercial (restricted free version is available)	Oracle
2	MySQL	RDBMS	Open Source	Oracle
3	Microsoft SQL Server	RDBMS	Commercial (restricted free version is available)	Microsoft
4	PostgreSQL	RDBMS	Open Source	PostgreSQL Global Development Group
5	MongoDB	Document store	Open Source	MongoDB, Inc
6	IBM Db2	RDBMS	Commercial (free version is available)	IBM
7	Elasticsearch	Search engine	Open Source	Elastic
8	Redis	Key-value store	Open Source	Salvatore Sanfilippo
9	SQLite	RDBMS	Open Source	Dwayne Richard Hipp
10	Cassandra	Wide column store	Open Source	Apache Software Foundation

Table 1 Top 10 Database Management Systems According to Their Popularity in October 2020 (DB-Engine, 2020)

Relational Models

A data model is a blueprint for building a database. In database design phases, we use data models to represent data. In the database design and implementation process, we first build a conceptual data model that identifies user views at a high-level. Conceptual data models are close to the way people perceive data; therefore, in practice, we share the conceptual models with non-technical users. Based on the conceptual data models, we then create logical data models representing how to store data logically. We usually communicate with technical users by using logical data models. Finally, we convert the logical design models into physical data models that show all table structures. The physical data model varies from DBMS to DMBS. When we say database design, we usually mean the logical database design unless the context demands otherwise. When we represent data as relations, we produce relational models.

Relations, Tuples, Attributes and Domains

Each relation in a relational model represents a specific type of entity. An entity is an object of interest to us and we store data about the object. For example, a movie is an entity as we store movie related information such as title, year of release, length, and genre. To make the term "relation" more palatable, we sometimes use a user-friendly name "table" in conversations. We can loosely claim that a relation is a two-dimensional table used to divide data. The example in Table 2 demonstrates a relation called MOVIE. Each row in the table illustrates a tuple representing one instance of the entity MOVIE, and each column defines an attribute of the instance.

TITLE	YEAR	LENGTH	GENRE
Gone with the Wind	1939	231	Drama
Star Wars	1977	124	Sci-Fi
Wayne’s World	1992	95	Comedy

Table 2 An Example Relation (Garcia-Molina et al., 2008)

However, strictly speaking, a relation is not a table, and a row is not a tuple (Date, 2015). The study of genealogical data led to the invention of the relation theory. A relation, such as children-of, occupation-of, and marital-status-of, is a correspondence between members of two sets (Mealy, 1967). In mathematics, a set is a collection of distinct objects. We call these objects in the set members or elements of the set. An empty set does not have any elements. Table 2 contains four sets:

S1 = {"Gone with the Wind", "Star Wars", "Wayne’s World"}

S2 = {1939, 1977,1992}

S3 = {95, 124, 231}

S4 = {"Comedy", "Drama", "Sci-Fi"}

Therefore, the relation illustrated in Table 2 is a relation defined on these four sets. A tuple is a finite ordered list of elements. An n-tuple is an ordered list of n elements, where n is a non-negative integer. If we select one element from every set shown in Table 2, we create a 4-tuple:

("Gone with the Wind", 1939, 231, "Drama")

Given n sets of elements, denoted by S1, S2, …, Sk, …, Sn, we can construct an n-tuple in which the first element is from S1, and the second element is from S2, and so on. By repeating the n-tuple construction process, we obtain a set of n-tuples. This set of n-tuples is a relation on those n sets (Codd 1969; 1970). We also say that the relation is of degree n, and the set, Sk, is the relation's Kth domain. As illustrated in Table 2, the domain of TITLE is the set of character strings that represent titles of movies. An attribute is a name paired with a domain. The selected data elements are the attribute values of each entity. Informally, an attribute is a column of relation for which each row has a value assigned. Date claims that a column is not an attribute (Date, 2015).

The MOVIE relation reflects the fact that a "GENRE" movie with the title "TITLE" released in the "YEAR" is "LENGTH" minutes long. The relation has four attributes: TITLE, YEAR, LENGTH, and GENRE. There are three tuples in the relation and each of them represents an instance of the movie entity. We also observe four domains that define the types of values for each attribute. If we add a duplicate row into Table 2, the table cannot represent a relation because all tuples in a relation should be distinct from one another in content (Codd, 1990).

An IT professional may work on conceptual database design, logical database design, and physical database design to build a database. The object at the logical level and the object at the physical level are not always a one-to-one correspondence. Therefore, we recommend getting in a habit of using logical terms such as relation, tuple, and attribute in logical database design. Even though we may find it easier to think of physical objects in the logical design, we should consistently use logical terms in the logical database design.

Relation Schemas

At the logical model design level, a relation schema consists of a relation name followed by a list of its attributes. A database schema is a set of schemas for a database's relations (Garcia-Molina, Ullman & Widom, 2008). The schema of a relation may not change, but the relation, which is a variable, changes over time. At a particular moment, we can obtain a value of the relation, such as the MOVIE relation demonstrated in Table 2. To present the schema of the relation MOVIE, we can write the schema in this format:

MOVIE (TITLE, YEAR, LENGTH, GENRE).

The sequence of attributes is immaterial. We can change the sequence of the attribute list without changing the relation. However, the ordering of these attributes is significant since it corresponds to the ordering of tuple components. Thus, when we change the order of attributes, we should change the order of components in the tuples accordingly.

The concept of schema may have different meanings. A schema may mean a container that segregates database objects for different applications at the physical model design level. In this case, we can use the schema to manage database security. Besides, a conceptual schema, which is usually represented by the entity-relationship diagram (ERD), describes the entire database's structure. The conceptual schema (including entities, relationships, and constraints) is readily translated to relations.

Keys

All tuples in a relation must be distinct, and tuple ordering is immaterial (Codd, 1969). Therefore, every relation should have at least one unique identifier to maintain the uniqueness of tuples. Assuming any two movies cannot have the same title and the same year of release, we can find a tuple in the MOVIE relation by a movie title and release year. The combination of these two attributes, TITLE and YEAR, can uniquely identify a tuple. We call this combination the key of the relation. A key can have one or more attributes. We can always consider a key to be a set, even though it has one attribute. When a key consists of more than one attribute, we call the key a compound key or composite key. There are several types of keys in a relation model.

A super-key is a subset of attributes in a relation that can uniquely identify a tuple. In the MOVIE relation shown in Table 2, the attributes TITLE, YEAR, LENGTH, and GENRE together form a super-key because no two tuples have the same combination of TITLE, YEAR, LENGTH, and GENRE values at any time. We also notice that the attributes TITLE, YEAR, and LENGTH form a super-key. When we determine whether a subset of attributes can identify a tuple, we look at all possible relation values. For example, the YEAR attribute has distinct values in Table 2, but we cannot consider the attribute to be a super-key. Many movies likely have the same year of release.

A key is a minimal super-key. If we remove any attribute from a key, the remaining attributes are not unique. If a relation has more than one key, each is called a candidate key. A prime attribute is an attribute that belongs to some candidate keys. A non-prime attribute is an attribute that does not belong to any candidate key. For example, the combination of attributes TITLE and YEAR in the MOVIE relation form a key. We then consider the attributes TITLE and YEAR to be prime attributes. The attributes LENGTH and GENRE are non-prime attributes. When knowing prime attributes of a relation, we can indicate these attributes with underlines in the schema of the relation:

MOVIE (TITLE, YEAR, LENGTH, GENRE).

When a relation has more than two candidate keys, we arbitrarily choose one as the primary key. We usually select a candidate key that least likely receives a change in its value. Another practice is to select the most straightforward candidate key. The other candidate keys, which are not selected as the primary key, are alternative keys. Date claims that primary keys are nice but not essential (Date, 2019). Table 3 summarizes some keys discussed in this section.

Type	Definition
Super-key	A subset of attributes in a relation that can uniquely identify a tuple.
Key	A minimal super-key.
Candidate key	A relation may have more than one key. Each is called a candidate key.
Primary key	A candidate key that is arbitrarily chosen for a relation.
Alternative key	All candidate keys that are not chosen as the primary key.
Composite Key	A key that consists of more than one attribute.

3 Types of Keys in Relational Model

Introduction to SQL

End-users, who are not database professionals, want to perform database operations without knowing the details of database access methods and data storage. Therefore, they need a high-level language that hides physical implementation details. There comes the SQL, Structured Query Language, developed by IBM at its San Jose Research Laboratory in the early 1970s (Harrington, 2016). The language used to be known as SEQUEL (Structured English Query Language) and was initially pronounced "sequel". Even though we use SQL as the name of the language today, we have not changed the pronunciation. Many of the currently available RDBMSs support SQL. ANSI (the American National Standards Institute) and ISO (International Standards Organization) both accept it as a standard query language for relational databases.

SQL is an English-like language used for data management in relational databases. SQL does not use terms such as relation, tuple, and attribute, defined in the logical database design phase. We use friendly terms. For example, table, row, and column, in SQL. Through SQL commands, we can perform database operations on a database. Generally, there are four types of SQL commands:

Data Definition Language (DDL): defines the database's structure or schema. For example, create schema, tables, and views;
Data Manipulation Language (DML): manipulates data. For example, perform CRUD operations;
Data Control Language (DCL): controls access to data. For example, grant and revoke access to parts of a database;
Transaction Control Language (TCL): manages transactions in a database. For example, group DML statements into a logical transaction;

Most users only have access to the DML and TCL. Some users, such as database administrators (DBAs), have a chance to use the DDL and DCL. When executing SQL commands on a database, we can issue commands to the database interactively through some database tools. For instance, Microsoft SQL Server Management Studio. We can also wrap all SQL commands into a text file and execute all these statements in a batch. Application programmers can embed SQL commands into their programs and enable their programs to communicate to the database.

Summary

Database design is not easy. Before we learn database design theory it is worth taking some time to know some design theory terminology. Although we may have some database design and development experience, knowing these background concepts and definitions can broaden our database design horizons and give us some knowledge of database trends and directions.

The author briefly reviewed the history of database systems. Even though some database design theory principles seem common sense, we should know that many innovators have made contributions to this theory. The article covered some definitions in logical database design after exploring fundamentals in databases and database management systems. We prefer to use these formal terms (such as relation, tuple, and attribute), during the logical model design phase. Finally, we scratched the surface of SQL, which acts as an interface between ender-users and databases.

Reference

Codd, E. F. (1969). Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks. IBM Research Report RJ599 (August 19th, 1969).

Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13, No. 6 (June 1970).

Codd, E. F. (1985, October). The 12 rules. Retrieved from https://reldb.org/c/index.php/twelve-rules/.

Codd, E. F. (1990). The Relational Data Model for Database Management, Version 2. Addison-Wesley.

Date, C. J. (2015). SQL and Relational Theory: How to Write Accurate SQL Code(3^rd ed.). O'Reilly Media.

Date, C. J. (2019). Database Design and Relational Theory: Normal Forms and All That Jazz (2^nd ed.). Apress.

DB-Engine (2020, October). DB-Engines Ranking. Retrieved from https://db-engines.com/en/ranking.

Foote, D. K. (2017, March 23). A Brief History of Database Management. Retrieved from https://www.dataversity.net/brief-history-database-management/.

Garcia-Molina, H., Ullman D. J. & Widom, J. (2008). Database Systems: The Complete Book. Pearson.

Gillenson, L. M. (2011). Fundamentals of Database Management Systems (2^nd ed.). Wiley.

Harrington, L. J. (2016). Relational Database Design and Implementation (4^th ed.). Morgan Kaufmann

Kahate, A. (2004). Introduction to Database Management Systems. Pearson India.

Moss, J & Davidson L. (2016). Pro SQL Server Relational Database Design and Implementation (5^th ed.). Apress.

McCreary. D., J & Kelly A. (2013). Making Sense of NoSQL: A guide for managers and the rest of us. Manning Publications.

Mealy, G. H. (1967) Another Look at Data. AFIPS, pp. 525-534, 1967 Proceedings of the Fall Joint Computer Conference, 1967. http://doi.ieeecomputersociety.org/10.1109/AFIPS.1967.112.

Silverston, L. & Simsion, G. (2001) The Data Model Resource Book, Volume 1. Wiley

Stephens, R. (2008). Beginning Database Design Solutions. Wrox.

Oppel, A. (2009). Data Modeling, A Beginner's Guide. McGraw-Hill.

Next Steps

Bertrand Russell said: "Writing can be either readable or precise, but not at the same time." The author introduced some concepts and technical terms in the database design theory. With an understanding of this technical jargon, the design theory is not incredibly complicated. However, to make these terms understandable, this article did not explain these terms precisely. The reference section provides some classic articles and books about database design. Readers are encouraged to read some of them to master the design theory. We welcome database professionals contributing knowledge about database design in the comment section. Relational Database design is a science. According to Date (2019): "Nobody could, or should, claim that database design is easy" (p.3).
Check out these related tips:

About the author

Nai Biao Zhou is a Senior Software Developer with 20+ years of experience in software development, specializing in Data Warehousing, Business Intelligence, Data Mining and solution architecture design.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips