How to use Factless Fact Tables in a Data Warehouse Business Intelligence Solution

By:   |   Updated: 2022-01-17   |   Comments (2)   |   Related: More > Data Warehousing


   Free MSSQLTips webinar - "Efficient Monitoring and Management of SQL Server" (click to register)

Problem

As a SQL Server Business Intelligence Developer, I want to understand how and when to use a Factless Fact table in a data warehousing solution.

Solution

The solution is to go through the basics of a Factless table from a business intelligence (BI) perspective along with the proper understanding of how to use it considering a simple scenario.

About Factless Fact Table

This tutorial assumes that the reader is familiar with SQL Server database and data warehouse business intelligence concepts.  As well as is comfortable with the common terms used and applied in this field (of business intelligence).

Let us try cover what a Factless Fact table is before we dive into a scenario that requires such a table in the building of a data warehouse business intelligence solution commonly known as a DWBI solution.

What is a dimension table?

A dimension table is just like a reference table in a data warehouse business intelligence solution. For example, Employee, City, Department are all dimensions that help us understand a fact properly.

What is a fact table?

A Fact table is typically a table created in a data warehouse which contains facts such as total number of employees in an organization or average sales figures for all the products and so on.

How a dimension is related to a fact table?

Dimensions are just like reference tables that are referenced in a Fact table along with the calculated or computed facts it contains. For example, you may find EmployeeKey (data warehouse generated id) from Employee dimension, DepartmentKey from Department dimension being passed to the FactSalary table.

How a dimension is related to a Factless table?

Traditionally, a dimension is related to a Factless table in the same way it is related to Fact table. However, the only difference is that a Factless table may contain multiple dimensions, but no fact (calculated column).

Can a dimension be linked with a fact and Factless table at the same time?

Yes, a dimension can be linked with both a Fact and Factless table at the same time.

When to use Factless FACT Tables

In order to understand the use of Factless FACT table we have to refer to, Kimball Group, one of the earliest pioneers in the field of Data Warehouse. According to Kimball Group, "It is possible that the event merely records a set of dimensional entities coming together at a moment in time. For example, an event of a student attending a class on a given day may not have a recorded numeric fact, but a fact row with foreign keys for calendar day, student, teacher, location, and class is well-defined."

In other words, we can easily capture numeric values known as facts (to be put into Fact table) in a business process, but when we want to capture out-of-the-box information focused on dimensions itself to extract interesting but beneficial information we turn towards Factless tables.

There may be many business-focused legitimate reasons that require you to build a Factless FACT table as an essential part of your data warehouse business intelligence solution architecture, but let us discuss the common ones to get an idea of this interesting DWBI concept.

Missing Factor

A Factless table can help your business to understand "missing factors" often overlooked or not considered. The simplest example that I can think of is related to product sales. Now, a data warehouse is smart enough to show the business the total sales per year, per month or per week. However, what about the products that were not sold but were on display and it may not be easy to find out what products were not sold per week or per month while the others were quickly going off the shelf.

Negative Analysis

The term "Missing Factor" mentioned above is more broadly accepted as Negative Analysis and so the Factless table actively provides information about the things, processes or events which did not occur.

Now, another example of using a Factless table is if we think of a company providing services to the customers and was approached by a customer who decided not to buy the service ultimately although the customer was assigned a sales person who booked a product demo with him and the customer was shown the product demo but then for some reason the customer chose not to complete the business transaction. Please remember this information is crucial as well because we are in an era where organizations race to have a competitive edge over their competitors in business while on the other side they are very keen to know what is not happening and why.

Identifying an Activity or Event

Factless tables can be also used for information extraction regarding an activity or event not predefined in the system. For example, we require a Factless tables to tell us which students did not attend the exam or simply which students were absent from a class presentation and so on.

Coverage

Now this simply means that one set of information is not enough so a paired set of the information is formed and then finding the odd ones out tell us the required information. For example, we need a Fact table that contains the information about total leaves taken by all students and then another set (Factless table) is required that stores the information about total students registered in a particular class. Then if we subtract those students who took a leave from the total number of registered students, we can understand which students did not bother to take leave.

How to use FactLess FACT Table

Let us now look at a common example of how to use a Factless Fact table.

Student-Exam Scenario

Think of a student-exam scenario where many students are registered but not all of them appear at the examination. Now, a Factless table can help us identify which students did not appear for the examination. However, there can be many other ways to use this approach.

Setup FactlessFactDW Database

Let us build a sample database called FactlessFactDW based on the following dimensions:

  1. Student
  2. Exam
  3. Date
  4. Subject

The data warehouse database centers around a Factless table called FactlessExam.

The database design is illustrated as follows:

Student-Exam scenario using Factless FACT Table

Please setup a FactlessDW database with the above objects (tables) by running the following T-SQL code against your SQL instance:

-- (1) Setup FactlessFactDW database
CREATE DATABASE FactlessFactDW;
GO
 
USE [FactlessFactDW]
GO
 
/****** Object:  Table [DimDate]    Script Date: 14-Dec-21 9:30:38 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [DimDate](
    [DateId] [datetime] NOT NULL,
    [YearNumber] [int] NULL,
    [MonthNumber] [int] NULL,
    [DayNumber] [int] NULL,
PRIMARY KEY CLUSTERED 
(
    [DateId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [DimExam]    Script Date: 14-Dec-21 9:30:39 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [DimExam](
    [DimExamId] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [DimExamId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [DimStudent]    Script Date: 14-Dec-21 9:30:39 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [DimStudent](
    [StudentId] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [Age] [int] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [StudentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [DimSubject]    Script Date: 14-Dec-21 9:30:39 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [DimSubject](
    [SubjectId] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [SubjectId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [FactlessExam]    Script Date: 14-Dec-21 9:30:39 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [FactlessExam](
    [FactlessExamId] [int] NOT NULL,
    [ExamId] [int] NOT NULL,
    [StudentId] [int] NOT NULL,
    [DateId] [date] NOT NULL,
    [SubjectId] [int] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [FactlessExamId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-01T00:00:00.000' AS DateTime), 2021, 1, 1)
GO
INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-02T00:00:00.000' AS DateTime), 2021, 1, 2)
GO
INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-03T00:00:00.000' AS DateTime), 2021, 1, 3)
GO
INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-04T00:00:00.000' AS DateTime), 2021, 1, 4)
GO
INSERT [DimDate] ([DateId], [YearNumber], [MonthNumber], [DayNumber]) VALUES (CAST(N'2021-01-05T00:00:00.000' AS DateTime), 2021, 1, 5)
GO
INSERT [DimExam] ([DimExamId], [Name]) VALUES (1, N'Summer Exam 2021')
GO
INSERT [DimExam] ([DimExamId], [Name]) VALUES (2, N'Winter Exam 2021')
GO
INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (1, N'Asif', 35)
GO
INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (2, N'Mike', 32)
GO
INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (3, N'Sarah', 27)
GO
INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (4, N'Peter', 28)
GO
INSERT [DimStudent] ([StudentId], [Name], [Age]) VALUES (5, N'Adil', 36)
GO
INSERT [DimSubject] ([SubjectId], [Name]) VALUES (1, N'Data Warehouse')
GO
INSERT [DimSubject] ([SubjectId], [Name]) VALUES (2, N'Business Intellignece')
GO
INSERT [DimSubject] ([SubjectId], [Name]) VALUES (3, N'Data Analysis')
GO
INSERT [DimSubject] ([SubjectId], [Name]) VALUES (4, N'Databases')
GO
INSERT [DimSubject] ([SubjectId], [Name]) VALUES (5, N'Database Testing')
GO
INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (1, 1, 1, CAST(N'2021-01-01' AS Date), 1)
GO
INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (2, 1, 2, CAST(N'2021-01-01' AS Date), 1)
GO
INSERT [FactlessExam] ([FactlessExamId], [ExamId], [StudentId], [DateId], [SubjectId]) VALUES (3, 1, 3, CAST(N'2021-01-01' AS Date), 1)
GO

Please note that to make it simple, I am not using the foreign keys to link dimensions with Factless Fact table but in a standard solution you must use foreign keys to link dimensions with the Fact.

Student Attendance for the Examination

Let us now view all those students who actually sat in the examination first by running the following T-SQL script:

-- View all students who appeared in examination
SELECT S.[StudentId]
      , S.Name, E.ExamId, E.SubjectId
FROM [dbo].[FactlessExam] E
  INNER JOIN DBO.DimStudent S ON S.StudentId=E.StudentId

The output is as follows:

All those students who appeared in exam

View All Students Who Did Not Appear in the Examination

Now the Factless table can also help us identify the students who were registered but did not appear at the examination by running the following T-SQL script:

-- View all students who registered but did not appear in examination
SELECT S.[StudentId]
      ,S.Name,E.ExamId,E.SubjectId
FROM [dbo].[FactlessExam] E
  FULL JOIN DBO.DimStudent S ON S.StudentID=E.StudentId
WHERE ExamId IS NULL

The output can be seen as follows:

All those students who did not appear in the exam

Congratulations, you learned about Factless FACT tables in a data warehouse using a simple scenario. See how you can apply this same approach for your BI projects.

Next Steps



Get Started Now - Click here to get your free 14 day trial of SolarWinds Database Insights




get scripts

next tip button



About the author
MSSQLTips author Haroon Ashraf

Haroon Ashraf's interests are Database-Centric Architectures and his expertise includes development, testing, implementation and migration along with Database Life Cycle Management (DLM).

View all my tips



Article Last Updated: 2022-01-17

Comments For This Article




Tuesday, January 25, 2022 - 1:16:52 PM - Haroon Ashraf Back To Top (89696)
This is a very good question.
ExamId column in the factless table can help us to store multiple exams in the same table so we can perform negative analysis on all or some of these exams as well.
The Exam dimension on the other side helps us to understand what each ExamId refers to as it contains the information about different exams that we store from time to time in it and then pass their reference (ExamId) in the factless table.
We can make it bit more interesting by (adding more data and) finding out those exams which did not take place or those subjects which had no exam up till now.

Sunday, January 23, 2022 - 2:41:47 AM - Nick Holt Back To Top (89685)
This model and query assumes that all the students were registered for that one exam doesn't it? What if there were numerous exams? You'd need to have something to handle which exams students were registered for or one of these factless favtvtables for every exam.


download














get free sql tips
agree to terms