By: Siddharth Mehta
Overview
A dimensional model is developed in accordance to the analytical requirements of the user and alignment with the data available in the source data warehouse. The model developed is expected to easily host data and should be flexible enough to support future changes. The end output of this model is a relational database that would act as the source system for a SSAS data model. Hence this dimensional model should also facilitate direct and quick data access. In this chapter, we will look at some general guidelines for developing a dimensional model.
Explanation
Before we move ahead with the discussion of a dimensional model, take a close
look at the below diagram. Our entire discussion of the dimensional modeling is
focused on this diagram. This is an excerpt of a data model in AdventureWorks DW.
You can download it from
here and once you restore this database backup, you can analyze the database
model by creating a database diagram.
1) In practice, there are generally two theories for developing a dimensional model - Kimball method and Inmon method of data modeling. We will be focusing on the Kimball method of dimensional modeling. Consider reading this article to understand more on the similarities and differences of these two methods.
2) The Kimball method of dimensional modeling is developed using a BUS architecture. In this architecture, a common set of dimensions are identified for a set of business processes which would be shared across a set of data marts. In our business scenario we are focusing on the sales process. If we look at the OLTP database and the tables we have modeled, we will find some of the common dimensions that would be required for sales are Products, Customers and Sales Geography.
3) There are two types of dimensional model schemas - star schema and snowflake schema. For simple dimensional models a star schema should be sufficient, but as your dimension tables start growing in terms of number of attributes as well as data, you may want to normalize the data and then it takes the shape of a snowflake schema. Consider reading more about snowflake schemas here.
4) Dimensions can be perceived as master tables which contain qualitative attributes related to the transactions. Facts can be considered as transaction tables that store quantitative and at times few qualitative attributes of the transaction. Each attribute of a Fact table is known as a measure. For example, SalesAmount in the FactInternetSales table is a measure. Closely observe the fields in the fact table as well as dimension tables and you will be able to relate the quantitative and qualitative nature of the fields.
Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth. A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example, a product dimension may be used with a sales fact table and an inventory fact table in the data warehouse, and also in one or more departmental data marts.
The records in a dimension table establish one-to-many relationships with the fact table. For example, there may be a number of sales to a single customer, or a number of sales of a single product. The dimension table contains attributes associated with the dimension entry; these attributes are rich and user-oriented textual details, such as product name or customer name and address.
5) Observe the numerous keys in the fact table. In the dimensions you will find a key that is the primary key of the table. It is called a surrogate key that is used maintain the data in the dimension if the key attributes of the dimension change. You can read more about surrogate keys here.
6) Granularity of the data (also known as grain) in facts and dimension tables should be at the same level. For example, it's not advisable to store data at an yearly level in some fact/dimension tables and some data at monthly level in other fact/dimension tables. While sourcing data through the ETL process into the warehouse, it should ensure that the grain of the data in the entire dimensional model is kept at the same increment.
7) Date is a special dimension which is assured to remain constant and can be pre-populated. We can add attributes to the date dimension depending upon the attributes on which we intend to slice and dice the data in the fact table. AdventureWorks has a standard and elaborate date dimension in the DW database.
Additional Information
- Consider reading this article for extensive explanation on Data Warehouses.
- Consider using this technical reference for dimensional modeling related techniques.