A dimensional model is developed in accordance to the analytical requirements of the user in alignment to the data available. The model developed is expected to easily host data and should be flexible enough to support future changes. The end output of this model is a relational database that would act as the source system for data marts / cubes. If you recall from the architecture diagram, there is no Extract, Transform and Load (ETL) to facilitate data transfer between the Data Mart and the Data Warehouse. Hence this dimensional model should also facilitate direct and quick access to the data to a great extent. We would look at how to develop a dimensional model for our business scenario.
Before we move ahead with the discussion of a dimensional model, take a close look at the below diagram. Our entire discussion of the dimensional modeling is focused on this diagram.
1) In dimensional modeling practice, there are generally two theories for developing a dimensional model - Kimball method and Inmon method of data modeling. We would be focusing on the Kimball method of dimensional modeling. Consider reading this article to understand more on the similarities and differences of these two methods.
2) Kimball method of dimensional model is developed using a BUS architecture. In this architecture, a common set of dimensions are identified for a set of business processes which would be shared across a set of data marts. In our business scenario we are focusing on the sales process. And if we look at the OLTP database and the tables we have modeled, we would find the some of the common dimensions that would be required for sales are Products, Customers and Sales Geography.
3) There are two types of dimensional model schemas - star schema and snowflake schema. For simple dimensional models a star schema should be sufficient. But as your dimension tables start growing in terms of number of attributes as well as data, you may want to normalize the data and then it takes the shape of a snowflake schema. Consider reading more about snowflake schema from here.
4) Dimensions can be perceived as master tables which contain qualitative attributes related to the transactions. Facts can be considered as transaction tables that stores quantitative and at times few qualitative attributes of the transaction. Each attribute of a Fact table is known as a measure. For example, SalesAmount in the FactInternetSales table is a measure. Closely observe the fields in the fact table as well as dimension tables and you would be able to relate the quantitative and qualitative nature of the fields.
Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth. A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example, a product dimension may be used with a sales fact table and an inventory fact table in the data warehouse, and also in one or more departmental data marts.
The records in a dimension table establish one-to-many relationships with the fact table. For example, there may be a number of sales to a single customer, or a number of sales of a single product. The dimension table contains attributes associated with the dimension entry; these attributes are rich and user-oriented textual details, such as product name or customer name and address.
5) Observe the numerous keys in the fact table. In the dimensions you would find a key that is the primary key of the table. It is called a surrogate key that is used maintain the data in the dimension if the key attributes of the dimension change. You can read more about surrogate keys from here.
6) Granularity of the data (also known as grain) in facts and dimension tables should be at the same level. For example, its not advisable to store data at an yearly level in some fact/dimension tables and some data at monthly level in other fact/dimension tables. While sourcing data through the ETL process into the warehouse, it should be ensured that the grain of the data in the entire dimensional model is kept in sync.
7) Date is a special dimension which is assured to remain constant and can be pre-populated. We can add attributes to the date dimension depending upon the attributes on which we intend to slice and dice the data in the fact table. AdventureWorks has a standard and elaborate date dimension in the DW database.
8) Consider the below report and try to associate different measures with dimension members by studying the above data model. For example, can we join the InternetSalesAmount field from FactInternetSales table with EnglishProductCategoryName in the DimProductCategory table. This will help you answer whether you would be able to analyze any particular measure with the intended dimension member.