SQL Server Analysis Services Interview Questions Part II - Dimensions
Data Warehousing (DW) and OLAP are very vast areas and are very essential areas when it comes to supporting critical business decisions. It is important to have a good understanding of the subject before attending an interview in the DW/OLAP space. Hence it is essential to understand all the major components with each one in as much detail as possible. Dimensions being one of the major aspects of OLAP/SSAS, it is good to understand some basic Dimension Concepts and Types of Dimensions, which we will outline in this tip.
In the previous tip, I have covered some of the questions on Basic Concepts, Data Sources, and Data Source Views in SQL Server Analysis Services. In this tip, I will be covering some of the basic Dimension Concepts and will discuss some of the most common types of dimensions.
What is a SQL Server Analysis Services Dimension?
A Dimension represents a set of contextual data about the transactional data that is stored in the fact table(s). It basically forms the story line around the data by providing context and thereby helping the users to understand and make sense out of the data.
Example: Let us say that there is a retail chain and the CFO of the chain announces that the "Total Sales is $10,00,000". In this case, $10,00,000 is merely a number and does not provide any valuable information and does not make any sense to the user.
Now let us say that the CFO of the chain makes a slight change in the above announcement and says "Total Sales for FY 2012 is $10,00,000". This makes some sense and provides some amount of context which in this case is sales amount mentioned is for Financial Year 2012 (FY2012). FY2012 represents a date (time) component and hence it represents a Date (Time) Dimension.
Similarly, we can go on and add additional Dimensions to this data (Fact Table/Cube) to provide more context about the data (in this scenario $10,00,000 is the data we are talking about). Say we change the above statement to something like "Total Sales of Facial Tissues for FY 2012 in California is $10,00,000". This gives much better context compared to the previous two statements. In this statement, Facial Tissues represents a Product which introduces us to Product Dimension and California represents a Geographical Location which introduces us to Geography Dimension. In this manner, we can add as many dimensions as we want, to give better context to the data, so that end users can analyze the data from different dimensions and take more effective decisions.
Here are some highlights of Dimension(s):
- It represents contextual information which adds context/meaning to the data being analyzed.
- Helps in viewing/analyzing the data from different dimensions/angles to get a better understanding of the data.
- A dimension is organized in the form of Attributes and Hierarchies.
- Here are some examples of a Dimension:
- Time Dimension
- Product Dimension
- Geography Dimension
- Customer Dimension
- Employee Dimension
- Here are some examples of a Dimension Attribute
- Year, Quarter, Month etc. in case of a Time Dimension
- Color, Size etc. in case of a Product Dimension
What are Natural Key and Surrogate Key in SQL Server Analysis Services?
A Natural Key is a type of key in a table which uniquely identifies each record and has a business meaning attached to it.
Example: Incident/Ticket Number, SSN, Employee Email Alias, etc.
Here are some highlights of a Natural Key:
- This can be numeric, string, a combination of both etc. depending on the type of data you are dealing with. In case of non-numeric data, JOINs are slower and occupies more space than an integer.
- History cannot be maintained if used as a Primary Key.
- Merging data from different source systems can be difficult when there is a conflict in the type of value coming from each source system.
A Surrogate Key is a type of key in a table which uniquely identifies each record, but has no business meaning attached to it. It is merely a value used to uniquely identify a record in a table.
Example: Identity Columns, GUID (Unique identifier), etc.
Here are some highlights of a Surrogate Key:
- Most of the times this is a numeric value (like INT, BIGINT, etc.) and hence the JOINs are faster and occupies less space than character based data.
- Allows for maintenance of history data (SCD Type 2) when used as a Primary Key.
- Data coming from multiple source systems can be easily integrated (especially in the scenarios like Mergers, Acquisitions etc. in the industry).
What is a Hierarchy in SQL Server Analysis Services? What are the different types of Hierarchies? Explain each one of them.
A Hierarchy is a collection of one or more related Attributes which are organized in a Parent-Child fashion. An Attribute at a higher level is a parent of an Attribute at the next level and so on. They provide a very convenient way to perform roll up/drill down analysis and helps in rolling up and drilling down the numeric data in a very effective manner.
In analogy, it is like your Bing maps. If you want to locate a particular place and if you want to do it manually (without doing a text search) then you usually first locate the Country, then locate the State within the Country, after that locate the City within the State, and finally locate the place you are looking for within the City.
Example: A Calendar Hierarchy might contain Attributes like Year, Quarter, Month, and Day which are organized as a Hierarchy with Year as a parent of Quarter, Quarter as a parent of Month, and Month as a parent of Day.
Here are the common types of Hierarchies:
- Natural Hierarchy: A Natural Hierarchy is a Hierarchy in which Every Child has a Single Parent. Example: A Calendar Hierarchy.
- Balanced Hierarchy: A Balanced Hierarchy is a Hierarchy in which no matter through what path we traverse the Hierarchy, there is a Member at every level and every path has the same number of Levels. Example: A Calendar Hierarchy.
- Unbalanced Hierarchy: An Unbalanced Hierarchy is a Hierarchy in which number of members and number of Levels varies across different branches of the Hierarchy. In this type of a Hierarchy, Leaf Level Members might belong to different Levels. Example: An Employee/Organization Hierarchy where in say 1 manager (Manager 1) has 2 or more people reporting to him and who belong to different Levels say L1 & L2. On the other hand say there is another manager (at the same level as Manager 1) who has 2 or more people reporting to him and all of them belonging to the same Level say L1.
- Ragged Hierarchy: A Ragged Hierarchy is a Hierarchy in which irrespective of the path you use to traverse, every path has the same number of Levels but not every level is guaranteed to have members except for the Top most Level and Bottom most Level (Leaf Level). Example: A Geography Hierarchy.
What is a Slowly Changing Dimension in SQL Server Analysis Services? What are the different types of Slowly Changing Dimensions?
The Slowly Changing Dimension (SCD) concept is basically about how the data modifications are absorbed and maintained in a Dimension Table. In an SCD the contents/members change over a period of time. Apart from the existing members which are modified, new records are added to the dimension similar to any other dimension.
Here are the major 3 types of Slowly Changing Dimensions:
- Type 1: In this type of SCD, modifications to the dimension members are absorbed by overwriting the existing member data present in the Dimension Table. Example: Say you have an Address Dimension which contains addresses of all the employees in an organization. When the address of an employee changes, then the old data (address) in the Dimension is overwritten with the new data (address) and there is no way to get the old address from the table.
- Type 2: In this type of SCD, modifications to the dimension members are absorbed by adding it as a new record into the Dimension Table. Meaning, the history of all the changes happening to the dimension members is maintained by creating a new record each time a dimension member is modified. The new (modified) record and the old record(s) are identified using some kind of a flag like say IsActive, IsDeleted etc. or using Start and End Date fields to indicate the validity of the record. New dimension members are inserted into the Dimension Table with appropriate flag value which indicates that this is the active/current record. Unlimited amount of history can be maintained in this type of SCD. Example: Considering the above example of an Address Dimension. When the address of an employee changes, then the old data (address) in the Dimension Table is retained as it is and the flag is updated to indicate that this is a historical record. The modified data (record) is inserted as a new record into the table and the flag is updated to indicate that this is the current record.
- Type 3: In this type of SCD, modifications to the dimension members are absorbed with the help of additional columns in the Dimension Table. In this SCD type, the amount of history data (number of modifications) that can be maintained is limited by the number of columns (fixed) in the Dimension Table, which are used for tracking the history details. This is decided at the time of designing the Data Mart. Example: Considering the above example of an Address Dimension. When the address of an employee changes, then the old data (address) in the Dimension is retained as it is and the new address is updated in a separate column to indicate that this is the latest address.
Apart from the above listed 3 types of SCDs, there are other types like Type 0, Type 4, Type 5, & Type 6 etc. which are used rarely.
What is a Parent-Child Dimension in SQL Server Analysis Services? Explain with an example.
A Parent-Child Dimension is a Dimension in which two attributes in the same dimension are related to each other and they together define the linear relationship among the dimension members. The first attribute which uniquely identifies each dimension member is called the Member Key Attribute and the second attribute which identifies the parent of a dimension member is called a Parent Key Attribute.
Example: Consider an Employee Dimension which has EmployeeID as a Primary/Unique Key which uniquely identifies each Employee in the organization. The same Employee Dimension contains another attribute as ManagerID which identifies the Manager of an Employee and ManagerID contains one of the values present in the EmployeeID since Manager is also an Employee, hence he/she also has a record for self in the Employee Table. In this scenario, EmployeeID is the Member Key Column and ManagerID is the Parent Key Column.
Here are some highlights of a Parent-Child Dimension:
- Both the Parent and the Child members exist in the same dimension.
- There is a Parent-Child relationship between different members (rows) of the same dimension.
- Parent Key Column in the dimension table is a Foreign Key Column (Can be physical or logical) which refers to the Member Key Column which is a Primary/Unique Key Column in the same dimension table.
What is a Role-Playing Dimension in SQL Server Analysis Services? Explain with an example.
A Role-Playing Dimension is a Dimension which is connected to the same Fact Table multiple times using different Foreign Keys. This helps the users to visualize the same cube data in different contexts/angles to get a better understanding and make better decisions.
Example: Consider a Time Dimension which is joined to the same Fact Table (Say FactSales) multiple times, each time using a different Foreign Key in the Fact Table like Order Date, Due Date, Ship Date, Delivery Date, etc. Essentially there is only one single physical dimension called Date Dimension. However, it is joined multiple times to the Fact Table to help the users to visualize the cube data in the context of different dates.
Here are some highlights of a Role-Playing Dimension:
- It is a single physical Dimension Table.
- Same Dimension Table connects to the same Fact Table multiple times using different Foreign Keys from the Fact Table.
- When a Role-Playing Dimension is added to the cube, it appears as a different dimension (one instance for each Foreign Key to which it is joined) to the end users and hence playing multiple roles.
What is a Conformed Dimension in SQL Server Analysis Services? Explain with an example.
A Conformed Dimension is a Dimension which connects to multiple Fact Tables across one or more Data Marts (cubes). A Confirmed Dimension is physically implemented across multiple Data Marts with exactly the same structure, attributes, values (dimension members), meaning and definition. Each instance means exactly the same thing providing the exact same context irrespective of the Fact Table to which it is connected or the Data Mart to which it belongs.
Example: A Date Dimension has exactly the same set of attributes, same members and same meaning irrespective of which Fact Table it is connected to or to which Data Mart it belongs to as long as it belongs to the same organization. For instance, a Fiscal Calendar is exactly the same with same start and end dates across all the departments within an organization.
Here are some highlights of Conformed Dimensions:
- Same Dimension joins to multiple Fact Tables or is used across multiple Data Marts.
- It is a master dimension and is used across multiple dimensional models.
- Each instance of Conformed Dimensions are exactly the same in every aspect including the attribute names, definitions, etc.
What is a Degenerate Dimension in SQL Server Analysis Services? In what scenarios do you use it?
A Degenerate Dimension is a Dimension which is derived out of a Fact Table and it appears to the end user as a separate/distinct Dimension, its data is actually stored in the Fact Table. It's a Dimension table which does not have an underlying physical table of its own.
Degenerate Dimensions are commonly used when the Fact Table contains/represents Transactional data like Order Details, etc. and each Order has an Order Number associated with it, which forms the unique value in the Degenerate Dimension.
Example: Degenerate Dimensions having unique Order Numbers can be used to identify the various items sold as part of a particular order.
Here are some highlights of Degenerate Dimension:
- It is derived from the Fact Table and does not have an underlying physical Dimension Table of its own.
- It is also called as a Fact Dimension.
- Since these dimensions are built on top of Fact Table, these are usually very large dimensions.
- The attribute of a Degenerate Dimension is not a Foreign Key in the Fact Table.
What is a Junk Dimension in SQL Server Analysis Services? In what scenarios do you use it?
A Junk Dimension is often a collection of Unrelated Attributes like indicators, flags, codes, etc. This Dimension usually contains data which cannot be created as a separate Dimension as they tend to be too small (often) and are not worth having a separate Dimension.
These Dimensions can be used in various scenarios, but one of the common scenarios is when a Fact Table contains a lot of Attributes which are like indicators, flags, etc. Using Junk Dimensions, such Attributes can be removed/cleaned up from a Fact Table.
Example: It contains values like Yes/No, Pending/In Progress/Completed, Open/Resolved/Closed, etc.
Here are some highlights of a Junk Dimension:
- It is also called as a Garbage Dimension.
- Junk Dimensions are usually small in size.
- It is a convenient way to consolidate many Smaller Dimensions which are not really worth having as a separate Dimension.
- Check out the SQL Server Analysis Services tutorial
- Check out the following additional resources:
About the author
View all my tips