Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Comparing Data Warehouse Design Methodologies for Microsoft SQL Server


By:   |   Read Comments (8)   |   Related Tips: > Analysis Services Development

Attend a SQL Server Conference for FREE >> click to learn more


Problem

In my last couple of tips, I talked about the importance of a Business Intelligence solution, why it is becoming priority for executives, what a typical Business Intelligence system architecture looks like, etc. In this tip, I going to talk in detail about how a data warehouse is different from operational data store and the different design methodologies for a data warehouse.

Solution
This tip is going to cover Data Warehouses (DW, sometime also called an Enterprise Data Warehouse or EDW), how it differs from Operational Data Store (ODS) and different Data Warehouse design methodologies.

Enterprise Data Warehouse (EDW or DW) Vs. Operational Data Store (ODS)

The purpose of the Data Warehouse in the overall Business Intelligence Architecture is to integrate corporate data from different heterogeneous data sources in order to facilitate historical and trend analysis reporting. It acts as a central repository and contains the "single version of truth" for the organization that has been carefully constructed from data stored in disparate internal and external operational databases\systems. For better performance, mostly data in data warehouse will be in de-normalized form which can be categorized in either star or snowflake schemas (more on this in the next tip).

The purpose of the Operation Data Store (ODS) is to integrate corporate data from different heterogeneous data sources in order to facilitate real time or near real time operational reporting. Often data in the ODS will be in structured similar to the source systems, although during integration it can involve data cleansing, de-duplication and can apply business rules to ensure data integrity. An ODS is mainly intended to integrate data quite frequently at the lowest granular level for operational reporting in a close to real time data integration scenario. Normally, an ODS will not be optimized for historical and trend analysis on huge set of data.

Let's summarize the differences between an ODS and DW:

  • An ODS is meant for operational reporting and supports current or near real-time reporting requirements whereas a DW is meant for historical and trend analysis reporting on a large volume of data
  • An ODS is targeted for low granular queries whereas a DW is used for complex queries against summary-level or on aggregated data
  • An ODS provides information for operational, tactical decisions about current or near real-time data acquisition whereas a DW delivers feedback for strategic decisions leading to overall system improvements
  • In an ODS the frequency of data load could be hourly or daily whereas in an DW the frequency of data loads could be daily, weekly, monthly or quarterly

Data Warehouse Design Methodologies

There are two different methodologies normally followed when designing a Data Warehouse solution and based on the requirements of your project you can choose which one suits your particular scenario. These methodologies are a result of research from Bill Inmon and Ralph Kimball.

Bill Inmon - Top-down Data Warehouse Design Approach

Bill Inmon is sometimes also referred to as the "father of data warehousing"; his design methodology is based on a top-down approach and defines data warehouse in these terms

  • Subject oriented - The data in a data warehouse is categorized on the basis of the subject area and hence it is "subject oriented".
  • Integrated - Data gets integrated from different disparate data sources and hence universal naming conventions, measurements, classifications and so on used in the data warehouse.  The data warehouse provides an enterprise consolidated view of data and therefore it is designated as an integrated solution.
  • Non-volatile - Once the data is integrated\loaded into the data warehouse it can only be read. Users cannot make changes to the data and this practice makes the data non-volatile.
  • Time Variant - Finally data is stored for long periods of time quantified in years and has a date and timestamp and therefore it is described as "time variant".

Bill Inmon saw a need to integrate data from different OLTP systems into a centralized repository (called a data warehouse) with a so called top-down approach. Bill Inmon envisions a data warehouse at center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI), business analytics and business management capabilities.

Enterprise Data Warehouse (EDW or DW) Vs. Operational Data Store (ODS)

This top-down design provides a highly consistent dimensional view of data across data marts as all data marts are loaded from the centralized repository (Data Warehouse). The top-down design has also proven to be flexible to support business changes as it looks at the organization as whole, not at each function or business process of the organization. Generating a new dimensional data marts against the data stored in the data warehouse is a relatively simple task. Though there are some challenges for the top-down approach, for example it represents a very large project with a very broad scope and hence the up-front cost for implementing a data warehouse using the top-down methodology is significant. Further, the duration of time from the start of project to the point that end users start experience initial benefits of the solution can be substantial. Also, the top-down methodology can be inflexible and unresponsive to changing departmental or business process needs (a concern for today's dynamically changing environment) during the implementation phase.

Ralph Kimball - Bottom-up Data Warehouse Design Approach

Ralph Kimball is a renowned author on the subject of data warehousing. His design methodology is called dimensional modeling or the Kimball methodology. This methodology focuses on a bottom-up approach, emphasizing the value of the data warehouse to the users as quickly as possible. In his vision, a data warehouse is the copy of the transactional data specifically structured for analytical querying and reporting in order to support the decision support system. As per his methodology, data marts are first created to provide reporting and analytical capabilities for specific business\functional processes and later on these data marts can eventually be unioned together to create a comprehensive enterprise data warehouse. The bottom-up approach focuses on each business process at one point of time so the return on investment could be as quick as first data mart gets created. Though if not carefully planned, you might lack the big picture of the enterprise data warehouse by missing some dimensions or by creating redundant dimensions, etc. when you are too focused on an individual business process.

Data Warehouse Design Methodologies

Ralph Kimball's bottom-up approach proposes to create a business matrix which should contain all the common elements (that are used by data marts such as conformed\shared dimension, measures, etc.) defined for the enterprise as whole. With this, the user can design and develop solutions which supports doing analysis across the business processes for cross selling. You can learn more about the matrix here.

For a person who wants to make a career in Data Warehouse and Business Intelligence domain, I would recommended studying Bill Inmon's books (Building the Data Warehouse and DW 2.0: The Architecture for the Next Generation of Data Warehousing) and Ralph Kimball's book (The Microsoft Data Warehouse Toolkit).

Next Steps


Last Update:


signup button

next tip button



About the author
MSSQLTips author Arshad Ali Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Thursday, September 15, 2016 - 1:12:51 PM - kanya Back To Top

 excellent explanation.

Thank you.

 


Monday, September 08, 2014 - 12:34:27 PM - AZJim Back To Top

I have attended both training methodologies and prefer Kimball's.  I found it much more straight forward and "ready to go".  But this is a subjective statement and each database architect might have their own preferences.  But Kimball has the benefit of starting small and growing.  When my old company tried the Inmon approach, it failed.  We could not get enough upper management support to build a glorious data warehouse in the Inmon fashion.  It was too big a task and data administrators ended up with "analysis paralysis".  Afterwards, we started again on a smaller scale and it was successful.  Sure, we had duplicate data elements across the various data marts.  But then it got the various organizations to understand who was the true data owner -- a decision that no DBA or Data Adminstrator should make by themselves.  When the final "data warehouse" was built, it had a consensus by management.


Wednesday, July 03, 2013 - 10:40:04 AM - Satish Back To Top

Thank you Arshad !

I will follow your articles regularly. Thank you again for sharing your knowledge.




Thursday, June 27, 2013 - 5:08:58 AM - Hennie de Nooijer Back To Top

A couple of years ago I've investigated the differences between an Inmon- and a Kimball like architecture in more detail. There are even scientific papers available. Please read my blog about a comparison betweeen Kimball en Inmon: http://bifuture.blogspot.nl/2010/10/kimball-vs-inmon-part-ii-its-now.html

 

Greetz,

 

Hennie de Nooijer

 


Wednesday, June 26, 2013 - 2:39:47 PM - Frank Back To Top

 

Arshad, thank you for the article

The Kimball methodology is certainly, as you wrote, based, on start schemas and multidimensional modeling. In addition, the Kimball paradigm is more suitable for designing and developing Cubes, than the Inmon methodology.

In my opinion, Kimball is better for OLAP than Inmon because it reduces the number of joints improving the retrieval of datasignificantly, as denormalized databases are better for DQL (SELECT), which is the main target of OLAP. Finally, Kimball is presented in the vocabulary of business and, therefore, it is easy to understand it by business people.

Despite the fact that Kimball recommends to start small, which is in tandem with a data mart approach, the methodology does not enforce top or bottom up development. Kimball methodology is widely used in the development of Data Warehouse.

Advances in technology are making the traditional DW obsolete as well as the needs to have separated ODS and DW. The differences between operational data store ODS and DW have become blur and fuzzy.


Tuesday, June 25, 2013 - 9:29:47 AM - Arshad Back To Top

Hi Jim Frayer and Hennie de Nooijer,

Thanks for bringing out additional design methodologies, these will be helpful for the readers.


Tuesday, June 25, 2013 - 7:53:33 AM - Hennie de Nooijer Back To Top

Agree. Don't forget Anchor modeling...

Please read my blog : http://bifuture.blogspot.nl/2012/03/four-different-datamodeling-methods.html

 

 


Monday, June 24, 2013 - 3:52:05 PM - Jim Frayer Back To Top

 

Arshad, your data and methodologies are very outdated.

 

Concerning Bill Inmon:

 

Further, the duration of time from the start of project to the point that end users start experience initial benefits of the solution can be substantial. Also, the top-down methodology can be inflexible and unresponsive to changing departmental or business process needs (a concern for today's dynamically changing environment) during the implementation phase.

 

This was accurate 10-15 years ago but not now. We deliver agile phases every 3-4 weeks now using the Data Vault methodology that Bill Inmon supports and talks about.

 

http://inmoninstitute.com/about/index

  Data Warehousing concepts: Kimball vs. Inmon vs. Hybrid vs. Data Vault

  • Ralph Kimball - bottom-up design: approach data marts are first created to provide reporting and analytical capabilities for specific business processes.
  • Bill Inmon - top-down design: 1st author on the subject of data warehouse, as a centralized repository for the entire enterprise. Data warehouse design using normalized enterprise data model.
  • Hybrid design: data warehouse solutions often resemble hub and spoke architecture. Legacy systems feeding the DW/BI solution often include CRM and ERP, generating large amounts of data. To consolidate these various data models, and facilitate the ETL process, DW solutions often make use of an operational data store (ODS). The information then parsed into the actual DW.
  • Data Vault Modeling: is a hybrid design, consisting of the best of breed practices from both 3rd normal form and star-schema.

 Since you represent a vendor and not a methodology the least you can do is present the current technology and all the facts about the industry.

 


Learn more about SQL Server tools