Learn about Data Warehousing for Analytical Needs

By:   |   Updated: 2023-08-09   |   Comments (1)   |   Related: More > Data Warehousing


Problem

How can an organization analyze its heterogeneous and normalized data in real-time for decision making purposes?

Every mid-scale to enterprise-level organization has data in many operational systems built over the years with different technologies. These operating systems are built to target one or a few operational processes. However, when analyzing these data is needed, organizations find many difficulties, including technical and business. Further, databases are designed to store short-term data while analysis needs short time and, but primarily, long-term data. In addition, most of the transaction and operational systems are designed for highly normalized data that will lead to negative performance impact when you want to analyze the data as it requires a large number of tables to join.

Solution

A data warehouse is one of the solutions to facilitate the above said problems. A data warehouse is a collection of comprehensive technologies such as ETL tools for data integration from the data sources, data storage, data staging, reporting, cubes, dashboards, etc. It consists of an Enterprise-wide data analysis framework with access to any level of information to the key people who make the strategic decisions in the organization. When designing a data warehouse, it is important to note that it is designed and the framework heavily depends on the domain you are implementing. For example, the decisions you make to implement a data warehouse for a supermarket are totally different from implementing a teaching institute.

diagram of data warehouse

The above data warehouse architecture diagram shows that the data warehouse extracts from multiple data sources, including relational databases, files, web services, images, and videos. Most data warehouses focus on extracting data from standard data such as ERP and HR systems. However, you can enhance the data warehouse's analytical capabilities by extracting social media data and loading it into the data warehouse to increase your business intelligence capabilities.

Designing and bringing data to the data warehouse is one of the most challenging tasks. Once data is in the warehouse, reporting and analysis tasks are much easier. If you wish to perform data mining or predictive tasks, connecting to the data warehouse is better than running them directly on the source systems.

Why Data Warehouses are Essential in Today's World

Just for a minute, imagine that you are running a report from the source systems without a data warehouse. First, you may have to execute queries connected to multiple sources that require distributed transactions. Considering the system's security, operational owners may not allow you to run distributed transactions. Apart from the security concern, distributive queries will add additional overhead to the source systems. As your paramount task is to operate your operational systems smoothly, distributive queries are not entertained.

Further, some databases may not support distributed queries, such as NOSQL databases. In those database systems, you cannot execute distributed queries.

Apart from the distributive queries, operational databases are designed with normalized structures, as operational systems have an equal number of writes and reads. Due to the normalization of table structures, you may have to join multiple tables to achieve some reports, as shown in the following query.

SELECT      Production.ProductCategory.Name,
            YEAR(Sales.SalesOrderHeader.OrderDate) YEAR,
            SUM(Sales.SalesOrderDetail.LineTotal) Amount 
FROM        Sales.SalesOrderDetail 
 INNER JOIN Sales.SalesOrderHeader 
         ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID 
 INNER JOIN Production.Product 
         ON Sales.SalesOrderDetail.ProductID = Production.Product.ProductID 
 INNER JOIN Production.ProductSubcategory 
         ON Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID 
        AND Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID 
 INNER JOIN Production.ProductCategory 
         ON Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID 
        AND Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID
GROUP BY
            Production.ProductCategory.Name,
            YEAR(Sales.SalesOrderHeader.OrderDate)

However, as the data warehouses mainly cater to business end users, it is difficult to imagine that they will execute this type of query. In data warehouse design, denormalized or relaxed normalizations are preferred as the data warehouse has more reads. The above query can be rewritten in data warehousing to achieve the same results.

SELECT      DimDate.CalendarYear,
            DimProduct.EnglishProductName,
            SuM(FactInternetSales.SalesAmount) Amount
FROM        FactInternetSales 
 INNER JOIN DimDate 
         ON FactInternetSales.OrderDateKey = DimDate.DateKey 
 INNER JOIN DimProduct 
         ON FactInternetSales.ProductKey = DimProduct.ProductKey
GROUP BY
            DimDate.CalendarYear,
            DimProduct.EnglishProductName

Predictive Analytics on Data Warehouse

More and more organizations are looking at the options of predictive and prescriptive analytics. Since data warehouses have more clean and organized data, it will be much preferred to model predictive models on the data warehouse. Further, predictive analytics need a large volume of historical data. Typically, operational systems store short-term data. Hence predictive analytics will not be able to execute on the operational systems.

Differences Between a Data Warehouse and Operational Systems

As most users are more familiar with operational systems, it is worth comparing and contrasting a data warehouse's technical and business features with an operational system.

Type of Difference Operational System Data Warehouse
Age of data Current with limited historical Historical
Orientation Process Subject
Consumption Transactional and structured Ad hoc queries and reporting
Granularity Detail records Aggregated
Record Requests Few Many
Database Design Mostly Normalized Denormalization or relaxed normalization
Data Model Relational (OLTP) Multi-Dimensional (OLAP)
Number of Users Many Few
Transactional Processing Primary data from transactions Transformed secondary data with limited derived data
Data Ownership Data owned by Transactional systems Only a consumer with limited ownership
High Availability and Disaster Recovery Extremely needed as data is generated in the transactional systems Not essential, but nice to have

Considering the above differences between the data warehouse and the operational system, it is safe to declare that the data warehouse needs a separate environment rather than using the same operational system for data analytics.

What are the Microsoft Solutions for an Enterprise Data Warehouse (EDW)?

Microsoft is one of the primary organizations providing data warehouse solutions. Since a data warehouse has comprehensive technologies, many components will cover the requirements of a data warehouse, as shown in the following table.

Component On-Premises Solution Cloud Data Warehouse Solution
Extract-Transform-Load SQL Server Integration Services (SSIS) Azure Data Factory (ADF)
Storage SQL Server Relational Database Engine Azure DW Azure DB Azure Data Lake (ADL)
Cube SQL Server Analysis Service – Tabular SQL Server Analysis Service – MDM Azure Analysis Services
Visualization SQL Server Reporting Services Power BI

Sample Data Warehouse

Microsoft has a sample data warehouse for developers to identify the features of a data warehouse. Microsoft has provided mainly two databases: AdventureWorks and AdventureWorksDW. You can download those databases from AdventureWorks sample databases - SQL Server | Microsoft Learn.

Next Steps
  • Examine the structure of the Microsoft sample data warehouse and identify the differences between operational and data warehouse structures.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dinesh Asanka Dinesh Asanka is a 10 time Data Platform MVP and frequent speaker at local and international conferences with more than 12 years of database experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-08-09

Comments For This Article




Friday, February 16, 2024 - 8:01:53 AM - Victoria Mostova Back To Top (91976)
I appreciate your concise explanation of what a data warehouse is and its significance in business intelligence. Your article complements the detailed insights I gained from another source at https://www.linkedin.com/pulse/what-enterprise-data-warehouse-its-value-business-intelligence-nhogf/. Both pieces emphasize the pivotal role of a well-structured data warehouse in enhancing data analytics and decision-making processes for organizations. Well done!














get free sql tips
agree to terms