Overview of Big Data Stores in the Cortana Intelligence Suite
I have read the earlier tips on Introduction to Microsoft Cortana Intelligence Suite and Introduction to Information Management in Cortana Intelligence Suite and now I would like to know more about the data storage aspects of the Cortana Intelligence Suite.
Data Storage is another important pillar of the Cortana Intelligence Suite and is important in determining how securely, reliably, and durably data can be stored and how efficiently and effectively the data can be accessed. In this tip we will look at the two major Azure offerings which are meant for this very purpose of data storage in the Cortana Intelligence Suite.
Overview of Big Data Stores
Azure Blob Storage and Azure SQL Database have been around for quite some time for storage of regular structured / unstructured and relational data. These are designed to be more of general purpose stores for the respective data formats. The Cortana Intelligence Suite offers the following two offerings, as part of the Big Data Stores pillar, specifically designed and optimized for Big Data and Analytics Workloads with the ability to store and manage the modern data volumes and formats at scale and with high degree of elasticity. We will cover: Azure Data Lake Store and Azure SQL Data Warehouse.
Azure Data Lake Store
The Azure Data Lake Store is a hyper-scale data storage repository specifically designed and optimized for Big Data workloads. It is designed to be a common repository with the ability to capture and store the incoming data with Big Data's 3V characteristics (Volume, Velocity, and Variety).
Following is a block diagram showing the typical usage of Azure Data Lake Store and the various data sources / formats that can be stored in the Data Lake Store and which all downstream systems / applications can connect to Data Lake Store.
As we can see from the above diagram, both structured as well as unstructured data from sources like Applications, Sensors, Devices, Social Media, Web Logs, and various other Big Data Sources and IoT Systems can be extracted and stored in the Data Lake Store in its native format to be further processed using other Azure offerings like Azure Data Lake Analytics, Azure HDInsight, Azure Stream Analytics, Azure Machine Learning, Power BI, etc.
Here are the highlights of Azure Data Lake Store:
- Incoming data in any format, structured or unstructured, can be stored in its native format in the Data Lake Store
- There is no limit on the size of files that can be stored on Data Lake Store and files can range from Gigabytes to Terabytes to Petabytes
- It is a Hadoop Distributed File System (HDFS) compatible storage in the Azure Cloud
- Optimized for Big Data and Analytics Workloads offering massive throughputs
- It is a highly secure, durable, and reliable storage offering high availability
- It is integrated with Azure Active Directory enabling simplified identity and access management
- Data can be organized in a hierarchical directory structure and access to the individual directories and sub-directories including the root directory can be controlled using Access Control Lists (ACLs)
- Offers monitoring and auditing capabilities and all the data access and management activities are logged
- Data Lake Store offers the following different ways for account management and file system management in Data Lake Store: Azure Portal, PowerShell, REST APIs, .NET SDK, Java SDK, Azure CLI, and Node.js
Refer to the following resources to learn more about Azure Data Lake Store:
- Azure Data Lake Store Pricing
- Getting Started with Azure Data Lake Store
- Azure Data Lake Store Learning Path
Azure SQL Data Warehouse
The Azure SQL Data Warehouse is a fully managed Data Warehouse, capable of storing, managing, and processing a large volume of data, in the Azure Cloud. It is an enterprise ready offering built on Massively Parallel Processing (MPP) Architecture.
Following is a block diagram showing the typical usage of Azure SQL Data Warehouse with various data sources / formats that can be stored / managed in the SQL Data Warehouse and various downstream systems / applications that can connect to SQL Data Warehouse and consume data from it.
As we can see from the above block diagram, the Azure SQL Data Warehouse can connect to and query the data from various storage systems including Azure SQL Database, Azure Table Storage, Azure Blob Storage, Azure Data Lake Store (which in-turn stores and manages data coming from various sources including streaming sources and systems like Azure Event Hubs, Azure Stream Analytics etc.), and other compatible sources.
Data from Azure SQL Data Warehouse can be consumed by Reporting, Business Intelligence, and Advanced Analytics applications including Power BI, Microsoft Excel, SQL Server Reporting Services (SSRS), Azure Machine Learning, and other compatible systems.
Here are the highlights of Azure SQL Data Warehouse:
- It is a distributed Data Warehouse with storage and compute scaled across multiple nodes
- It is built on the SQL Server Relational Database Engine
- Comes with PolyBase integration enabling storage, querying, processing, and management of relational as well as non-relational data
- Allows scaling storage and compute separately which are independent of each other
- Allows scaling up, scaling down, pausing, and resuming compute in a matter of seconds, thereby enabling significant cost savings
- Offers a highly scalable, elastic, available, durable, reliable, and secure storage system and ensures fault tolerance through automatic backups
- Allows querying using the familiar Transact-SQL programming language and supports various traditional SQL constructs including Stored Procedures, User Defined Functions, Partitioning, Indexing, etc.
- Provides Clustered Columnstore Indexes by default for better storage compression and high query performance
- Resources are allocated in Data Warehouse Units (DWUs) which is a measure of the amount of CPU, Memory, and IOPS allocated
- Fully integrated with familiar SQL Server BI tools - SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS)
Refer to the following resources to learn more about Azure SQL Data Warehouse:
- Azure SQL Data Warehouse Pricing
- Getting Started with Azure SQL Data Warehouse
- 3rd Party Tools that can integrate with Azure SQL Data Warehouse
- Sign up for free trial of Azure Subscription, if you don't have one already, and start giving the above services a try
- Stay tuned to learn more about the other major components of the Cortana Intelligence Suite
- Check these previous Cortana tips
Last Updated: 2016-09-13
About the author
View all my tips