I have read the earlier tips on the Microsoft Cortana Intelligence Suite and started exploring this technology. Now I would like to know more about the technologies that enable data processing and gaining insights into this data and some pointers on getting started.
Data analytics in today's world involves processing large volumes of data arriving from heterogeneous data sources at varying speeds, and making that data available for further processing and consumption by algorithms, intelligent systems, and reporting applications. In this tip, we will look at the third pillar of Cortana Intelligence Suite, which contains offerings to do exactly this.
Overview of Machine Learning and Analytics
Cortana Intelligence Suite comprises the following offerings as part of its Machine Learning and Analytics pillar, enabling businesses to process large volumes of data, gain insights, and derive predictions using machine learning algorithms:
- Azure Stream Analytics
- Azure HDInsight
- Azure Data Lake Analytics
- Azure Machine Learning
Azure Stream Analytics
Azure Stream Analytics is a fully managed, cloud-based, real-time event processing engine that enables businesses to gain insights from streaming data in real time. The following is a typical use case representation of Azure Stream Analytics.
As we can see from the above diagram, Azure Stream Analytics can be used in conjunction with Azure Event Hubs, which can ingest millions of events per second and make those events available to services like Azure Stream Analytics for further processing.
Here are a few highlights of Azure Stream Analytics:
- Real-time analytics and computations can be performed on data coming from a variety of streaming data sources including devices, sensors, IoT systems, and other such sources.
- Offers a SQL-like language to perform computations on the streaming data in real-time.
- Real-time analytics and computations can be really helpful in various scenarios including fraud detection, asset tracking, stock market analysis, vehicle traffic analysis, anomaly detection, and many more.
- Data from the following sources / systems can be fed into Stream Analytics as input:
  - Azure Event Hubs
  - Azure IoT Hub
  - Azure Blob Storage
- Output from Stream Analytics, after the necessary computations have been performed, can be stored in / fed into the following systems:
  - Azure SQL Database
  - Azure SQL Data Warehouse
  - Azure Blob Storage
  - Azure Event Hubs
  - Azure Table Storage
  - Azure Service Bus Queue
  - Azure Service Bus Topic
  - Azure DocumentDB
  - Azure Data Lake Store
  - Power BI
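To make the idea of real-time computation on a stream more concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window average, the kind of aggregation a streaming engine typically performs. This is not Stream Analytics code: in the service this logic would be written in its SQL-like query language. The function name, the event layout, and the sample readings are all hypothetical.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average `value` per (window, device) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    Returns a dict mapping (window_start, device_id) to the average value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, device_id, value in events:
        # Each event falls into exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        key = (window_start, device_id)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical sensor readings: (seconds since start, device, temperature).
sample_events = [
    (1, "sensor-a", 20.0),
    (4, "sensor-a", 22.0),
    (7, "sensor-b", 30.0),
    (12, "sensor-a", 26.0),
]

averages = tumbling_window_average(sample_events, window_seconds=10)
for (window_start, device_id), average in sorted(averages.items()):
    print(window_start, device_id, average)
```

A real streaming job would emit each window's result as soon as the window closes instead of collecting all events first; the sketch processes a finished list only to keep the example short.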
Refer to the following resources to learn more about Azure Stream Analytics:
- Azure Stream Analytics Pricing
- Getting Started with Azure Stream Analytics
- Azure Stream Analytics Learning Path
Azure Data Lake Analytics
Azure Data Lake Analytics is a hyper-scale data processing service in the Azure Cloud, specifically designed and optimized for analytics workloads, thereby simplifying Big Data Analytics. It offers on-demand processing power that can scale up or down depending on the need and the data to be processed.
Azure Data Lake Analytics can talk to various data sources and process different formats of data. We can even do reporting directly on top of the data present in Azure Data Lake Store using Power BI, as shown in the above diagram.
Here are a few highlights of Azure Data Lake Analytics:
- It is highly elastic: resources are provisioned and scaled up / down dynamically according to the resource requirements of the data processing job(s), making it a very cost-effective option for big data processing.
- Users don't need to be concerned about provisioning, scaling, and management of resources, as these are taken care of by the service, enabling users to focus on writing the actual code and on the business problem to be solved.
- Users pay only for the compute power used for the duration of the data processing job's execution.
- Tightly integrated with Visual Studio enabling effective development, debugging, testing, and optimization of code.
- Same code with the business logic works irrespective of the amount of data to be processed.
- Offers a visual interface to view the flow of execution of the jobs enabling users to easily identify the bottlenecks and resource intensive sections of the code.
- Offers U-SQL as the language for interacting with data and writing data processing jobs. U-SQL combines the declarative power of SQL with the object-oriented expressiveness of C#.
- Comes with out-of-the-box monitoring and auditing capabilities that provide the necessary insights into the data processing jobs.
- Tightly integrated with Azure Active Directory (AAD) enabling control over security and access management.
- Can interact with and process the data across various Azure Services including Azure Blob Storage, Azure SQL Database, and Azure Data Lake Store.
- Offers high throughput and best performance when used with Azure Data Lake Store which is optimized for big data workloads and specifically optimized to work with Azure Data Lake Analytics.
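The combination of declarative queries and inline procedural expressions described above can be illustrated with a small Python sketch. This is not U-SQL and does not call any Azure service: it only mimics the shape of an extract, transform, and aggregate job, with a Python expression standing in for the inline C# expression. The sample data, column names, and function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical comma-separated log; in the service this data would live in a
# store such as Azure Data Lake Store rather than in a string.
RAW_LOG = "\n".join([
    "user,region,duration_ms",
    "alice,en-us,1200",
    "bob,en-gb,300",
    "carol,en-us,800",
])


def extract(text):
    """Extract step: parse delimited text into typed rows."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield {
            "user": row["user"],
            "region": row["region"],
            "duration_ms": int(row["duration_ms"]),
        }


def total_duration_by_region(rows):
    """Select / group-by step with an inline expression on each row."""
    totals = defaultdict(int)
    for row in rows:
        # The inline expression: normalize the region code before grouping.
        totals[row["region"].upper()] += row["duration_ms"]
    return dict(totals)


result = total_duration_by_region(extract(RAW_LOG))
print(result)
```

The point of the sketch is that the same two functions work unchanged whether the input has three rows or three billion; in the service, scaling the execution is handled for you.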
Refer to the following resources to learn more about Azure Data Lake Analytics:
- Azure Data Lake Analytics Pricing
- Getting Started with Azure Data Lake Analytics
- Azure Data Lake Analytics Learning Path
Azure HDInsight
Azure HDInsight is Microsoft's Hadoop offering in the Azure Cloud. Hadoop is no longer a single framework, but a whole ecosystem of frameworks and projects. Similarly, HDInsight offers various services including Apache Hadoop, Apache Spark, Apache Storm, Apache HBase, Apache Hive, Apache Pig, and Apache Sqoop, among others.
The above diagram shows a typical use of HDInsight in the Cortana Intelligence ecosystem. By provisioning the appropriate type of HDInsight cluster, we can use it for regular batch processing, in-memory high-performance parallel processing, processing of real-time data from sensors and devices, and high-performance NoSQL storage.
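The batch processing style mentioned above is classically expressed as map, shuffle, and reduce phases. The following is a minimal single-process Python sketch of that pattern using a word count; it is not HDInsight code, the function names and sample lines are made up for illustration, and a real cluster would distribute each phase across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input lines standing in for files in cluster storage.
counts = word_count(["big data on Azure", "Big clusters process big data"])
print(counts)
```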
Here are a few highlights of Azure HDInsight:
- HDInsight is built on top of Hortonworks Data Platform (HDP).
- HDInsight is 100% compliant with Apache Hadoop.
- HDInsight is tightly integrated with Azure Cloud and various other Microsoft Technologies.
- Hadoop in HDInsight can be installed on Windows, unlike most distributions available in the market, which are based on Linux.
- HDInsight can be configured to store data either on the Hadoop Distributed File System (HDFS) within the HDInsight cluster nodes or on Azure Blob Storage. The most common approach is to use Azure Storage for the data, intermediate results, and output, rather than storing data on individual nodes.
- User data (the data to be processed) and job metadata reside in Windows Azure Storage - Blob (WASB). WASB is an implementation of HDFS on Azure Blob Storage.
- HDInsight clusters can be created / configured for Apache Hadoop, Apache Spark, Apache Storm, or Apache HBase.
- Additional components can be installed on the clusters including R Language, Solr, etc.
- Various additional components are installed on the HDInsight cluster, to enable different types of interaction with / management of data and cluster, like Oozie, Hive, Pig, Avro, Ambari, ZooKeeper, etc.
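Data held in WASB, as described above, is addressed with a URI of the form `wasb://<container>@<account>.blob.core.windows.net/<path>` (or `wasbs://` for encrypted access). The small Python helper below shows how such a URI is put together; the helper itself and the container, account, and path values are hypothetical and exist only for illustration.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a WASB URI for a blob path in an Azure Storage container.

    `container`, `account`, and `path` are placeholders supplied by the caller;
    nothing here contacts Azure or checks that the resources exist.
    """
    scheme = "wasbs" if secure else "wasb"
    clean_path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{clean_path}"


# Hypothetical container, storage account, and file path.
uri = wasb_uri("logs", "examplestore", "/raw/2016/10/events.csv")
print(uri)
```

Because cluster jobs refer to data through URIs like this rather than through node-local paths, the data outlives the cluster and can be shared by several clusters.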
Refer to the following resources to learn more about Big Data and Azure HDInsight:
- Hadoop on HDInsight
- Spark on HDInsight
- Storm on HDInsight
- HBase on HDInsight
- R Server on HDInsight
- Azure HDInsight Pricing
- Learn Hadoop on HDInsight on Windows
- Learn Hadoop on HDInsight on Linux
Azure Machine Learning
Azure Machine Learning is a fully cloud-based predictive analytics offering in the Azure Cloud. It enables the creation and deployment of analytical solutions comprising predictive models and machine learning algorithms to solve complex business problems.
We can see from the above diagram that Azure Machine Learning plays a very important role in predictions, forecasting, and intelligence, which feed the automated and intelligent systems.
Here are a few highlights of Azure Machine Learning:
- Enables building and deploying end-to-end machine learning solutions in the cloud.
- Offers a library of ready-to-use implementations covering a wide range of commonly used algorithms.
- Provides an intuitive machine learning studio with drag-and-drop functionality, which greatly simplifies the development of machine learning models.
- Allows deploying the machine learning models as a Web Service for easy consumption.
- Cortana Intelligence Gallery offers a wide range of Solutions contributed by Microsoft and other users.
- Azure Marketplace offers a wide range of web services for purchase and consumption.
- Allows building algorithms in languages commonly used for machine learning, such as R and Python.
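The build-then-deploy flow described above (train a model, then expose it for consumption as a web service) can be sketched in plain Python. This does not use Azure Machine Learning or any of its APIs: it fits a one-variable least-squares line and wraps it in a function that accepts and returns JSON, which is roughly the contract a scoring web service offers. All names and the training data are hypothetical.

```python
import json


def fit_line(xs, ys):
    """Train: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_scoring_function(slope, intercept):
    """Deploy: wrap the trained model behind a JSON-in, JSON-out function."""
    def score(request_body):
        payload = json.loads(request_body)
        prediction = slope * payload["x"] + intercept
        return json.dumps({"prediction": prediction})
    return score


# Hypothetical training data that follows y = 2x + 1 exactly.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
score = make_scoring_function(slope, intercept)
print(score('{"x": 10}'))
```

Keeping the trained parameters inside the scoring function means callers only ever see the JSON interface, which is what makes a deployed model easy to consume from other applications.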
Refer to the following resources to learn more about Data Science and Azure Machine Learning:
- Data Science for Beginners Video Series
- Browse the Analytics Solutions on Cortana Intelligence Gallery
- Machine Learning Modules in Azure Machine Learning Studio
- Azure Machine Learning Studio Capabilities
- Azure Machine Learning FAQ
Next Steps
- Sign up for a free trial Azure subscription, if you don't have one already, and start giving the above services a try.
- Try Azure Machine Learning for free (Guest Workspace also available for 8 hours without Sign-In).
- Stay tuned to learn more about the other major components of Cortana Intelligence Suite.
- Check out these other Cortana tips
Last Update: 10/18/2016