Overview of Machine Learning and Analytics in Cortana Intelligence Suite




Problem

I have read the earlier tips on the Microsoft Cortana Intelligence Suite and started exploring this technology. Now I would like to know more about the technologies that enable processing data and gaining insights from it, along with some pointers on getting started.

Solution

Data Analytics in today's world involves processing large volumes of data coming in from heterogeneous data sources at varying speeds and making it available for further processing and consumption by algorithms, intelligent systems, and reporting applications. In this tip, we will look at the third pillar of Cortana Intelligence Suite which contains offerings to do exactly this.

Overview of Machine Learning and Analytics

Cortana Intelligence Suite comprises the following offerings as part of the Machine Learning and Analytics pillar, enabling businesses to process large volumes of data, gain insights, and derive predictions using machine learning algorithms:

  • Azure Stream Analytics
  • Azure HDInsight
  • Azure Data Lake Analytics
  • Azure Machine Learning

Azure Stream Analytics

Azure Stream Analytics is a fully managed, cloud-based, real-time event processing engine that enables businesses to gain insights from streaming data as it arrives. The following diagram shows a typical use case for Azure Stream Analytics.

Overview of Azure Stream Analytics

As we can see from the above diagram, Azure Stream Analytics can be used in conjunction with Azure Event Hubs, which can ingest millions of events per second and make those events available to services like Azure Stream Analytics for further processing to gain insights into the data.

Here are a few highlights of Azure Stream Analytics:

  • Real-time analytics and computations can be performed on data coming from a variety of streaming data sources including devices, sensors, IoT systems, and other such sources.
  • Offers a SQL-like language to perform computations on the streaming data in real-time.
  • Real-time analytics and computations can be really helpful in various scenarios including fraud detection, asset tracking, stock market analysis, vehicle traffic analysis, anomaly detection, and many more.
  • Data from the following sources / systems can be fed into Stream Analytics as input:
    • Azure Event Hubs
    • Azure IoT Hub
    • Azure Blob Storage
  • Output from Stream Analytics after performing the necessary computations can be stored / fed into the following systems:
    • Azure SQL Database
    • Azure SQL Data Warehouse
    • Azure Blob Storage
    • Azure Event Hubs
    • Azure Table Storage
    • Azure Service Bus Queue
    • Azure Service Bus Topic
    • Azure DocumentDB
    • Azure Data Lake Store
    • Power BI
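
To make the idea of a real-time computation over streaming data more concrete, here is a minimal sketch in plain Python of a fixed-window (tumbling) count per device. It is only an analogy for the kind of windowed aggregate one would express in the service's SQL-like language; it does not use any Stream Analytics API, and the function name, the event layout, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, device_id) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, device_id in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)


# Hypothetical sensor readings: (seconds since start, device id).
sample_events = [
    (1, "sensor-a"),
    (4, "sensor-a"),
    (7, "sensor-b"),
    (12, "sensor-a"),
    (19, "sensor-b"),
]

window_counts = tumbling_window_counts(sample_events, window_seconds=10)
for key in sorted(window_counts):
    print(key, window_counts[key])
```

In a real streaming job the events would arrive continuously from an input such as Azure Event Hubs and the results would be written to one of the outputs listed above; the sketch processes a small in-memory list only to show the grouping logic.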


Azure Data Lake Analytics

Azure Data Lake Analytics is a hyper-scale data processing service in the Azure Cloud, specifically designed and optimized for analytics workloads, thereby simplifying Big Data Analytics. It offers on-demand processing power that can scale up or down depending on the need and the data to be processed.

Overview of Azure Data Lake Analytics

Azure Data Lake Analytics can talk to various data sources and process different formats of data. As shown in the above diagram, we can even do reporting directly on top of the data present in Azure Data Lake Store using Power BI.

Here are a few highlights of Azure Data Lake Analytics:

  • It is highly elastic: resources are provisioned and scaled up or down dynamically according to the resource requirements of the data processing job(s), which makes it a very cost-effective option for big data processing.
  • Users don't need to be concerned about provisioning, scaling, and management of resources, as this is taken care of by the service, enabling users to focus on writing the actual code and on the business problem to be solved.
  • Users only pay for the compute power used for the duration of a data processing job's execution.
  • Tightly integrated with Visual Studio enabling effective development, debugging, testing, and optimization of code.
  • Same code with the business logic works irrespective of the amount of data to be processed.
  • Offers a visual interface to view the flow of execution of the jobs enabling users to easily identify the bottlenecks and resource intensive sections of the code.
  • Offers U-SQL as the language to interact with data and write the data processing jobs. U-SQL combines the familiar declarative SQL language with the object-oriented C# language, bringing together the power of both.
  • Comes with out-of-the-box monitoring and auditing capabilities, providing the necessary insights into the data processing jobs.
  • Tightly integrated with Azure Active Directory (AAD) enabling control over security and access management.
  • Can interact with and process the data across various Azure Services including Azure Blob Storage, Azure SQL Database, and Azure Data Lake Store.
  • Offers high throughput and best performance when used with Azure Data Lake Store which is optimized for big data workloads and specifically optimized to work with Azure Data Lake Analytics.
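
The pay-per-job point above comes down to simple arithmetic, sketched below in Python. This is a rough illustration under stated assumptions, not the service's billing formula: the function name, the notion of a generic "compute unit", and the rate used are all hypothetical and are not actual Azure pricing.

```python
def estimate_job_cost(compute_units, duration_minutes, rate_per_unit_hour):
    """Estimate the cost of one job that is billed only for the compute it used."""
    unit_hours = compute_units * (duration_minutes / 60)
    return unit_hours * rate_per_unit_hour


# Hypothetical numbers for illustration only; not actual Azure pricing.
small_job = estimate_job_cost(compute_units=2, duration_minutes=30, rate_per_unit_hour=2.0)
large_job = estimate_job_cost(compute_units=20, duration_minutes=30, rate_per_unit_hour=2.0)
print(small_job, large_job)
```

Under this model nothing is charged between jobs, so scaling a job up for a short run only affects the cost of that run.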


Azure HDInsight

Azure HDInsight is Microsoft's Hadoop offering in the Azure Cloud. Hadoop is no longer a single framework, but a whole ecosystem of frameworks and projects. Similarly, HDInsight offers various services including Apache Hadoop, Apache Spark, Apache Storm, Apache HBase, Apache Hive, Apache Pig, and Apache Sqoop, among other services.

Overview of Azure HDInsight

The above diagram shows a typical use of HDInsight in the Cortana Intelligence ecosystem. By provisioning the appropriate type of HDInsight cluster, we can use it for regular batch processing, in-memory high-performance parallel processing, processing real-time data from sensors and devices, and as a high-performance NoSQL store.

Here are a few highlights of Azure HDInsight:

  • HDInsight is built on top of Hortonworks Data Platform (HDP).
  • HDInsight is 100% compliant with Apache Hadoop.
  • HDInsight is tightly integrated with Azure Cloud and various other Microsoft Technologies.
  • Hadoop in HDInsight can be installed on the Windows OS, unlike the majority of the distributions available in the market, which are based on the Linux OS.
  • HDInsight can be configured to store the data either on the Hadoop Distributed File System (HDFS) within the HDInsight cluster nodes or on Azure Blob Storage. The most common approach is to use Azure Storage for the data, intermediate results, and the output, rather than storing data on individual nodes.
  • User data (data to be processed) and job metadata reside in Windows Azure Storage - Blob (WASB). WASB is an implementation of HDFS on Azure Blob Storage.
  • HDInsight clusters can be created / configured for Apache Hadoop, Apache Storm, Apache Spark, or Apache HBase.
  • Additional components can be installed on the clusters including R Language, Solr, etc.
  • Various additional components are installed on the HDInsight cluster, to enable different types of interaction with / management of data and cluster, like Oozie, Hive, Pig, Avro, Ambari, ZooKeeper, etc.
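
For readers new to Hadoop-style batch processing, the following is a minimal single-machine sketch in Python of the map, shuffle, and reduce phases, using the classic word count. It assumes nothing about HDInsight itself and calls no Hadoop API; the function names and the sample lines are made up for illustration, and on a real cluster each phase would run in parallel across nodes over data held in storage such as Azure Blob Storage.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input lines standing in for files in cluster storage.
sample_lines = ["big data on azure", "big data in the cloud"]
word_counts = reduce_phase(shuffle_phase(map_phase(sample_lines)))
print(word_counts)
```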


Azure Machine Learning

Azure Machine Learning is a fully cloud-based predictive analytics offering in the Azure Cloud that enables the creation and deployment of analytical solutions comprising predictive models and machine learning algorithms to solve complex business problems.

Overview of Azure Machine Learning

We can see from the above diagram that Azure Machine Learning plays a very important role in predictions, forecasting, and intelligence, which feed the automated and intelligent systems.

Here are a few highlights of Azure Machine Learning:

  • Enables building and deploying end-to-end machine learning solutions in the cloud.
  • Offers a library of ready-to-use implementations covering a wide range of commonly used algorithms.
  • Provides an intuitive machine learning studio with drag-and-drop functionality that largely simplifies the development of machine learning models.
  • Allows deploying the machine learning models as a Web Service for easy consumption.
  • Cortana Intelligence Gallery offers a wide range of solutions contributed by Microsoft and other users.
  • Azure Marketplace offers a wide range of web services for purchase and consumption.
  • Allows building algorithms in multiple languages commonly used for machine learning, such as R and Python.
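
The train-then-deploy workflow described above can be sketched locally in a few lines of Python. This is only an analogy under stated assumptions: it fits a simple least-squares line rather than using any algorithm from the service's library, it does not call any Azure Machine Learning API, and the function names, the JSON field names ("inputs", "predictions"), and the training data are hypothetical.

```python
import json


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_scoring_handler(model):
    """Wrap a trained model in a JSON-in, JSON-out function, similar in spirit to a web service endpoint."""
    slope, intercept = model

    def handle(request_body):
        inputs = json.loads(request_body)["inputs"]
        predictions = [slope * x + intercept for x in inputs]
        return json.dumps({"predictions": predictions})

    return handle


# Hypothetical training data that follows y = 2x + 1 exactly.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
score = make_scoring_handler(model)
response = score(json.dumps({"inputs": [5, 10]}))
print(response)
```

Separating the trained model from the request handler mirrors the idea of publishing a model as a web service: consumers send inputs and receive predictions without needing to know how the model was built.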


About the author
MSSQLTips author Dattatrey Sindol: Datta has 8+ years of experience working with SQL Server BI, Power BI, Microsoft Azure, Azure HDInsight and more.
