Real-Time Analytics, Advanced Analytics and Reporting in Azure Synapse Analytics
While choosing a cloud unified data and analytics platform, organizations and developers are interested in understanding their capabilities and service offerings as it related to analytics and reporting. Azure Synapse Analytics workspace is a unified data and analytics platform that supports a variety of use cases around real-time analytics, advanced analytics, and reporting with Power BI. Oftentimes, there is quite a lot of technical information around these topics which can become overwhelming, and decision makers are seeking to understand a summarized view of the capabilities of real-time analytics, advanced analytics, and reporting within Azure’s Synapse Analytics workspace to determine if it is a good fit for their organizations.
Synapse Analytics supports a number of real-time analytics and advanced analytics scenarios. With Synapse Link, a direct connect link can be established between Synapse Analytics and a variety of transactional source systems such as CosmosDB, Dataverse, and on-premises SQL Server 2022. Azure Data Explorer has also been integrated into Synapse Analytics to support the querying and analytics of log and telemetry data. Structured Streaming and Change Data Capture use cases are also supported within Synapse Analytics. From an advanced analytics perspective, Synapse ML supports the development of production grade ML Models. Synapse Analytics workspaces databases and datasets can also be connected to Power BI. Furthermore, PBI reports can be developed from within the linked Synapse Analytics workspaces. In this article, you will learn more about these various real-time analytics, advanced analytics, and reporting capabilities in Azure Synapse Analytics.
With the increase of volume and velocity of data flowing into the Lakehouse, there is a growing demand for real time analytics for quicker insights. Synapse offers many capabilities around Structured Streaming, Live Links to source systems, change data capture (CDC) capabilities with Tumbling Window Triggers in Synapse Pipelines and more. In this section, you will learn more about Structured Streaming and Live Synapse Links within Synapse Analytics workspaces.
With Synapse Analytics Structured Streaming capabilities, integrating with IoT
and Event Hubs is seamless with its libraries for Spark which can be called within
a notebook by using
spark.readStream and the transformed data can be written out as Delta format within your Lakehouse.
With the integration of Structured Streaming within Synapse Notebooks, it
is possible to perform analytics on real-time data. You can then create a Lakehouse
external table to directly query the data in either Synapse Analytics workspaces
or within Power BI.
With Synapse Link, operational data stores including Azure Cosmos DB, Dataverse, On-Premises SQL Server 2022, and Data Explorer can be directly connected to Synapse Analytics to support real-time analytics use cases. The image below shows the Synapse Link architecture for one of these technologies, Cosmos DB. Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability allowing users to run near real-time analytics on operational data in Azure Cosmos DB. Data Engineers, Business Analysts and Data Scientists now have the ability to use Spark or SQL pools to get near real-time insights into data without impacting performance of transactional workloads in Cosmos DB.
There are a number of advantages to Azure Synapse Link for Azure Cosmos DB including reduced complexity since a near real-time analytical store reduces or eliminates the need for complex ETL or change feed job processes. Additionally, there is little to no impact on operational workloads since the analytical workloads are rendered independently of transactional workloads and does not consume the provisioned operational throughput. In addition, it is optimized for large scale analytics workloads by leveraging the power of Spark and SQL on-demand pools which makes it cost effective due to highly elastic Azure Synapse Analytics compute engines. With a column oriented analytical store for workloads on operational data including aggregations and more, along with decoupled performance for analytical workloads, Azure Synapse Link for Azure Cosmos DB enables and empowers self-service, near real-time insights on transactional data.
Similar to Cosmos DB, Synapse Link is also available for Dataverse, SQL Server 2022, and Data Explorer. Synapse Link for Dataverse supports integrating data from Power Apps and Dynamics 365 into Synapse Analytics workspaces. With Synapse Link for SQL Server 2022, you’ll be able to run near real-time analytical queries against your on-premises SQL Server data with minimal impact to your operational system. Finally, from a real-time analytics perspective, Synapse Analytics data exploration of your logs and telemetry data with its Data Explorer offering. You’ll be able to unlock insights and trends on your log and time-series data by running queries in real-time.
Unified Data and Analytics Platforms such as Synapse Analytics workspaces bring with it a vast feature set around advanced analytics, specifically for Machine Learning and Cognitive Services. Synapse ML simplifies the end-to-end machine learn model development and deployment process through its open-source library. It unifies ML Frameworks into a scalable API that can be accessed by Python, R, Scala, and Java scripting languages.
A variety of Machine Learning and Cognitive Services use cases can be achieved by using Synapse ML. With its pre-built APIs for Cognitive Services, Synapse ML provides ready to use libraries on Apache Spark such as Form Recognizer, Anomaly Detection, Vision, Translator, Text Analytics, and more. Synapse ML also offers support for Open Neural Network Exchange (ONNX) frameworks and runtimes for ML use cases such as Deep Learning and more. With access to ONNX tools and model hubs, you will have easy access to over 120 pre trained models.
In addition to SynapseML, Synapse Analytics workspaces offer many other features that support ML and Cognitive services use cases. With the Cognitive Services wizards, you’ll be able to select from pre-trained models for Anomaly Detection, Sentiment Analysis and more, to make predictions on your dataset. Simply right click on your dataset and select Machine Learning > Predict with a model. Notice that you will also have the option to Train a new model using Classification, Regression, and Time series forecasting. Once configured, the wizard will generate PySpark code within a new notebook that is connected to your dataset. Finally, with Synapse Analytics workspaces, you will be able to integrate and execute Azure Machine Learning pipelines and code within Synapse Pipelines and Notebooks.
Your Synapse Analytics workspaces and all databases can connect to Power BI through its out of the box connectors. The connectors support previewing of data before import, identification of related tables and import of relationships, data transformation using Power Query, and more. Synapse Analytics can also connect to Azure Analysis Services. This is valuable for organizations that used Analysis Services as an intermediary between the data and Power BI reporting layers. With Power BI’s Azure Active Directory Single Sign-on (AAD SSO) integration, security and identity management is seamless, so only users who have access to data will be able to directly query and retrieve it from Power BI.
With Lake Databases that leverage ADLSgen2 files in either Delta or Parquet format, connecting directly to Power BI further promotes the Lakehouse Paradigm. It is possible to query Parquet files directly from SQL endpoints within Power BI. Simply retrieve the endpoints from the overview tab within your Synapse Analytics workspace blade in the Azure Portal. The endpoints would be similar to the following and can be specified within a SQL Server Database data source connection in Power BI for either Import or Direct Query connectivity modes. When working with the preferred Parquet file format, creating pre-aggregated views in Serverless SQL Pools will lead to better performance and lower costs. For analytics workloads that require many queries across large datasets, it may be beneficial to use Dedicated SQL pools as the storage layer since you would pay a fixed cost which may result in being lower than paying per query for this particular scenario.
Dedicated SQL endpoint synapse-rl-001.sql.azuresynapse.net Serverless SQL endpoint synapse-rl-001-ondemand.sql.azuresynapse.net Development endpoint https://synapse-rl-001.dev.azuresynapse.net
In your Synapse Analytics workspace, you have the capability of connecting to a Power BI workspace for report development through a Linked Service. This is available in the Manage Hub of the workspace. Once connected, you will have access to all of your datasets and reports within the linked Power BI workspace and you will be able to develop reports in the Development Hub of the Synapse Analytics workspace.
- Read more about Getting Started with Azure Synapse Link for Cosmos DB (mssqltips.com)
- Read more about Processing Real Time Data Streams with Azure Synapse Analytics (mssqltips.com)
- Read more about Azure Synapse Link - Power Apps | Microsoft Docs
- Read more about Machine Learning in Azure Synapse Analytics - Azure Synapse Analytics | Microsoft Docs
- Read more about Quickstart: Linking a Power BI workspace to a Synapse workspace - Azure Synapse Analytics | Microsoft Docs
About the author
View all my tips
Article Last Updated: 2022-03-17