Azure Cloud Data Resources
For data leaders and professionals in the modern era of cloud computing, the number of technologies, cloud services, and cloud platforms available is staggering. This is a double-edged sword. On one side, we have access to more tools than ever to derive value from our data and drive digital transformation. On the other, it can be difficult to pick the right tooling given how many options the public cloud offers.
The Microsoft Azure cloud is a great example of this. It is a platform with many tools that can drive immense value, but at times it can be difficult to determine which tool should be used where.
The goal of this article is to demystify the Azure cloud for data professionals and outline the options available for building modern data architectures (IaaS, PaaS, hybrid cloud, etc.) in Azure. It stays at a surface level on most topics, and future articles will dive into much more detail on each subsection.
The article is broken down by resource type and aims to explain, at a high level, what each Azure data resource does.
Please note that there are additional Azure tools that are not fully explored in this article. I focus on the core Azure data services, as those are the most critical to understand.
Azure Cloud Services – Artificial Intelligence and Machine Learning
These Azure services let you build your own AI (Artificial Intelligence) and ML (Machine Learning) solutions or implement out-of-the-box ones.
Azure Machine Learning
- Azure Machine Learning is a service that supports the end-to-end machine learning lifecycle, from developing models to deploying them at scale for widespread use in your organization. Key features include notebooks, AutoML, drag-and-drop ML development, and MLOps for CI/CD.
- This tip offers an intro to this service: Introduction to Microsoft Azure Machine Learning
Azure Databricks
- Azure Databricks is a unified analytics platform that offers a home for all data personas: data analysts, data scientists, and data engineers. Databricks offers a scalable Spark platform with collaborative notebooks, multi-language support (Scala/Python/SQL/R), machine learning, data science, and streaming capabilities, plus a SQL workbench for ad hoc queries and dashboards. Managed MLflow is also built in, which provides the tools needed to take the machine learning process from development through production.
- This tip offers an intro to this service: What is Azure Databricks?
Data Science Virtual Machines
- Data Science Virtual Machines are pre-built virtual machines in Azure that are ready for AI and ML development. Managing a physical computer for data science dependencies can be difficult, and it can be even more difficult to integrate it with data sources or other services such as Azure ML. Data Science Virtual Machines are a turnkey solution that gives data scientists a preconfigured development environment that can be easily integrated with your Azure estate.
- Here is an article that discusses transferring files between these virtual machines and Azure Data Lake Storage: Transfer files between a Data Science Virtual Machine and Azure Data Lake Storage
Azure Bot Services
- Azure Bot Services is your one-stop shop for creating chatbots in Azure. It integrates natively with other Azure AI services, such as Azure Cognitive Services, and allows developers to create robust chatbots without the need for any machine learning experience.
- There are many other ML and AI services on Azure that fall into two main categories:
- 'Out-of-the-box AI' – there are many Azure services that offer AI 'out-of-the-box'. These are services which can be integrated with your other Azure Applications that handle certain common AI tasks.
- Other niche AI and ML Services built for specific industry solutions.
- These additional services can be found here: https://azure.microsoft.com/en-us/services/#ai-machine-learning
Azure Cloud – Analytics
These Azure solutions focus on building and delivering analytics to the business.
Azure Analysis Services
- Azure Analysis Services is a cube-based analytics engine for the cloud. For those familiar with the legacy Microsoft analytics stack, this is SQL Server Analysis Services (SSAS) in cloud form with all the benefits of the cloud, including ease of use, managed infrastructure, elastic scale, and secured access.
- The following link has several articles on this technology: SQL Server Azure Tips
Azure Databricks
- As mentioned in the first section, Azure Databricks is a unified analytics platform, so it could technically appear in most sections of this article. From an analytics perspective, Databricks offers a SQL service made up of a SQL workbench for writing queries and building dashboards and, under the hood, SQL endpoints running a completely rewritten Spark engine called Photon. The Photon engine is optimized for SQL queries on the data lake. Your favorite BI tool can also connect to these endpoints to build dashboards on top of the data lake.
Azure Synapse Analytics
- Azure Synapse Analytics was previously known as Azure SQL Data Warehouse and now wraps that warehouse with several other services to form a more robust offering. Core components include a distributed cloud warehouse, integrated Apache Spark and SQL engines, integrated Data Factory for code-free data integration, and end-to-end management and monitoring.
- Here is a great Synapse overview, and even more articles can be found on the MSSQLTips Azure page here.
Azure Data Factory
- Azure Data Factory (ADF) is Azure's answer to a cloud ETL/ELT tool. For those familiar with the traditional SQL stack, Azure Data Factory is akin to SSIS in the cloud, but a much more robust and powerful version. ADF offers both simple data copy activities for change data capture and code-free 'Data Flows', which allow for big data ETL processing without writing code.
- The following link has several articles on this technology: SQL Server Azure Tips
Power BI
- Power BI is the de facto data visualization and reporting tool on Azure for building line-of-business reports and dashboards.
- There are many available tips on MSSQLTips on Power BI, all of which can be found here: SQL Server Power BI Tips
Azure Purview
- Azure Purview is a new Azure service that offers robust data cataloguing and governance for your entire data estate. You can generate a clear view of your data estate, track data lineage, and enable data users to find the data they need faster.
Azure Stream Analytics
- Azure Stream Analytics allows you to run real-time analytics on streaming data for mission-critical workloads. It is a SQL-based tool but can be used with custom code and ML as well. This allows you to quickly build streaming pipelines with analytics built in for real-time alerting and dashboarding.
- Here is a great article on real-time anomaly detection in Azure Stream Analytics: Real-Time Anomaly Detection Using Azure Stream Analytics
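To make 'streaming pipelines with analytics built in' a little more concrete, here is a minimal Python sketch of a tumbling-window average with a simple threshold alert, which is the kind of logic a Stream Analytics job would typically express in SQL. This is an illustration only and not Stream Analytics code: the event shape, the 10-second window, and the names `tumbling_window_avg` and `alerts` are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each event is (timestamp_in_seconds, device_id, reading). This shape is an
# assumption for the example, not a Stream Analytics schema.
Event = Tuple[int, str, float]
WindowKey = Tuple[int, str]  # (window_start_in_seconds, device_id)


def tumbling_window_avg(events: Iterable[Event], window_seconds: int = 10) -> Dict[WindowKey, float]:
    """Average readings per device over fixed, non-overlapping windows."""
    totals: Dict[WindowKey, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for ts, device, reading in events:
        # Every timestamp belongs to exactly one window
        window_start = (ts // window_seconds) * window_seconds
        bucket = totals[(window_start, device)]
        bucket[0] += reading  # running sum
        bucket[1] += 1        # running count
    return {key: total / count for key, (total, count) in totals.items()}


def alerts(averages: Dict[WindowKey, float], threshold: float) -> List[WindowKey]:
    """Return the (window, device) pairs whose average exceeds the threshold."""
    return sorted(key for key, avg in averages.items() if avg > threshold)


sample_events = [(1, "sensor-a", 10.0), (4, "sensor-a", 20.0), (12, "sensor-a", 50.0), (3, "sensor-b", 5.0)]
window_averages = tumbling_window_avg(sample_events)
hot_windows = alerts(window_averages, threshold=30.0)
```

In a real job the service handles event ordering, late arrivals, and scale for you. The sketch only shows the windowing idea.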
Azure Data Lake Analytics
- Azure Data Lake Analytics is a fully elastic and scalable big data processing engine on Azure. You can use U-SQL, R, Python, or .NET to write big data programs for ETL, querying, analytics, ML, image processing, etc., at scale.
Azure HDInsight
- HDInsight is a managed platform for provisioning traditional Hadoop, Spark, R Server, HBase, and Storm clusters. While it can be used for lifting and shifting these open-source technologies from on-premises into the cloud, there are other cloud tools within the Azure ecosystem that handle these same workloads in a more optimized way (Databricks, Synapse, etc.).
- There are other Analytics offerings as well that are more niche in use:
- Azure Data Explorer: A data analytics service that specializes in ingesting and performing real-time analytics on IoT data, big data logs, and other streaming data sources.
- Azure Data Share: A service used for data sharing with external organizations.
- Data Catalog: This is Azure's answer to a data catalog, but with Azure Purview in place there is less need for this service.
Azure Cloud – Databases
These services are Azure's various database offerings, each of which can serve specific use cases. MSSQLTips has many articles that cover all of these services, so make sure to check them out here: SQL Server Azure Tips.
Azure SQL Server (Azure SQL Database)
- Azure SQL Database is as straightforward as it gets: SQL Server in the cloud, with the benefits of the cloud. It is a fully managed RDBMS with elastic capabilities and built-in tools that let you focus on building the database, not managing it.
Azure Cosmos DB
- Azure Cosmos DB is Azure's managed NoSQL offering in the cloud. When paired with the right use case, Cosmos DB can provide millisecond response times.
- Here is a great introduction to the topic: Introduction to Azure Cosmos DB database and the SQL API
Azure SQL Managed Instance
- Azure SQL Managed Instance is similar to Azure SQL Database in that it uses the same underlying engine. However, it is more focused on easily migrating existing on-premises SQL Server instances running on Windows in your data center to the cloud. Once in the cloud, it offers some of the managed service benefits, such as automatically keeping the SQL version up to date, while still integrating with the tools and experience your DBAs and developers are already comfortable with.
- See this document for a laundry list of differences.
- Here is a great tip introducing SQL Managed Instances: Introduction to Azure SQL Database Managed Instances
Azure Cache for Redis
- Azure Cache for Redis is a fully managed in-memory data store for improving the concurrency and scalability of your applications. It acts as a fast data layer that helps your applications keep up as traffic and user counts grow over time.
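One common way applications use a cache like this is the cache-aside pattern: check the cache first, and only go to the database on a miss. Below is a minimal sketch of that idea. To keep it self-contained, a plain Python dictionary stands in for Redis, and the `CacheAside` class, its parameters, and the `slow_lookup` function are hypothetical names invented for this example rather than part of any Azure or Redis client library.

```python
import time
from typing import Callable, Dict, List, Tuple


class CacheAside:
    """Cache-aside sketch: a dict stands in for the Redis cache."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1  # served from the cache, no database call
            return entry[1]
        self.misses += 1  # missing or expired: fall back to the database
        value = self._load_from_db(key)
        self._store[key] = (now + self._ttl_seconds, value)
        return value


db_calls: List[str] = []


def slow_lookup(key: str) -> str:
    # Hypothetical stand-in for an expensive database query
    db_calls.append(key)
    return key.upper()


cache = CacheAside(slow_lookup)
first = cache.get("customer-42")   # miss: goes to the "database"
second = cache.get("customer-42")  # hit: served from the cache
```

With a real Redis instance the dictionary would be replaced by calls to a Redis client, and the expiry would normally be handled by Redis itself.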
Additional Azure Databases
- The remainder of Azure's database offerings are mostly managed versions of other popular databases, such as MariaDB, Postgres, etc. You can find the other database offerings at this link: https://azure.microsoft.com/en-us/services/#databases
Azure Cloud – Storage
Beyond databases, listed below are Azure's cloud storage capabilities for application or other general data storage needs.
Before we get started, I want to define Blob storage, which is a term you will see throughout:
- BLOB = Binary Large Object
- Essentially, Blob storage is a 'file system' on which you can store any type of data: images, Parquet files, CSVs, etc. It is multi-purpose, cheap, secure, and scalable, which is why it is used so widely in the field.
Azure Archive Storage
- Archive Storage is the cheapest object storage available on Azure and is used for storing rarely accessed data in the cloud. It is a good fit for regulatory requirements or legacy data that must be retained for a certain period of time.
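As a rough illustration of the 'rarely accessed' idea, here is a small Python sketch that suggests a storage tier from how long a file has sat idle. Hot, Cool, and Archive are the Azure access tier names, but the function `suggest_access_tier` and its 30-day and 180-day thresholds are assumptions chosen for this example, not Azure defaults or recommendations.

```python
from datetime import date


def suggest_access_tier(last_accessed: date, today: date,
                        cool_after_days: int = 30, archive_after_days: int = 180) -> str:
    """Suggest a tier from idle time; the thresholds are hypothetical."""
    idle_days = (today - last_accessed).days
    if idle_days >= archive_after_days:
        return "Archive"  # rarely accessed: cheapest to store, slowest to read back
    if idle_days >= cool_after_days:
        return "Cool"     # infrequently accessed
    return "Hot"          # actively used


as_of = date(2021, 6, 14)
recent_tier = suggest_access_tier(date(2021, 6, 1), as_of)
stale_tier = suggest_access_tier(date(2021, 4, 1), as_of)
legacy_tier = suggest_access_tier(date(2020, 1, 1), as_of)
```

Keep in mind that archived data has to be moved back to an online tier before it can be read, so the right thresholds depend on how quickly you might need the data again.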
Azure Data Lake Storage
- Azure Data Lake Storage (ADLS) is exactly what it sounds like: object storage in the cloud optimized for acting as a data lake and handling big data workloads. ADLS is built on top of Blob Storage with additional features that allow it to scale and be secured at a more granular level, such as supporting role-based access control (RBAC).
Azure Blob Storage
- Azure Blob Storage is Azure's answer to cloud object storage. Blob storage is cheap and multi-purpose, and since it is built on top of REST APIs it can be used in your applications or in your data workloads.
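Because Blob storage is exposed over REST, every blob is addressable by a URL made up of the storage account, the container, and the blob name. The sketch below builds and parses such a URL with the Python standard library only. The account, container, and file names are made up for the example, and `blob_url` and `parse_blob_url` are hypothetical helper names, not functions from the Azure SDK.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public endpoint URL for a blob."""
    # Keep "/" unescaped so virtual folders in the blob name stay readable
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


def parse_blob_url(url: str) -> Tuple[str, str, str]:
    """Split a blob URL back into (account, container, blob_name)."""
    parts = urlsplit(url)
    account = parts.netloc.split(".")[0]
    container, _, blob_name = parts.path.lstrip("/").partition("/")
    return account, container, unquote(blob_name)


example_url = blob_url("mystorageacct", "raw", "2021/06/sales data.csv")
example_parts = parse_blob_url(example_url)
```

Knowing the URL is not enough to read the data. Unless a container is deliberately made public, requests still need to be authorized, for example with a shared access signature or Azure Active Directory.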
Azure Storage Explorer
- Azure Storage Explorer is an application you can use to connect to your Azure account to browse and update/delete data in your various Blob, data lake, and other data stores.
- Here is a great overview of Azure Storage Explorer: Azure Storage Explorer Overview
Additional Storage Options
- There are a series of other specific storage solutions for more niche workloads. Please read about these options.
Azure Cloud – Additional Services
There are many additional services offered on Azure which are not necessarily data services but will often be found in data platform reference architectures.
Azure DevOps
- Azure DevOps is a tool for managing, automating, and building CI/CD into your applications, whether they are web applications or data applications. Having a CI/CD process is critical in any modern data application, and Azure DevOps integrates natively with almost all of Azure's development services.
Azure Log Analytics and Azure Monitor
- Azure Log Analytics in Azure Monitor is a tool that allows you to query and track logs for your Azure applications. This can be used in auditing, security compliance, or building performance insights on data applications that you deploy.
Azure Event Hub
- Azure Event Hub is a managed ingestion service for streaming events and making them available to other applications for real-time analytics. It is often paired with streaming services such as Apache Kafka.
- Here is a great article on the topic, comparing Event Hub and IoT Hub: The tale of two Azure Hubs - IoT Hub and Event Hub
Azure Event Grid
- Azure Event Grid is an event-based service for managing the infrastructure of routing events. An example of this might be an Event Grid configured on a data lake location. Whenever a new file lands in that location, Event Grid notifies some other service to pick that file up and load it into another location of the data lake.
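The file-landing example above can be sketched in a few lines of Python. This is an in-memory stand-in for the routing role Event Grid plays, not Event Grid itself: the `TinyEventRouter` class and the `load_into_curated_zone` handler are hypothetical names for this example. The `eventType` and `subject` fields are modeled on the Event Grid event format for storage events, with made-up container and file names.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Handler = Callable[[Event], None]


class TinyEventRouter:
    """In-memory sketch of event routing with type and subject-prefix filters."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, str, Handler]] = []

    def subscribe(self, event_type: str, subject_prefix: str, handler: Handler) -> None:
        self._subscriptions.append((event_type, subject_prefix, handler))

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching subscription; return the delivery count."""
        delivered = 0
        for event_type, subject_prefix, handler in self._subscriptions:
            if event["eventType"] == event_type and str(event["subject"]).startswith(subject_prefix):
                handler(event)
                delivered += 1
        return delivered


loaded_files: List[str] = []


def load_into_curated_zone(event: Event) -> None:
    # A real handler would copy or transform the file; here we only record its name
    loaded_files.append(str(event["subject"]).rsplit("/", 1)[-1])


router = TinyEventRouter()
router.subscribe("Microsoft.Storage.BlobCreated",
                 "/blobServices/default/containers/landing/",
                 load_into_curated_zone)

new_file_event: Event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/landing/blobs/sales.csv",
}
deliveries = router.publish(new_file_event)
```

The useful property is that the storage location does not need to know who is listening. New consumers are added by subscribing, not by changing the producer.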
Azure Service Bus
- Azure Service Bus is a cloud messaging service that allows you to send messages reliably between your applications. In practice, the core difference between Event Hub and Service Bus is that Event Hub is used for data and analytics streaming use cases and logging, whereas Service Bus is used more for communication between applications, with the traditional messaging framework of queues and topics.
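For readers less familiar with queues and topics, the difference can be sketched in plain Python. In a queue, each message is received by exactly one consumer. In a topic, every subscription gets its own copy. The `Queue` and `Topic` classes below are simplified in-memory stand-ins written for this example. They are not the Service Bus SDK and leave out everything that makes the real service reliable, such as persistence, locks, and retries.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Queue:
    """Point-to-point: each message is received by exactly one consumer."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # First in, first out; None means the queue is empty
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish/subscribe: every subscription gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Queue] = {}

    def subscribe(self, name: str) -> Queue:
        return self._subscriptions.setdefault(name, Queue())

    def send(self, message: str) -> None:
        for subscription in self._subscriptions.values():
            subscription.send(message)


orders = Queue()
orders.send("order-1")
worker_a_got = orders.receive()  # one worker takes the message
worker_b_got = orders.receive()  # nothing left for the second worker

order_events = Topic()
billing = order_events.subscribe("billing")
shipping = order_events.subscribe("shipping")
order_events.send("order-1-created")
billing_got = billing.receive()
shipping_got = shipping.receive()
```

Queues suit work distribution, where each task should be processed once, while topics suit notifications that several applications each need to see.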
Azure IoT Hub
- Azure IoT Hub is a centralized platform for managing IoT workloads within Azure. It allows you to link your IoT applications to the devices they manage in a single location, simplifying the entire process.
Next Steps
- Let me know if there are any other Azure services you would like to see added to any section above, or covered in more detail in future articles.
- Think about your upcoming Azure use cases and use this article to help pick the right tools and build out your reference architecture.
- MSSQLTips hosts a series of articles related to these topics.
- For additional information about other cloud providers beyond Microsoft please see these resources:
Last Updated: 2021-06-14