Security, Governance, and CI / CD in Azure Synapse Analytics Workspace

By: Ron L'Esteve   |   Updated: 2022-03-11


Problem

When working with cloud-based unified analytics platforms, security, governance, and CI / CD are often critical needs. Azure Synapse Analytics is a unified analytics platform on Microsoft Azure whose workspaces offer robust security, governance, and CI / CD capabilities. As organizations and developers begin their journey with Synapse Analytics workspaces, they want to learn more about the workspace's security, governance, and CI / CD features.

Solution

With its tight integration with Azure Active Directory, Managed Identity, Private Endpoints, and more, Synapse Analytics offers robust security features. From a data governance standpoint, Synapse Analytics can be integrated with Azure Purview to discover assets, report lineage, and more. Synapse Analytics also integrates well with Azure DevOps for implementing continuous integration and deployment pipelines. In this article, you will learn more about these security, governance, and CI / CD capabilities of Azure Synapse Analytics to help you understand how it can fit within your Azure Data Lakehouse.

Security

Because Synapse Analytics is an Azure-native Platform as a Service (PaaS) offering, it brings with it the Azure security baseline controls: private network access with Private Endpoints, network attack protection with firewalls, network security rules, and secured domain name services. With Private Endpoints, a private IP address from within a virtual network (VNet) is used to connect to Synapse Analytics workspace endpoints. Workspaces can also configure outbound data traffic to resources in any Azure AD tenant over private endpoints. The figure below shows how to create a private link enabled Synapse workspace from the Azure Portal.

Figure: Synapse Analytics Private Link Hubs in Azure Portal
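
One way to sanity-check a private endpoint setup is to confirm that the workspace endpoints resolve to private addresses from inside the VNet. The sketch below is a minimal illustration only: the helper names and the sample workspace name are hypothetical, and the IP addresses are passed in by hand rather than resolved, so nothing here touches the network.

```python
import ipaddress

# Hostname suffixes used by Synapse workspace endpoints.
ENDPOINT_SUFFIXES = {
    "dedicated_sql": ".sql.azuresynapse.net",
    "serverless_sql": "-ondemand.sql.azuresynapse.net",
    "development": ".dev.azuresynapse.net",
}


def workspace_endpoints(workspace_name: str) -> dict[str, str]:
    """Return the endpoint hostnames for a workspace (hypothetical helper)."""
    return {
        kind: f"{workspace_name}{suffix}"
        for kind, suffix in ENDPOINT_SUFFIXES.items()
    }


def is_private_ip(address: str) -> bool:
    """True for non-public addresses such as the RFC 1918 ranges.

    Note: this is also True for loopback and link-local addresses.
    """
    return ipaddress.ip_address(address).is_private


if __name__ == "__main__":
    # "contoso-ws" is a made-up workspace name.
    for kind, host in workspace_endpoints("contoso-ws").items():
        print(kind, host)
    # In practice the address would come from a DNS lookup run inside the VNet.
    print(is_private_ip("10.1.0.5"))
```

If an endpoint resolves to a public address from inside the VNet, that usually points to a DNS configuration gap rather than a problem with the private endpoint itself.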

From an identity management standpoint, Synapse Analytics integrates with Azure Active Directory (AAD) for centralized identity and authentication management. It secures and automates application identities with Managed Identity and offers an AAD Single Sign-On (SSO) experience for application access. With Managed Identities, Azure resources can authenticate to Synapse Analytics without storing credentials in code, and a workspace can have multiple user-assigned managed identities. The figure below shows how access control to the Synapse workspace can be granted and managed from the 'Manage' tab in Synapse Analytics.

Figure: Access Controls in Synapse Analytics workspace
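
To illustrate what "without storing credentials in code" can look like, here is a minimal sketch that builds an ODBC connection string for a dedicated SQL pool using the driver's `Authentication=ActiveDirectoryMsi` option with a system-assigned identity. The function name, workspace name, and database name are assumptions made for this example, and the string is only constructed here, not used to open a connection.

```python
def build_msi_connection_string(
    workspace_name: str,
    database: str,
    driver: str = "ODBC Driver 18 for SQL Server",
) -> str:
    """Build a connection string that relies on a system-assigned managed identity.

    No user name or password appears in the string; the driver obtains a
    token for the identity of the Azure resource the code runs on.
    """
    parts = [
        f"Driver={{{driver}}}",
        f"Server=tcp:{workspace_name}.sql.azuresynapse.net,1433",
        f"Database={database}",
        "Authentication=ActiveDirectoryMsi",
        "Encrypt=yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # "contoso-ws" and "salesdw" are made-up names for illustration.
    print(build_msi_connection_string("contoso-ws", "salesdw"))
```

The resulting string could be handed to an ODBC client library on a resource that has a managed identity, provided that identity has been granted access to the pool.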

With its Role-Based Access Control (RBAC), Synapse Analytics supports the principle of least-privilege access. From a data protection perspective, Synapse Analytics supports the protection of sensitive data through Dynamic Data Masking policies, Transparent Data Encryption (TDE) to protect data at rest, encrypted connections to protect data in transit, and robust monitoring capabilities. TDE with service-managed keys can be enabled at the individual SQL Pool level, and a workspace can add a second layer of encryption with customer-managed keys. Other security features that Synapse Analytics offers include regular automated backups and recoveries, endpoint security, posture and vulnerability management, logging and threat detection, and asset management.
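
As a small illustration of Dynamic Data Masking, the sketch below generates the T-SQL statement that adds a mask to an existing column. The helper name and the table and column names are hypothetical, and the statement is only built as text; it would still need to be reviewed and run against a SQL pool by someone with the right permissions.

```python
import re

# Only plain identifiers are accepted, to keep the generated SQL safe.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def masking_statement(schema: str, table: str, column: str, mask_function: str) -> str:
    """Return an ALTER TABLE statement that applies a masking function to a column.

    mask_function is a built-in masking function such as "default()",
    "email()" or 'partial(1,"XXXX",0)'.
    """
    for name in (schema, table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Unsupported identifier: {name!r}")
    if "'" in mask_function:
        raise ValueError("Masking function must not contain single quotes")
    return (
        f"ALTER TABLE [{schema}].[{table}] "
        f"ALTER COLUMN [{column}] ADD MASKED WITH (FUNCTION = '{mask_function}');"
    )


if __name__ == "__main__":
    # dbo.Customer and its Email column are made-up names.
    print(masking_statement("dbo", "Customer", "Email", "email()"))
```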

Azure SQL Auditing can monitor SQL Pool events and log them to an Azure Data Lake Storage Gen2 (ADLS Gen2) account. Finally, Synapse Analytics is well integrated with Microsoft Defender, which offers vulnerability assessments for SQL resources, advanced security, threat protection alerts, regulatory compliance tracking, and more. The figure below shows how to access and enable Microsoft Defender within the Security section of the Synapse Analytics workspace blade in the Azure Portal. Also notice the other security features within the Security section that can be enabled and further configured.

Figure: Microsoft Defender for Synapse SQL in Azure Portal

Governance

Purview is an Azure-native data governance offering from Microsoft. With Purview, data can be discovered, tracked, cataloged, and governed to help businesses map and view their data. Purview can be integrated with Synapse Analytics to discover, classify, map, and evaluate data across workspaces and their dedicated and serverless SQL pools. To configure Synapse within Purview, select Azure Synapse Analytics within the 'Register sources' UI. From there, you can specify details related to your Azure Subscription, Synapse Analytics workspace, and more to register and scan your workspace, identify assets, and classify data across dedicated or serverless resources.

Figure: Synapse Analytics source in Azure Purview

There are a variety of other benefits and features of integrating Purview with Synapse Analytics. Purview's Apache Atlas Spark Connector is also available to track and register Spark SQL and DataFrame lineage and metadata changes to Purview, when needed. Also, from a security standpoint, Private Endpoints can be used with Purview to secure access from a virtual network (VNet) over a Private Link. With its tightly coupled integration with Azure Active Directory, identity and credential management is seamless, with a variety of options including Managed Identity and Service Principal. Additionally, once your Synapse Analytics workspace is registered with Purview, you will be able to track Synapse Pipeline lineage within Purview.

Figure: Synapse Pipeline lineage in Azure Purview

Thus far, we have discussed and explored how we can connect Synapse Analytics to Purview for the purpose of registering, scanning, discovering, and tracking Synapse assets all within the Purview experience. Synapse Analytics and Purview also support the option of discovering Synapse assets that have been registered with Purview directly from the Synapse Analytics workspace. The figure below shows how to connect your Purview account to the Synapse Analytics workspace from the 'Manage' tab.

Figure: Connect Azure Purview to Synapse Analytics workspace

After your Synapse Analytics account has been registered with Purview, you can also connect to your Purview account from the Synapse Analytics workspace. Once connected, you'll be able to search for and discover assets directly from the Synapse Analytics workspace search bar. For Synapse Pipelines, Purview registered lineage can also be tracked from the monitoring UI.

Figure: Explore Purview linked assets from Synapse Analytics workspace

Continuous Integration and Deployment

After your workspace is linked with an Azure DevOps repo, you'll be able to commit and publish the relevant artifact changes. Azure's continuous integration and deployment pipelines would then orchestrate the testing and promotion of the incremental changes from DEV to UAT to PROD. Within your build pipeline, you'll first need to add a Copy Files task to copy your Synapse publish templates from your Git repo to the artifact staging directory. You will also need a Publish Artifact: drop task to publish the artifact to Azure Pipelines.
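
To make the Copy Files step concrete, the sketch below mimics it locally in Python. It is an illustration only, not the pipeline task itself: the function name is hypothetical, and it assumes the publish folder contains the two template files that Synapse typically generates on publish, `TemplateForWorkspace.json` and `TemplateParametersForWorkspace.json`.

```python
import shutil
from pathlib import Path

# Template files assumed to be present in the publish folder.
TEMPLATE_FILES = (
    "TemplateForWorkspace.json",
    "TemplateParametersForWorkspace.json",
)


def stage_templates(publish_dir: Path, staging_dir: Path) -> list[Path]:
    """Copy the publish templates into a staging directory.

    Raises FileNotFoundError if an expected template is missing, so a
    broken publish is caught before any artifact is produced.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in TEMPLATE_FILES:
        source = publish_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"Expected template not found: {source}")
        # copy2 preserves file metadata and returns the destination path.
        staged.append(Path(shutil.copy2(source, staging_dir / name)))
    return staged
```

The staged files are what the Publish Artifact: drop task would then hand over to the release pipeline.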

Once the build pipeline is completed, the release pipeline will need to be created. The VS Marketplace offers a variety of tasks that can be integrated with your Azure DevOps (ADO) pipelines, and the Synapse workspace deployment task can be configured to deploy the workspace to higher environments. Note that this deployment task will deploy the Synapse Analytics workspace and the assets within it. However, since Synapse Analytics also includes Dedicated SQL Pools, those will need a separate CI / CD process.

Figure: Synapse workspace deployment task in VS Marketplace
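
When promoting a workspace to a higher environment, environment-specific values such as the target workspace name usually need to be overridden. The sketch below shows that idea on an in-memory parameter file; it assumes the common `{"parameters": {name: {"value": ...}}}` layout, and the function name and workspace names are hypothetical.

```python
import copy


def apply_overrides(parameter_file: dict, overrides: dict) -> dict:
    """Return a copy of a parameter file with selected values replaced.

    The input is left untouched so the DEV parameters remain the baseline.
    Unknown parameter names raise KeyError to catch typos early.
    """
    result = copy.deepcopy(parameter_file)
    params = result.setdefault("parameters", {})
    for name, value in overrides.items():
        if name not in params:
            raise KeyError(f"Unknown template parameter: {name}")
        params[name]["value"] = value
    return result


if __name__ == "__main__":
    # Made-up workspace names for illustration.
    dev = {"parameters": {"workspaceName": {"value": "synapse-dev"}}}
    uat = apply_overrides(dev, {"workspaceName": "synapse-uat"})
    print(uat["parameters"]["workspaceName"]["value"])
```

Keeping overrides as a small per-environment dictionary makes it easy to see exactly what differs between DEV, UAT, and PROD.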

Dedicated SQL Pools are the flagship MPP data warehouse of Synapse Analytics, so their continuous integration and deployment process is similar to incrementally deploying a SQL database dacpac file from development to production. The pattern is common across SQL databases: a developer uses local development software such as Visual Studio to develop the data warehouse code. This local environment is synced with an Azure DevOps Git repo, to which the dacpac file changes are committed. The dacpac file is then integrated with Azure CI and CD pipelines, which deploy the dev data warehouse to the upper environments by using dacpac deployment tasks in the CD release pipeline.

Figure: CI CD Flow for Synapse Analytics Database deployment
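
The dacpac promotion can be pictured as the same publish command repeated once per upper environment. The sketch below builds SqlPackage command lines for that purpose without running them. The function names, server names, and database names are hypothetical, and authentication options are deliberately left out because they depend on how the release pipeline is set up.

```python
# Upper environments, in the order changes are promoted after DEV.
PROMOTION_ORDER = ("UAT", "PROD")


def sqlpackage_publish_args(dacpac_path: str, server: str, database: str) -> list[str]:
    """Build the argument list for a SqlPackage publish of one dacpac."""
    return [
        "SqlPackage",
        "/Action:Publish",
        f"/SourceFile:{dacpac_path}",
        f"/TargetServerName:{server}",
        f"/TargetDatabaseName:{database}",
    ]


def promotion_plan(
    dacpac_path: str, targets: dict[str, tuple[str, str]]
) -> list[list[str]]:
    """Return one publish command per configured environment, in promotion order.

    targets maps an environment name to a (server, database) pair.
    """
    return [
        sqlpackage_publish_args(dacpac_path, *targets[env])
        for env in PROMOTION_ORDER
        if env in targets
    ]


if __name__ == "__main__":
    # Made-up targets for illustration.
    targets = {
        "PROD": ("contoso-prod.sql.azuresynapse.net", "dw"),
        "UAT": ("contoso-uat.sql.azuresynapse.net", "dw"),
    }
    for command in promotion_plan("dw.dacpac", targets):
        print(" ".join(command))
```

In a release pipeline, each command would typically run in its own stage so that a failed UAT deployment stops the promotion before PROD.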

About the author
Ron L'Esteve is a seasoned Data Architect who holds an MBA and MSF. Ron has over 15 years of consulting experience with Microsoft Business Intelligence, data engineering, emerging cloud and big data technologies.
