Getting Started with Azure Purview for Data Governance


By: Ron L'Esteve   |   Updated: 2021-06-08   |   Related: Azure


Problem

Many organizations need to establish data governance processes, standards and methodologies. On-premises, they have been able to do this with SQL Server tools such as Master Data Services (MDS) and Data Quality Services (DQS), but there has been a major gap in the Azure space for such data governance products. Previously, Microsoft attempted to bring data governance to Azure by hosting an MDS database on an Azure managed instance or through Azure Data Catalog, neither of which has been a full-fledged, unified data governance product. Microsoft recently announced the public preview of Azure Purview to centrally manage data governance across your data estate, spanning both cloud and on-premises environments. How can we get started with Azure Purview?

Solution

Azure Purview's easy-to-use UI and catalog make data sources easily discoverable and understandable by the users who manage the data assets. Azure Purview is a cloud-based SaaS service in which users can register data sources; Purview maintains a copy of the indexed metadata as well as a reference to the source location. This metadata can be further enriched in Purview through tags, descriptions and more. Azure Purview is intended to address some of the challenges for data consumers and producers that are described in Microsoft's article 'What is Azure Purview?' In this article we will explore how to get started with Azure Purview and then explore some of the features within Purview Studio.

Create Azure Purview

Since Azure Purview is now available in public preview, it can be created from the Azure Portal.

[Image: Create a new Purview account]

As with other Azure resources, creating a Purview account requires both project and instance details, as depicted below.

[Image: Project and instance details for the new Purview account]

Additional configuration options are also available for selection.

Platform size: Capacity units are a provisioned set of resources that keep the Purview Data Map up to date and running. A minimum of four units must be selected.

Catalog features: While Purview is in preview, these features cannot yet be selected. They may be related to feature set levels that will be defined as the product approaches GA.

[Image: Configuration details for the new Purview account]

Initially, users may run into errors when trying to create an Azure Purview instance, because Purview must first be registered as a resource provider at the subscription level. Users must also have full Azure AD permissions, or at least the ability to view and search Azure AD.

[Image: Error when creating a new Purview account]

Below are the steps to add Purview as a resource provider at the subscription level.

[Image: Registering Purview under Resource Providers]
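The same registration can also be scripted instead of clicked through the portal. The sketch below only builds (and does not send) a request against the Azure Resource Manager "Providers - Register" REST operation. The `api-version` value, the placeholder subscription ID and the `<token>` string are assumptions for illustration, and obtaining a real bearer token is out of scope here.

```python
from urllib.parse import quote
from urllib.request import Request

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-04-01"  # assumption: check the api-version your tenant supports


def build_register_request(subscription_id: str, bearer_token: str,
                           namespace: str = "Microsoft.Purview") -> Request:
    """Build (but do not send) the POST that registers a resource provider."""
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id)}"
        f"/providers/{quote(namespace)}/register?api-version={API_VERSION}"
    )
    # The register operation takes no request body, so an empty payload is used.
    return Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Placeholder values; replace with a real subscription ID and token before sending.
request = build_register_request("00000000-0000-0000-0000-000000000000", "<token>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. If you prefer the Azure CLI, `az provider register --namespace Microsoft.Purview` performs the same registration.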

Once Purview has been added as a resource provider, we can see that the Purview account has been successfully validated and is ready to be created.

[Image: Validation passed for creating the Purview account]

Explore Azure Purview

Once created, clicking 'Open Purview Studio' will launch the studio.

[Image: Opening Purview Studio from the portal]

Create and Register Data Source

Once in Purview Studio, data sources can be added and registered with your Purview account.

[Image: Purview Studio home page]

To add a new source we must first create a collection by: 1) clicking Sources, 2) clicking New collection, 3) giving it a name and 4) clicking Finish.

[Image: Steps to create a new Purview collection]

Next, a source needs to be registered by searching for it and selecting it from the list of supported sources (Azure Blob Storage, Azure Cosmos DB (SQL API), Azure Data Explorer (Kusto), Azure Data Lake Storage, Azure Data Factory, Azure SQL Database, SQL Server, etc.).

[Image: Selecting a source type to register]

Also, the details for the source will need to be entered.

[Image: Registering an ADLS source]

As we can see, each source is added to the collection under which it is registered. A collection can have multiple sources, and there can be multiple collections in the canvas.

[Image: Purview collections and their registered sources]
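To make the collection-to-source relationship concrete, here is a minimal in-memory sketch. It is a toy model only and does not call Purview; the class names `Source`, `Collection` and `Canvas`, and the sample source names, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    name: str
    kind: str  # e.g. "Azure SQL Database" or "Azure Data Lake Storage"


@dataclass
class Collection:
    name: str
    sources: List[Source] = field(default_factory=list)

    def register(self, source: Source) -> None:
        """Add a source to this collection, rejecting duplicate names."""
        if any(existing.name == source.name for existing in self.sources):
            raise ValueError(f"source {source.name!r} is already registered")
        self.sources.append(source)


@dataclass
class Canvas:
    collections: Dict[str, Collection] = field(default_factory=dict)

    def new_collection(self, name: str) -> Collection:
        """Create a named collection; one canvas can hold many collections."""
        if name in self.collections:
            raise ValueError(f"collection {name!r} already exists")
        self.collections[name] = Collection(name)
        return self.collections[name]


canvas = Canvas()
finance = canvas.new_collection("Finance")
finance.register(Source("sales-db", "Azure SQL Database"))
finance.register(Source("raw-lake", "Azure Data Lake Storage"))
canvas.new_collection("Marketing")

for collection in canvas.collections.values():
    print(collection.name, [s.name for s in collection.sources])
```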

Manage Credential and Access

Within Azure Purview, credentials let you quickly reuse and apply saved authentication information to your data source scans. Purview also requires, as a minimum, that passwords and secrets be stored in Azure Key Vault. For more detail, please read: Create and manage credentials for scans - Azure Purview | Microsoft Docs

[Image: Steps to register Purview credentials and a Key Vault connection]

Purview will display a message on how to grant Purview access to Key Vault.

[Image: Message describing how to grant Purview access to Key Vault]

The images below illustrate the steps taken to grant Purview the necessary GET access permissions in Azure Key Vault.

[Image: Key Vault access policies for Purview]
[Image: Adding an access policy for Purview in Key Vault]
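For readers who manage Key Vault through templates rather than the portal, the access policy above corresponds to one entry in the vault's `accessPolicies` list. The sketch below builds such an entry as a plain dictionary. The function name is hypothetical, the IDs are placeholders, and the assumption is that `object_id` is the object ID of the Purview account's managed identity. The permission list is a parameter so that further secret permissions can be added if your setup needs them.

```python
import json
from typing import Dict, Iterable


def build_secret_access_policy(tenant_id: str, object_id: str,
                               secret_permissions: Iterable[str] = ("get",)) -> Dict:
    """Return one Key Vault accessPolicies entry granting secret permissions."""
    # Normalise to lower case and de-duplicate so the entry is stable.
    permissions = sorted({p.lower() for p in secret_permissions})
    return {
        "tenantId": tenant_id,
        "objectId": object_id,  # assumed: the Purview managed identity's object ID
        "permissions": {"secrets": permissions},
    }


# Placeholder IDs for illustration only.
policy = build_secret_access_policy(
    tenant_id="11111111-1111-1111-1111-111111111111",
    object_id="22222222-2222-2222-2222-222222222222",
)
print(json.dumps(policy, indent=2))
```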

Once the connections are created and verified in Purview, they will appear in the 'Manage Key Vault Connections' section, as depicted below.

[Image: Key Vault connection now visible in Purview]

The Key Vault connection can then be selected in the credential authentication UI when adding new data sources and connections. For example, the image below uses Key Vault to retrieve the account key for the respective ADLS Gen2 account.

[Image: Adding a credential for ADLS Gen2]

Create a Scan

Within the Azure Purview catalog, users can create scan rule sets to quickly scan data sources within the organization.

A scan rule set is a container for grouping a set of scan rules together to easily associate them with a scan. For more detail, read Create a Scan Rule Set.

A new scan can be set up for the various sources by clicking the scan icon as illustrated below and then populating the required credential details.

[Image: Setting up a new scan for a registered source]

Additionally, the scope of the scan can be customized and altered to either select or de-select objects within the source.

[Image: Scoping the scan by selecting tables as needed]

A scan can use the default scan rule set, which includes all supported system classification rules.

[Image: Selecting a scan rule set]

Here is a sample of the system classification rules available.

[Image: Selecting multiple classification rules]

Alternatively, a new customized scan rule set can be created to specify the rules that should be applied from either a list of available rules or custom rules that can be defined.

The final step in setting up the scan is to choose either a manual (one-time) or recurring (schedule-based) trigger. Note that there is also an option to set an end date for the recurrence.

[Image: Trigger can be manual or recurring based on a schedule]
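The difference between the two trigger types can be sketched with a small helper that lists when a scan would run. This is an illustration of the scheduling idea only, not Purview's scheduler; the function name, its parameters and the `limit` cap are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def scan_run_times(start: datetime,
                   every: Optional[timedelta] = None,
                   end: Optional[datetime] = None,
                   limit: int = 5) -> List[datetime]:
    """List upcoming run times for a manual or recurring trigger.

    every=None models a manual (one-time) trigger. Otherwise runs repeat at the
    given interval until the optional end date, capped at `limit` entries.
    """
    if every is None:
        return [start]
    if every <= timedelta(0):
        raise ValueError("recurrence interval must be positive")
    runs: List[datetime] = []
    current = start
    while len(runs) < limit and (end is None or current <= end):
        runs.append(current)
        current += every
    return runs


first_run = datetime(2021, 6, 8, 2, 0)
manual = scan_run_times(first_run)
weekly = scan_run_times(first_run, every=timedelta(days=7), end=datetime(2021, 6, 30))
print(len(manual), "manual run;", len(weekly), "weekly runs before the end date")
```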

Once the scan is configured, it can be reviewed, saved and run.

[Image: Reviewing the scan before saving and running it]

As with the Azure SQL Database source, a scope and scan rule set can be added for the ADLS Gen2 account as well.

[Image: Scoping the scan for ADLS Gen2]
[Image: Setting the scan rule set for ADLS Gen2]

Once the scans complete, the overview section provides additional details related to the number of scans performed, along with the assets scanned.

[Image: Overview of the successfully completed scan for SQL]
[Image: Overview of the successfully completed scan for ADLS Gen2]

Explore the Glossary

Purview's business glossary lets users define and manage glossary terms easily. For more detail, please read 'Understand business glossary features in Azure Purview'.

The following image illustrates the steps required to create glossary terms.

[Image: Adding new glossary terms]

Once the glossary terms have been created, additional data, contacts and related items can be added and linked to the terms to enrich them. These glossary terms can also be linked to assets to further enrich the assets' metadata.

[Image: Glossary overview]
[Image: Additional view of the glossary terms]

Browse Assets

Finally, once assets have been registered in Purview, they can be accessed from the home page by searching for them in the search box.

[Image: Searching for assets from the Purview home page]

Alternatively, assets can be accessed from the 'Browse Assets' icon on the home page.

[Image: Browsing assets in Purview]

The registered list of assets will be available at a granular level.

[Image: ADLS assets available in Purview]

Details along with the data lineage can be explored, edited and enriched in these sections.

[Image: Overview of data in ADLS]
[Image: Browsing SQL tables in Purview]

Note that the classification section auto-captures the fields that adhere to the defined sensitivity and privacy rule sets.

[Image: Details available in Purview for SQL tables]
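As a rough intuition for how a field can be classified automatically, the sketch below tags a column when enough of its sampled values match a pattern. This is a simplified illustration and not Purview's actual rule engine: the rule labels, the regular expressions and the 60% match threshold are all assumptions chosen for the example.

```python
import re
from typing import Dict, Iterable, List, Pattern

# Hypothetical rules for illustration; real system rules are richer than this.
RULES: Dict[str, Pattern[str]] = {
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "U.S. Social Security Number": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values: Iterable[str],
                    rules: Dict[str, Pattern[str]] = RULES,
                    min_match: float = 0.6) -> List[str]:
    """Return the labels whose pattern matches at least min_match of the values."""
    sample = [v for v in values if v]  # ignore empty cells
    if not sample:
        return []
    labels = []
    for label, pattern in rules.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= min_match:
            labels.append(label)
    return labels


customer_columns = {
    "EmailAddress": ["ann@example.com", "bob@example.org", "n/a"],
    "FirstName": ["Ann", "Bob", "Cara"],
}
for column, values in customer_columns.items():
    print(column, classify_column(values))
```

Using a threshold rather than requiring every value to match lets a column with a few dirty cells still be classified, at the cost of occasional false positives.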

Interestingly, within the schema section there is an option to change the column names and data types of the tables, so this should be used sparingly and be well governed from an access perspective.

Additionally, every field in the sample Customer table can be linked to a glossary term, given a column-level description, and have its column-level classifications altered as needed. This further enriches the metadata and provides a detailed data governance experience that has been missing in the Azure space for quite some time.

[Image: Editing SQL table details from Purview]
[Image: Lineage of assets in Purview]

About the author
Ron L'Esteve is a seasoned Data Architect who holds an MBA and MSF. Ron has over 15 years of consulting experience with Microsoft Business Intelligence, data engineering, emerging cloud and big data technologies.
