Explore Your Data Lake Data Easily with Azure Synapse Analytics


By:   |   Updated: 2020-09-18   |   Comments   |   Related: More > Azure


Problem

Data exploration is an important step in understanding and getting insights. I mentioned some of the exciting data exploration features coming with Azure Synapse Analytics in a previous article, so let's dig into some of these features and learn how we can benefit from them.

Solution

In a previous article, I explained how to create Azure Synapse Analytics workspace and use Synapse Studio to navigate through its main interface. We will now look at how to use some of the features in Azure Synapse Analytics.

Explore data in the Data Lake

Let's navigate to Synapse Studio and open the Data pane. This pane has the following two tabs:

  • Workspace - This tab contains your internal objects, like the SQL DW clusters, Spark clusters and will be empty initially.
  • Linked - This tab contains your external objects, like Data Lake accounts, Cosmos Db and will initially contain the default storage account, associated with your workspace (the one you specified while creating the workspace). In addition, this tab will also contain the datasets you created for the data integration purposes.

Here is the screenshot:

Azure Synapse Analytics workspace

I will use an open source climate CSV file for the purpose of exploration, which you can download here.

Next, navigate to the Linked tab, open the default Data Lake account and the container we specified as a file system folder inside it. Create a folder inside it and upload the climate CSV file, using the top menu buttons:

Azure Synapse Analytics upload data

Once the file is uploaded, right click on its name, and select New SQL script and Select TOP 100 rows, as follows:

Azure Synapse Analytics upload data

As a result, Synapse opens a new tab and automatically generates a SELECT statement to read from this file. Click the Run button on the top menu to browse the results, as displayed below:

Azure Synapse Analytics upload data

As you can see, Synapse has allowed to browse the content of your file, without even requiring its schema, which is great especially if you are dealing with files with an unknown schema.

Notice the Connect to dropdown list at the top of the query window displays the SQL on demand pool, which is a serverless pool provisioned when you created a workspace. This service allows executing your ad-hoc queries. Also, note that you can save a script in your workspace, using the Publish button-this reminds me of the change management tooling included with the Azure Data Factory.

Alternatively, you can persist or discard all your queries at once, using Publish all or Discard all buttons:

Azure Synapse Analytics upload data

As you may have noticed, the results from auto generated script include the column names in the first row. So, letís modify our query by specifying FIRSTROW=2 parameter, adding column aliases and filtering, as follows:

SELECT
    TOP 100 C6 AS LOCAL_DATE, C11 AS MEAN_TEMPERATURE , C13 as MIN_TEMPERATURE, C15 as MAX_TEMPERATURE
FROM
    OPENROWSET(
        BULK 'https://mysynapsestg.dfs.core.windows.net/myfs/test/climate-daily.csv',
        FORMAT = 'CSV',
        FIRSTROW=2,
        PARSER_VERSION='2.0'
    ) AS [result]
WHERE C6 between '2019-01-03' and '2019-01-30'

Next, run this query and change the output results from tabular to charts, using the Chart button located at the top of results pane. Also, let's change the chart properties as follows:

Azure Synapse Analytics chart

Attaching multiple storage accounts to your workspace

You can attach more storage accounts to your workspace, but they must be Azure Data Lake Storage Gen2.

Here are the required steps:

Create a general purpose v2 account from the Azure Portal (see this article for details).

Next, we will need to grant access to the Synapse workspace’s managed identity on this storage account. So, open the Access control tab from the storage account page and use the Add button inside the Add a role assignment tab:

Azure Synapse Analytics access control

Select the Storage Blob Data Contributor role from the Role drop down list. Type your Synapse workspace name in the Select box and click Save button at the bottom of screen:

Azure Synapse Analytics access control

Open the Role assignments tab and confirm that Synapse’s managed identity has been granted the required access:

Azure Synapse Analytics access control

Scroll down and find the Containers button under the Blob service section and use +Container button at the top of the screen, to add a new container:

Azure Synapse Analytics access control

Next, we will need to create a linked service to storage account we just created. Return to Synapse Studio, select Add new resource button at the top left corner and select Connect to external data command, as follows:

Azure Synapse Analytics connect to data

Select the Azure Data Lake Storage Gen 2 option:

Azure Synapse Analytics connect to data

Provide the linked service name, select Azure subscription name and storage name from drop down list and validate connection using Test connection at the bottom of the dialog window, as follows:

Azure Synapse Analytics connect to data

Once the account is linked to your workspace, you can browse its contents and explore the data.

Azure Synapse Analytics connect to data
Next Steps


Last Updated: 2020-09-18


get scripts

next tip button



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002 and has earned two MCSE certifications. Heís currently working as a Solutions Architect at Slalom Canada.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Adding Users to Azure SQL Databases

Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime - Part 1

Azure Database for MySQL

Managing Azure Blueprints with PowerShell

Azure DevOps CI CD using GitHub Repo and Visual Studio Azure SQL Database Project








get free sql tips
agree to terms