Data Transfer from Snowflake to Azure Blob using Synapse Analytics

By: Eduardo Pivaral   |   Updated: 2023-05-18   |   Comments (3)   |   Related: > Azure Synapse Analytics


Problem

A key premise of a modern data warehouse is the ability to easily interconnect cloud services and process and analyze data from any provider. Two of the top cloud data services right now are Azure Synapse Analytics and Snowflake. Whichever your organization is implementing, if you need to transfer data to or from these services, it is important to know how to interconnect them and move data between them.

Solution

Snowflake is a cloud-native, self-managed service for enterprise-level modern data applications. It is becoming one of the top data services in the world because of its capabilities and ease of use.

Azure Synapse is an enterprise-level analytics service. It provides analytics and big data capabilities using multiple languages and engines, with native integration with other services like Power BI, Azure Databricks, machine learning, and cognitive services.

This tip provides a simple example of how to copy data from Snowflake using an Azure Synapse Analytics pipeline. The next tip will discuss how to push data to Snowflake.

Example Set Up

This tip assumes you have a Synapse Analytics workspace running, a Snowflake account, and at least one blob storage account.

Creating a Synapse Analytics workspace is outside the scope of this tip, but you can go to the Azure portal at https://portal.azure.com and create a Synapse service:

Azure marketplace to create synapse service

For a Snowflake account, you can go to https://www.snowflake.com and create a new trial account using the Start for Free button.

Create a new snowflake trial account

Example: Copying Data from Snowflake to Blob Storage using Synapse

I have created a sample database in Snowflake with one table named SHIPPING_INFO.

Sample Data to export

We will write the table contents to a Parquet file in blob storage using Synapse Analytics.

To open Synapse Analytics, go to https://web.azuresynapse.net and select your workspace.

On the home page, configure Snowflake as a linked service. Go to Manage > Linked Services > New.

How to create new linked services

In the window that opens, search for Snowflake.

snowflake linked service

In the New linked service properties window, input the connection details for your Snowflake account:

new snowflake linked service

Note: If you do not know your Snowflake account name, you can obtain it on the Snowflake portal at the bottom left of the home page:

obtain snowflake account URL
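Behind the UI, Synapse stores each linked service as a JSON definition. A rough sketch of what the Snowflake linked service looks like is below; the name, account, user, database, and warehouse values are placeholders, and the exact property set may vary by service version:

```json
{
  "name": "SnowflakeLinkedService",
  "properties": {
    "type": "Snowflake",
    "typeProperties": {
      "connectionString": "jdbc:snowflake://<account>.snowflakecomputing.com/?user=<user>&db=<database>&warehouse=<warehouse>",
      "password": {
        "type": "SecureString",
        "value": "<password>"
      }
    }
  }
}
```

For production use, store the password in Azure Key Vault and reference it from the linked service rather than embedding a secret directly.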

Next, we need to add an additional linked service for the destination, in our case blob storage, since a direct copy from Snowflake supports only blob storage as a sink at the time of this writing.

Add a new linked service and select blob storage:

new blob storage linked service

Only SAS authentication is supported at the time of this writing, so select SAS URI as the authentication type:

blob storage linked service properties

Note: If you do not know how to generate a SAS URL and token, you can do it from the Azure portal. Select the destination container or folder in your blob storage account and choose the Generate SAS option.

how to generate SAS url and token

You can select the permissions you want; remember, you need at least Create and Write permissions to create files. Also, choose the expiration and other options. Once you have made your selections, click the Generate SAS token and URL option:

obtain SAS URL and Token
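Before pasting the SAS URL into Synapse, you can sanity-check it by inspecting its query parameters: `sp` holds the granted permissions and `se` the expiration. The Python sketch below is purely illustrative, with a made-up URL and a fake signature:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical SAS URL of the general shape the Azure portal generates;
# the sig value here is fake and for illustration only.
sas_url = (
    "https://mystorage.blob.core.windows.net/exports"
    "?sv=2022-11-02&sr=c&sp=rwc&se=2023-12-31T23:59:59Z&sig=FAKESIGNATURE"
)
params = parse_qs(urlparse(sas_url).query)

permissions = params["sp"][0]  # r=read, w=write, c=create, d=delete, l=list
expiry = params["se"][0]       # UTC expiration timestamp

# The copy needs at least create (c) and write (w) permissions to produce files.
assert "c" in permissions and "w" in permissions
print(f"permissions={permissions}, expires={expiry}")
```

If the assertion fails for your real URL, regenerate the token with the Create and Write boxes checked.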

Once you have obtained the information, you can go back to Synapse and copy/paste it:

copy SAS url and token

Once you have created your linked services, you can create a pipeline for the data copy.

In the Synapse portal, go to Integrate on the left side, and then choose new Pipeline:

create new pipeline

As a first step, give the pipeline a meaningful name. I named it CopyfromSnowflake.

Create new pipeline: name

We will use a Copy data activity. Select it and drag it onto the canvas on the right:

Create new pipeline: add activity

Under the General tab, add a meaningful name to the task. You can leave the other fields as the defaults.

Create new pipeline: name of the activity

Under Source, select a new dataset by clicking +New:

Create new pipeline: source

Search for Snowflake and select it from the results:

Create new pipeline: source snowflake

In the new window, add a meaningful name, and select the linked service created earlier:

Create new pipeline: source linked service

Select the table name and leave the other options at their defaults. Click OK.

Create new pipeline: source table

Note: In future tips, we will see how to obtain table and schema names dynamically. For now, we will use a static table name to keep the basics easy to follow.

Select a new dataset for the Sink:

Create new pipeline: sink

For a direct copy from Snowflake, only blob storage is supported as a sink at the time of this writing. Select the blob storage we configured earlier:

Create new pipeline: sink blob storage

Select the file format; in our case, a Parquet file:

Create new pipeline: sink file format

Enter a destination name and select the blob storage linked service created earlier:

Create new pipeline: sink linked service

Select the location where you want to save the file.

Important: Since we will create the file name dynamically, do not select a destination file, and choose None for the Import schema option.

Create new pipeline: sink properties
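For reference, the Copy activity these steps build is stored as JSON inside the pipeline definition. A simplified sketch is shown below; the activity and dataset names are illustrative placeholders, and the generated definition will contain additional default properties:

```json
{
  "name": "CopySnowflakeToBlob",
  "type": "Copy",
  "inputs": [
    { "referenceName": "SnowflakeTableDataset", "type": "DatasetReference" }
  ],
  "outputs": [
    { "referenceName": "BlobParquetDataset", "type": "DatasetReference" }
  ],
  "typeProperties": {
    "source": {
      "type": "SnowflakeSource",
      "exportSettings": { "type": "SnowflakeExportCopyCommand" }
    },
    "sink": { "type": "ParquetSink" }
  }
}
```

You can view the actual generated JSON in Synapse Studio via the curly-brace (code) icon on the pipeline editor.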

Double-click the file name field and select Add dynamic content to build the expression for the file name.

Note: If this option is unavailable the first time, save the dataset and edit it again; the option should then appear.

Create new pipeline: sink dynamic content

In the new window, input this expression for the file name. You can modify it as you want.

@concat(pipeline().RunId, '_snowflake.parquet')

The file name will be the run ID plus a suffix and the file extension. Once ready, click OK.

Create new pipeline: sink filename expression
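The expression language's concat simply joins strings, and pipeline().RunId is a GUID that is unique to each run, so every execution produces a distinct file name. A small Python sketch of the equivalent logic (the run ID here is generated locally, just for illustration):

```python
import uuid

def concat(*parts: str) -> str:
    # Minimal stand-in for the pipeline expression function @concat().
    return "".join(parts)

run_id = str(uuid.uuid4())  # pipeline().RunId is a GUID assigned per run
file_name = concat(run_id, "_snowflake.parquet")

assert file_name.endswith("_snowflake.parquet")
print(file_name)  # e.g. "1f2a3b4c-...-..._snowflake.parquet"
```

Because the run ID changes on every execution, repeated runs never overwrite each other's output files.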

We are ready to validate the pipeline, so the last step is to click Validate.

Validate Pipeline

If everything is OK, validation will succeed.

validate pipeline output

If you have followed the steps in this tip and find an error at this point, it is most likely related to permissions. Review the error message and troubleshoot any missing permissions.

To save the pipeline, click Publish all:

publish changes

For execution, we can test the pipeline with the Debug option or create a trigger to run it once or on a schedule.

First, try the Debug option to troubleshoot any potential runtime issues.

debug new pipeline

You will see in the Output tab that the pipeline has been queued.

debug in progress

Depending on your table size, if everything is OK, you will see a successful execution after a short period.

debug completed

Now, to ensure the file was created properly, browse the destination folder in the Azure portal.

Validate File exists in blob storage

We can see that the file was properly created.

Now, let’s create a trigger.

Return to the Pipeline editor and select the Add Trigger option.

Add trigger option

Trigger now is for ad-hoc executions, while New/Edit is for scheduled executions. We will choose the second option.

Add new trigger

Let us create a new trigger. There are several trigger types to choose from, such as schedule, tumbling window, and events.

Trigger types

For simplicity, let’s choose a Schedule type:

new scheduled trigger
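As with the other objects, the trigger is persisted as JSON. A sketch of what a daily schedule trigger might look like is below; the trigger name, start time, and time zone are placeholder values:

```json
{
  "name": "DailySnowflakeCopyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2023-05-18T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyfromSnowflake",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Remember that a newly created trigger does not run until it is started (published and activated).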

You have now created your Synapse pipeline and its trigger.

How to Monitor

In Synapse, select the Monitor option.

Synapse monitor option

You can use either the Pipeline runs or Trigger runs view to check execution outcomes:

Monitor triggers and pipelines

If you want to modify or disable the trigger, you can go to Manage and select Triggers:

Trigger management and edit

The next tip will show how to copy data into Snowflake.




About the author
Eduardo Pivaral is an MCSA, SQL Server Database Administrator, and Developer with over 15 years of experience working in large environments.

View all my tips



Comments For This Article




Thursday, June 15, 2023 - 10:25:36 AM - Eduardo Pivaral
Regarding Key Vault, yes, that is the proper way to do it. In this case, I use SAS because I try to keep my tips as simple as possible so people understand what they need and, based on that, can build their own solution.

For Parquet, it is a compressed file format, so it usually takes less storage space compared to .csv, but you can use any of the file formats you want. Some people who still need to use older ETL solutions are forced to use CSV, so there is no problem with that.

Sunday, June 11, 2023 - 7:59:04 PM - Vic Hindocha
Overall a great article. I would change the connection method to Snowflake and pull credentials from Key Vault; it's more secure.

Friday, June 9, 2023 - 3:51:45 AM - Andrew
Hi Eduardo, thanks for the tip, just about to do this ourselves so a great help.
One question: why did you choose Parquet as the file format rather than one of the other types? Are there limitations or benefits of that type over the others? We are going to be ingesting into Azure Synapse Analytics, probably using an EXTERNAL TABLE or COPY INTO. Do you have a preference?
Thanks in advance
Andrew