Build your first Azure Data Factory Pipeline


Overview

We're going to build a pipeline using the Copy Data tool. This tool makes it easier for people starting out with ADF to create their first pipelines. Before we start, we need to make sure some prerequisites are met.

Prerequisites

If you haven't already, follow the steps of the previous part of the tutorial to set up ADF, a storage account with a blob container and an Azure SQL DB.

In the Azure Portal, go to your storage account and then to the "data-input" container we created. Click on the Upload link.

upload a blob file

A pane will open where you can select a local file. Upload the Customers.csv file, which you can download here.

select file to upload

Click Upload to put the file in the blob container.

Create the Pipeline

Go to ADF Studio and click on the Ingest tile. This will open the Copy Data tool.

click on ingest

In the first step, we can choose to simply copy data from one location to another, or to create a more dynamic, metadata-driven task. We'll choose the first option. Parameters and metadata are covered later in this tutorial.

copy data tool first step

You can also choose how the resulting pipeline needs to be scheduled. For now, we're going with "run once now". Schedules and triggers are also discussed later in the tutorial.

In step 2, we need to choose the type of our source data. This will be our CSV file in the blob container. Azure Blob Storage is the first option in the dropdown:

select blob storage as source

We also need to define the connection to the blob container. Since we don't have any connections yet in ADF, we need to create a new one by clicking on "New connection". In the new pane, give the new connection a name and leave the default for the integration runtime (also covered later in the tutorial). As authentication type, choose account key. Since the blob storage is in the same Azure tenant as ADF, we can simply choose it from the dropdowns. Select the correct subscription and the correct storage account.

define connection to blob storage

Finally, you can test your connection. If it is successful, click on Create to create the new connection. The screen for step 2 should look like this:

step 2 after creating connection

We now need to select a file from the connection we just created. Click on Browse to open a new pane to select the file. Choose the Customers.csv file we uploaded in the prerequisites section.

select csv file

ADF will automatically detect that it's a CSV file and populate most of the configuration fields for you.

configure csv file options

Make sure the first row is treated as a header. You can preview the data to check that everything is OK:

preview data

Now we need to configure our destination in step 3. Search for "sql" and select Azure SQL Database from the dropdown list.

select destination type

As with the source, we also need to define a new connection. Give it a name and select the correct subscription, server, and database from the dropdowns. If everything is in the same Azure tenant, this should be straightforward.

define sql connection

Choose the authentication type that you configured during the setup of the SQL Server. In the following screenshot, I chose SQL authentication, so I need to supply a username and a password.

supply authentication details
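
If you went with SQL authentication and don't have a dedicated login yet, the T-SQL below is a minimal sketch of how one could be set up; the login and user names are placeholders I made up for illustration, not names used elsewhere in this tutorial.

-- Minimal sketch (placeholder names): run against the master database of the logical server.
CREATE LOGIN adf_demo_login WITH PASSWORD = 'Use-A-Str0ng-P@ssword!';

-- Then run against the target database to map the login to a database user.
CREATE USER adf_demo_user FOR LOGIN adf_demo_login;
ALTER ROLE db_datawriter ADD MEMBER adf_demo_user;
ALTER ROLE db_ddladmin ADD MEMBER adf_demo_user; -- lets ADF create the destination table for us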

You can test the connection to see if everything works. Make sure you gave Azure services access to the SQL server (as shown in the previous part of the tutorial), or you will get a firewall error. Once the connection is created, we need to choose the destination table. You can either pick an existing table or let ADF create one for you. Fill in dbo as the schema and Tutorial_StagingCustomer as the table name.

choose table
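
To make the "let ADF create the table" option a bit more concrete, the sketch below shows the kind of DDL you could expect; the column names are purely hypothetical, since they depend on the headers in Customers.csv, and string columns from a delimited source typically end up as nvarchar(max).

-- Hypothetical sketch of an auto-created sink table; actual columns follow the CSV headers.
CREATE TABLE dbo.Tutorial_StagingCustomer
(
    CustomerID nvarchar(max),
    FirstName  nvarchar(max),
    LastName   nvarchar(max),
    Email      nvarchar(max)
);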

Next, we need to define the mapping. A mapping defines how each column of the source corresponds to a column in the destination. Since ADF is creating the table, everything should be mapped automatically.

column mapping

If you want, you can supply a pre-copy script: a SQL statement that is executed right before the data is loaded. In a recurring pipeline, you could for example issue a TRUNCATE TABLE statement to empty the table. In our case that would fail, since ADF still needs to create the table; you can't truncate a table that doesn't exist yet.
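
As a concrete example for a recurring run, a pre-copy script along these lines would empty the staging table before each load; the OBJECT_ID guard is an assumption on my part to avoid the first-run failure mentioned above.

-- Pre-copy script sketch: empty the staging table, but only if it already exists,
-- so the very first run (before ADF has created the table) doesn't fail.
IF OBJECT_ID(N'dbo.Tutorial_StagingCustomer', N'U') IS NOT NULL
    TRUNCATE TABLE dbo.Tutorial_StagingCustomer;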

Now we're in step 4 of the tool, where we can define general settings for the pipeline. You can change the name of the pipeline; leave everything else at the defaults.

step 4 - settings

In the final step, you can review all the configurations we made in the previous steps.

review step

Click Next. ADF will create the pipeline and will run it once.

pipeline created and run successfully

We can verify that a pipeline has been created by checking the factory resources: click the pencil icon in the left menu.

our first pipeline!

We can also check in the database itself that a new table has been created and has been populated with the data from the CSV file:

table created with data
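
If you want to check from a query window rather than the screenshot above, a quick sanity check against the table we created could look like this:

-- Confirm the table exists and see how many rows were loaded, plus a small sample.
SELECT COUNT(*) AS RowsLoaded
FROM dbo.Tutorial_StagingCustomer;

SELECT TOP (10) *
FROM dbo.Tutorial_StagingCustomer;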