Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Transfer Data to the Cloud Using Azure Data Factory


By:   |   Last Updated: 2019-03-08   |   Comments (5)   |   Related Tips: More > Azure

Problem

In these series of posts, I am going to explore Azure Data Factory (ADF), compare its features against SQL Server Integration Services (SSIS) and show how to use it towards real-life data integration problems. In the previous post, I explained how to create ADF and use the ADF copy tool to add a simple copy pipeline to transfer data between Azure SQL databases. In this post, we will manually create pipeline to import data from a local SQL database to an Azure SQL database and we'll ensure that this pipeline is repeatable (i.e. can run multiple times without causing duplicate Primary Key errors).

Solution

For the purpose of this exercise, I have downloaded and installed the AdventureWorks 2016 database on my local SQL Server. This database contains DimDate table, with 3652 rows and DateKey as PK constraint. I have generated the below script for this table, which I will use to create a target table in Azure SQL database DstDb:

CREATE TABLE [dbo].[DimDate](
   [DateKey] [int] NOT NULL,
   [FullDateAlternateKey] [date] NOT NULL,
   [DayNumberOfWeek] [tinyint] NOT NULL,
   [EnglishDayNameOfWeek] [nvarchar](10) NOT NULL,
   [SpanishDayNameOfWeek] [nvarchar](10) NOT NULL,
   [FrenchDayNameOfWeek] [nvarchar](10) NOT NULL,
   [DayNumberOfMonth] [tinyint] NOT NULL,
   [DayNumberOfYear] [smallint] NOT NULL,
   [WeekNumberOfYear] [tinyint] NOT NULL,
   [EnglishMonthName] [nvarchar](10) NOT NULL,
   [SpanishMonthName] [nvarchar](10) NOT NULL,
   [FrenchMonthName] [nvarchar](10) NOT NULL,
   [MonthNumberOfYear] [tinyint] NOT NULL,
   [CalendarQuarter] [tinyint] NOT NULL,
   [CalendarYear] [smallint] NOT NULL,
   [CalendarSemester] [tinyint] NOT NULL,
   [FiscalQuarter] [tinyint] NOT NULL,
   [FiscalYear] [smallint] NOT NULL,
   [FiscalSemester] [tinyint] NOT NULL,
 CONSTRAINT [PK_DimDate_DateKey] PRIMARY KEY CLUSTERED 
([DateKey] ASC)) ON [PRIMARY]
GO			

Azure Data Factory Data (ADF) Exchange Architecture

ADF leverages a Self-Hosted Integration Runtime (SHIR) service to connect on-premises and Azure data sources. SHIR can run copy activities between a cloud data store and a data store in a private network, and it can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network.

The installation of a self-hosted integration runtime needs to be on an on-premises machine or a virtual machine (VM) inside a private network. SHIR serves as a secure gateway between internal and external networks, it passes encrypted credentials to access internal resources and encrypts outbound data.

As you can see from ADF architecture diagram below, a single IR allows connecting multiple database servers and file servers to a specific Data Factory. It's possible to install multiple instances of SHIR within the same network, to provide high availability for this service and we will cover that configuration in a future post. SHIR does not need to be on the same machine as the data source, however, having the SHIR closer to the data source reduces the time for it to connect to the data source.

database server

There are a few prerequisites for installing and configuring SHIR:

  • The machine hosting IR should have OS version Windows 7 SP1, Windows 8.1, Windows 10, Windows Server 2008 R2 SP1, Windows Server 2012, Windows Server 2012 R2 or Windows Server 2016 and .NET Framework 4.6.1 or later. Installation of the SHIR on a domain controller is not supported.
  • The recommended configuration for the SHIR machine is at least 2 GHz, four cores, 8 GB of RAM, and an 80GB disk.
  • At the corporate firewall level, you need to open outbound port 443 for the following domains:
    • *.servicebus.windows.net
    • *.core.windows.net
    • *.frontend.clouddatahub.net
    • download.microsoft.com

Installing and Configuring Self-hosted Integration Runtime

There are two methods to install SHIR – express setup and manual setup. We will use the express setup, which combines the SHIR installation and configuration.

Here are steps to be executed on the machine which will be hosting SHIR service:

  • Open ADF authoring page and switch to the 'Connections' tab (see previous post for the details)
  • Open 'Integration Runtimes' tab at the top of the page and click the '+' button to open the Integration Runtime Setup dialog window, select 'Perform data movement and dispatch activities...' tab and hit 'Next':
factory resources
  • Select the 'Self-Hosted' option and hit 'Next':
integration runtime setup
  • Enter OnPremIR as the name for the IR service and confirm.
  • Select the web link under 'Option1: Express setup' to download the SHIR installation package.
runtime setup
  • Install the downloaded package. If installation is successful, you will see this screen:
data source credential
  • In order to test SHIR's ability to connect to our local SQL database, open the 'Diagnostics' tab, select SqlServer from the combo box, enter your SQL instance name, database name, Windows credentials and hit 'Test':
sql server
diagnostics

Creating Azure Data Factory Copy Pipeline

ADF pipeline creation involves adding source and destination linked services, datasets, pipeline and activity. Here are the step-by-step instructions:

Linked services

As you may remember from the previous post, copy pipeline requires two linked services: source and destination. We will use destination linked service SqlServerLS_Dst, we created in the previous tip and create a SQL linked service for the source database. Here are the steps:

  • Open 'Connections' tab from 'Factory Resources' panel and open 'Linked services' tab, hit '+' button to open 'New Linked Service' dialog window, type 'SQL', select 'SQL Server' and click 'Continue':
linked services
  • Next, assign the name to the linked service, select OnPremIR from the drop-down list, enter your machine's domain name in the 'Server name' box, fill 'Database name' and credential boxes, as shown in the screenshot. Note: I've used 'Windows Authentication' for this exercise and if you choose to do same, you will need to enter the fully qualified domain user name in the format '[email protected]'. Alternatively, you could also use 'Basic Authentication', if you want to proceed with SQL Authentication. Once all the boxes are filled, click 'Test connection' to test connectivity to the database:
new linked service

Datasets

Once we have source linked service in place, we can add source dataset, here are the required steps:

  • Hit '+' on 'Factory Resources' panel and select 'Dataset'
factory resources
  • Type 'SQL' on 'New Dataset' textbox, select 'SQL server' and confirm:
sql server
  • Type dataset name and open 'Connection' tab:
connection
  • Select source linked service we created in steps 1 and 2, as well as the DimDate table:
connection
  • Open the 'Schema' tab and hit 'Import Schema' button to import the table structure:
import schema
  • Follow the steps above to add the destination dataset for DimDate table, using Azure SQL Database as the dataset type and SqlServerLS_Dst as the linked service and name it as DimDate_DS.

Pipeline and Activity

The last step in this process is adding pipeline and activity. Here are the steps:

  • Hit '+' on 'Factory Resources' panel and select 'Pipeline':
factory resources
  • Assign name to the pipeline:
parameters
  • Expand 'Move & Transform' category on the 'Activities' panel and drag/drop 'Copy Data' activity into central panel:
sql server
  • Select newly added activity and assign a name to it, using the textbox under 'General' tab:
copy data
  • Switch to 'Source' tab and select LocalSQL_DS dataset we created earlier:
source
  • Switch to 'Sink' tab and select 'DimDate_DS' dataset we created earlier. Type query "Delete from DimDate" to ensure the destination table is purged before we insert new rows:
sink
  • Switch to 'Mapping' tab and hit 'Import Schemas' button to generate source/destination field mappings:
import schemas

Publishing Changes

Now that we've added all the required components, the last step is publishing these changes. Hit 'Publish All' and check the notifications area (top right corner) for deployment status:

linked services
trigger now

Execution and Monitoring of Azure Data Factory Pipeline

ADF pipelines can be started either manually or by triggers. We will examine ways to create triggers in one of the future posts, but for now, let's start our new pipeline manually. Click the 'Trigger' button at the top of central panel, select 'Trigger Now' command and hit the 'Finish' button on the 'Pipeline Run' window:

  • To monitor pipeline execution progress, hit the 'Monitor' button on the left side of the screen:
factory resources
  • As you can see from screenshot, pipeline execution has been successful:
pipeline name
  • Querying destination table confirms that all the rows have been transferred successfully:
object explorer
Next Steps


Last Updated: 2019-03-08


next webcast button


next tip button



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002. He’s currently working as Senior BI Consultant at BDO Canada.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Tuesday, March 12, 2019 - 2:05:44 PM - Israel Vives Back To Top

Great article Fikrat,

Best regards!!


Friday, March 08, 2019 - 9:11:41 PM - Fikrat Azizov Back To Top

Thanks Anne,

This link - https://www.howtogeek.com/112564/how-to-create-advanced-firewall-rules-in-the-windows-firewall/ has a good explanation on how to open outbound ports on windows firewall. You may open port 443 for all IP's or limit access to IP addresses of domains listed in the tip.

Hope that helps, good luck!


Friday, March 08, 2019 - 4:02:19 PM - Satya Back To Top

can you please help .

need “main.cmd” file which will execute .msi,.rsp and .ora files and add it in adf integrationruntime.


Friday, March 08, 2019 - 2:11:46 PM - Anne Back To Top

Thanks for the article. It is great tip!

For firewall configuration, could you tell me or point me to an article that instructs how to do the steps below:

At the corporate firewall level, you need to open outbound port 443 for the following domains:

  • *.servicebus.windows.net
  • *.core.windows.net
  • *.frontend.clouddatahub.net
  • download.microsoft.com

Friday, March 08, 2019 - 12:14:55 PM - SatyaNarayana Back To Top

 Hi,

I need to create a Integrationruntime with Oracle client_install.rsp,AttunitySSISOraAdaptersSetup.msi,AttunitySSISOraAdaptersSetupX64.msi and tnsnames.ora.

can you please provide me a main.cmd file for this. 


Learn more about SQL Server tools