Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Transfer On-Premises Files to Azure SQL Database


By:   |   Updated: 2019-05-29   |   Comments   |   Related: More > Azure

Storage and High Availability Options for SQL Server in the Cloud

Free MSSQLTips Webinar: Storage and High Availability Options for SQL Server in the Cloud

This webinar will cover best practices for optimizing cloud storage and cost, how to leverage the cloud for disaster recovery, availability options and requirements for SQL Server and key factors to consider in your selection.


Problem

In these series of posts, I am going to explore Azure Data Factory (ADF), compare its features against SQL Server Integration Services (SSIS) and show how to use it for real-life data integration problems. In the previous post, we learned how to create a pipeline to transfer CSV files between an on-premises machine and Azure Blob Storage. In this post, I’ll explain how to create ADF pipelines to transfer CSV files from Blob Storage into Azure SQL Database.

Solution

Before getting started, I recommend reading the previous tip Transfer On-Premises Files to Azure Blob Storage.

Data Exchange Architecture

Since we have already built a pipeline to transfer CSV files from an on-premises machine into Azure Blob Storage, the next step is to create a pipeline to push these files into Azure SQL Database, see reference architecture below:

data factory

Preparing Destination Tables

ADF will not create tables in the destination database by default, so I have created a FactInternetSales table with the same structure as my source table, in the destination Azure SQL Database (DstDb), using the below script:

CREATE TABLE [dbo].[FactInternetSales](
   [ProductKey] [int] NOT NULL,
   [OrderDateKey] [int] NOT NULL,
   [DueDateKey] [int] NOT NULL,
   [ShipDateKey] [int] NOT NULL,
   [CustomerKey] [int] NOT NULL,
   [PromotionKey] [int] NOT NULL,
   [CurrencyKey] [int] NOT NULL,
   [SalesTerritoryKey] [int] NOT NULL,
   [SalesOrderNumber] [nvarchar](20) NOT NULL,
   [SalesOrderLineNumber] [tinyint] NOT NULL,
   [RevisionNumber] [tinyint] NOT NULL,
   [OrderQuantity] [smallint] NOT NULL,
   [UnitPrice] [money] NOT NULL,
   [ExtendedAmount] [money] NOT NULL,
   [UnitPriceDiscountPct] [float] NOT NULL,
   [DiscountAmount] [float] NOT NULL,
   [ProductStandardCost] [money] NOT NULL,
   [TotalProductCost] [money] NOT NULL,
   [SalesAmount] [money] NOT NULL,
   [TaxAmt] [money] NOT NULL,
   [Freight] [money] NOT NULL,
   [CarrierTrackingNumber] [nvarchar](25) NULL,
   [CustomerPONumber] [nvarchar](25) NULL,
   [OrderDate] [datetime] NULL,
   [DueDate] [datetime] NULL,
   [ShipDate] [datetime] NULL,
 CONSTRAINT [PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
([SalesOrderNumber] ASC,[SalesOrderLineNumber] ASC)
)
GO			

Creating Copy Pipeline

If you have followed the previous posts, you should be able to create linked services, datasets and pipelines, so I will skip these screenshots for some of the minor steps, to save space here. Furthermore, since we have already created datasets and linked services for Blob Storage, we can use them as source components for the new pipeline.

Please follow the below steps to create the pipeline:

  • Linked services - In the previous tips we have already setup linked services for both source and destination, so well proceed to the next step.
  • Datasets - We need blob storage as a source and Azure SQL Database as destination datasets. Here are the required steps to add source and destination datasets. Since the source dataset for this pipeline is going to be similar to the blob storage dataset BlobSTG_DS we created in the previous post, lets select it and create a duplicate using the clone command.
factory resources
  • I named the new dataset BlobSTG_DS2:
azure blob storage
  • To create destination dataset, let’s add a new dataset object of Azure SQL Database type from the Factory Resources panel, assign the name FactInternetSales_DS and switch to the Connection tab
  • Select the SqlServerLS_Dst linked service for Azure SQL Database we created earlier and select the FactIntenetSales table from the drop-down list:
fact internet sales
  • Switch to the Schema tab and import the table structure using the Import Schema button:
import schema
  • Pipeline and activity. The last step in this process is adding the pipeline and activity. Here are the steps:
    • Add new Pipeline object from the Factory Resources panel and assign the name (I named it Blob_SQL_PL)
    • Expand the Move & Transform category on the Activities panel and drag/drop the Copy Data activity into the central panel
    • Select the newly added activity and assign the name (I named it FactInternetSales_AC)
    • Switch to the Source tab and select the BlobStg_DS2 dataset we created earlier:
source dataset
  • Switch to the Sink tab and select the FactInternetSales_DS dataset we created earlier and enter a purge query for the destination table:
fact internet sales
  • Switch to the Mapping tab and select Import Schemas:
import schemas
  • Publishing changes - Finally, let’s publish the changes, using the Publish All button and check the notifications area for the deployment status.

Execution and Monitoring

Let's kick-off the pipeline using the Trigger Now command under the Trigger menu:

trigger now

Now, let's switch to the ADF Monitoring page to ensure that pipeline execution was successful:

pipeline name

To verify that files have been transferred successfully, I’ve opened SSMS and checked the content of the target table in the Azure SQL Database:

select top

Conclusion

In this post, we completed building a data flow to transfer files from an on-premises machine to an Azure SQL Database. As you may have observed, the ADF pipeline building process resembles building data flow tasks for SSIS, where you create source and destination components and build mapping between them.

If you followed the previous posts, you should be able to create simple pipelines to transfer data between various file and database systems. Although these kinds of pipelines are sufficient for transferring small data sets, you will need to build more sophisticated pipelines with incremental data processing capabilities for larger tables. You may also need to automate pipeline executions to run on pre-scheduled timelines or in response to certain events. We’ll discuss all these capabilities in future posts.

Next Steps


Last Updated: 2019-05-29


get scripts

next tip button



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002. Hes currently working as Senior BI Consultant at BDO Canada.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools