Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime - Part 1


By:   |   Updated: 2018-12-10   |   Comments (21)   |   Related: > Azure


Problem

I have created a pipeline in Azure Data Factory. I want to copy data from my SQL Server instance hosted on my local server. I can connect easily to cloud sources, but not to my on-premises sources. How can I achieve this?

Solution

Azure Data Factory is a scalable data integration service in the Azure cloud. In Azure Data Factory, you can create pipelines (which on a high-level can be compared with SSIS control flows). In a pipeline, you can put several activities, such as copy data to blob storage, executing a web task, executing a SSIS package and so on. A common scenario is to copy data from local sources to the cloud. In this case, Azure Data Factory needs to be able to connect to those sources. This can be achieved by using the self-hosted integration runtime.

The self-hosted integration runtime is a service running in Azure Data Factory, but you can add local compute nodes on local servers in your on-premises network. A connection is created between the nodes and the integration runtime within your Azure Data Factory (ADF) in Azure. Through this connection, ADF can reach your local data and copy it securely to the cloud. This set-up is very similar to the Power BI on-premises gateway. In fact, the self-hosted integration runtime used to be called the "data management gateway".

An alternative to using the self-hosted integration runtime (IR), is to use an on-premises ETL tool - like SSIS - to push the data to the cloud. However, this means you need a SQL Server license and a server. The self-hosted IR service is free to use and can be run on any local server. If you want to migrate your data infrastructure to the cloud, Azure Data Factory (and other Azure tools like Logic Apps) are a better choice.

It's important to note that the self-hosted IR will grant ADF access to your on-premises sources, but not the Azure-SSIS IR. If you want SSIS packages to connect to your local environment, you need to join the Azure-SSIS IR to a virtual network.

Setting Up the Self-Hosted Integration Runtime

The first step is setting up the self-hosted IR in ADF. Then we install a node (or multiple nodes) on a local server and connect the node with the integration runtime.

In ADF, go to the edit pane, then to connections (at the bottom) and finally to the Integration Runtimes. There, click on the New button to create a new IR.

create new IR

In the wizard, you get two options for the integration runtimes: one for external computes and the Azure-SSIS IR. We're going to pick the first one. For setting up an Azure-SSIS IR, check out the tip Configure an Azure SQL Server Integration Services Integration Runtime.

pick the correct IR type

You can either create an IR in Azure or a self-hosted IR. We're going to choose the second option.

choose self-hosted IR

Next, we're going to provide a name and a description.

provide a name and description

To configure the self-hosted IR node on your local machine, you can choose between an express setup or a manual install.

express or manual setup

However, I got the following error when I tried to download the express setup:

download error

Manual installation and configuration it is then. You can download the ADF IR from the Microsoft site:

download manual installs

After downloading, run the setup on the server that will host your self-hosted IR node. Ideally, this server is close to your data. There's an easy to follow setup wizard:

connect to on premises data in azure data factory self hosted integration runtime 008

When the setup is finished, you can configure the node. First, we need to register it to the IR we created in ADF. Copy one of the authentication keys which you can find in the ADF IR setup:

copy authentication key

Paste the key into the local IR config menu and click Register:

register IR

Next you can specify a name for your local node of the self-hosted IR.

specify IR node name

If necessary, you can configure remote access as well:

configure remote access

When everything went successfully, you'll get a message stating the local node has been registered successfully to the self-hosted IR.

connect to on premises data in azure data factory self hosted integration runtime 013

After the setup, the configuration manager of the node will be shown. Here you can consult the status of the service and see if it is connected to the cloud service.

node configuration manager

When you go back to the connections in ADF, you can see the self-hosted IR among the other IRs. It's possible the status is unavailable, while the node configuration manager says the connection is successful. In most cases, you simply need to refresh the screen.

IR statuses

You can click on the monitor icon to get more detail about your self-hosted IR:

monitoring the IR

Creating a connection to a local database

Before we can pull data from our on-premises server, we need to create a linked service to the database. In the connection pane, go to Linked Services and click on New.

create new linked service

In the wizard, choose SQL Server as the data store type.

connect to on premises data in azure data factory self hosted integration runtime 018

Click Continue to go to the configuration screen of the linked service. There you'll need to specify a name, server name, database name and connection credentials. Don't forget to select the self-hosted IR from the dropdown.

configure linked service

At the bottom, you can test your connection. Click Finish to create the linked service.

test connection

Conclusion

In the first part of this tip, we've configured a self-hosted integration runtime in Azure Data Factory. Then we added a note to this IR on our local server. We created a linked service in ADF to our local instance of SQL Server and the connection was successful. In the next part of this tip, we'll see how we can use this setup to copy data from the SQL Server database to Azure Blob Storage.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips


Article Last Updated: 2018-12-10

Comments For This Article




Wednesday, July 14, 2021 - 2:28:30 AM - Koen Verbeeck Back To Top (89000)
Hi Sam,

I believe it uses public network, which you can secure as described here:
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#tlsssl-certificate-requirements

However, it also seems you could use a private endpoint, as discussed here:
https://stackoverflow.com/questions/67115488/how-do-i-connect-an-azure-self-hosted-integration-runtime-to-a-data-factory-priv

Regarding the scale out using multiple nodes, the only thing I could find is that a single copy activity will partition itself over the different nodes:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance

Regards,
Koen

Monday, July 12, 2021 - 1:15:46 PM - sam Back To Top (88994)
Hi,

If I want to configure 4 nodes for my Self hosted IR, how will the workload be distributed between the 4 nodes? Is Azure data gateway also required for using multiple nodes? If yes, how do we integrate data gateway with IR ?

Also does the self hosted IR data transfer uses private network between on-premise and azure vnet or it uses public network ?

Tuesday, April 27, 2021 - 5:47:22 AM - Koen Verbeeck Back To Top (88609)
Hi Santhosh,

I would look into PowerShell for extracting data from AD.

Sunday, April 25, 2021 - 2:45:31 AM - Santhosh Back To Top (88601)
I have been tasked to integrate an on-premise active directory with the cloud Azure SQL database( Managed Instance).

The task is to pull the identity data from on-prem AD and pump it into a SQL table in the managed SQL instance on Azure. Can ADF connect on-prem AD with cloud Azure SQL?. Some pointers, please?

Thursday, April 8, 2021 - 8:07:10 AM - Koen Verbeeck Back To Top (88496)
Hi Santosh,

this is hard to troubleshoot from remote. It might be a firewall issue, as indicated by the error. Can you log in with the user yourself on your local machine? Does the user have permission to access the database? Is the self-hosted IR registered as up and running in the list of integration runtimes in ADF?

Koen

Friday, April 2, 2021 - 6:46:36 AM - Santosh Singh Back To Top (88484)
Hi,
I am begineer in ADF. Please help.

I am facing connection issue while creating linked services for on-premises sql server. Please help. Below is error

Cannot connect to SQL Database: '(LocalDB)\MSSQLLocalDB', Database: 'AdventureWorksDW2016', User: 'santo'. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.
Login failed for user 'santo'., SqlErrorNumber=18456,Class=14,State=1,
Activity ID: e1d0b6c6-c54c-45cb-bbfb-78bb936af562

Wednesday, March 10, 2021 - 9:06:46 AM - Nobody Back To Top (88373)
Great article! This really helped me. Thank you!

Tuesday, January 12, 2021 - 10:32:46 AM - Koen Verbeeck Back To Top (88037)
Hi Shashank,

for the moment this does not seem to be possible. Linux is not listed in the system requirements.
https://www.microsoft.com/en-us/download/details.aspx?id=39717

Regards,
Koen

Tuesday, January 12, 2021 - 9:28:43 AM - Shashank Kharade Back To Top (88036)
Can i install Self hosted IR on linux machine?

Monday, October 5, 2020 - 2:30:12 AM - Koen Verbeeck Back To Top (86596)
Hi Archit,
maybe you can try this?
https://prologika.com/solving-configuration-errors-with-adf-self-hosted-ir/

Thursday, October 1, 2020 - 1:59:54 PM - Koen Verbeeck Back To Top (86581)
Hi Archit,
it seems there's an issue with encryption, which is far out of my comfort zone. It might be dependent on how you've set up your network configuration, authentication & authorization etc.

Regards,
Koen

Wednesday, September 30, 2020 - 8:37:48 AM - Archit Sharma Back To Top (86560)
Hi Koen,

First of all thank you for such a elaborated explanation. I have gone through these steps very carefully but getting an error stating "A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The certificate chain was issued by an authority that is not trusted.), SqlErrorNumber=-2146893019,Class=20,State=0,
The certificate chain was issued by an authority that is not trusted".

I have tried every solution but isn't able to solve the issue till now. The server is on the Azure SQL server windows 2019.
please provide a solution, it will be a great help.

Thank you

Thursday, July 9, 2020 - 3:39:52 PM - Koen Verbeeck Back To Top (86114)

Hi FIIS,

I haven't tried it out myself, so I don't know the answer. So I would say: just try it :) (in a dev env of course :)

Koen


Wednesday, July 1, 2020 - 12:39:39 PM - FIIS Back To Top (86077)

First of all thank you for all your posts, i do check your posts and thank you so much for it.

What i want to know is lets say for any reason the SHIR is in failure state or Running Limited state, can we then delete the existing one and create a new SHIR with the same name and link all the linked services to this IR, will it work ? or will it have issues?


Thursday, October 31, 2019 - 8:14:54 AM - Koen Verbeeck Back To Top (82942)

Hi Mohak,

currently there are no alternatives for the self-hosted IR: it's only availabe on Windows. I have no idea if there are any plans to change this.

Regards,
Koen


Wednesday, October 30, 2019 - 8:29:29 AM - Mohak Saxena Back To Top (82933)

We have created self hosted IR using a Windows based on-prem server. My questions is that in today's era of open source and doing away dependencies is there no alternate of using a Linux based server for IR for Azure SQL Warehouse and ADF?

If yes what are the alternatives?

If no, does Microsoft has plans to do this?


Tuesday, October 22, 2019 - 7:33:43 AM - Koen Verbeeck Back To Top (82856)

Hi Richard,

the IR provides a compute environment for your SSIS packages. Think of it as a virtual machine where your packages will run. All other stuff, e.g. designing and configuring your packages is just like before. You create your SSIS packages in Visual Studio, you deploy them to the SSISDB catalog where you also configure them with environments. You can access the catalog with SSMS.

Regards,
Koen


Monday, October 21, 2019 - 1:15:40 PM - Richard Back To Top (82848)

Hi Koen,

I would like to use the Self Hosted IR in conjunction with SSIS in Azure but am stuck after setting up the IR. 

The IR was setup succesfully with help of your article. How, if possible, can I configure a connection manager, in my SSIS package?

All tips and comments are greatly appreciated.

Regards, Richard


Monday, October 21, 2019 - 10:51:12 AM - Sid Back To Top (82845)

Can we install self-hosted integration runtime on a Windows VM sitting inside a AWS or Azure itself?

Thank you


Friday, March 8, 2019 - 9:25:50 AM - Koen Verbeeck Back To Top (79217)

Hi Anne,

you'll probably want to check your firewall settings. It seems the node cannot connec to the endpoint in the cloud.

Regards,
Koen


Thursday, March 7, 2019 - 5:37:01 PM - anne Back To Top (79206)

Thanks for the article!

I tried to register the integration runtime on my local computer, it gives an error:

The integration runtime (self-hosted) node has encoutered an error during registration.

The integration runtime node failed to connect to the cloud service dure to network connectivty issues. failed to connect to wu2.forntend.cloudatahub.net

What shoudl I do then?

Thanks



download














get free sql tips
agree to terms