SHIR Error - Self-hosted node is reconnecting to the cloud service

Problem

In our organization, most data is stored on-premises, with a limited set of less critical data in the cloud. We use Azure as our cloud environment and Azure Data Factory (ADF) to move data.

ADF has many components that need to integrate within the environment. Data on our on-premises servers must be moved to the cloud periodically, and for that we use a Self-hosted Integration Runtime.

Our developers report that an ADF pipeline is failing with the error: ‘The Self-hosted Integration Runtime is offline…’ What does this mean?

Solution

The Self-Hosted Integration Runtime (SHIR) is a component used by Azure Data Factory (ADF) and Azure Synapse Analytics to integrate Azure cloud services with on-premises servers.

Self-hosted node is reconnecting to the cloud service error

This error occurs when the SHIR server and Azure cannot communicate with each other. The SHIR service contacts the Azure cloud at a regular interval (the heartbeat, explained later). Sometimes, because of a network issue or a configuration change, it cannot reach the cloud.
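Because the error comes down to the SHIR machine being unable to reach Azure, a quick first test is whether outbound HTTPS (TCP port 443) connections can be opened from that server. The Python sketch below is an illustration, not part of the SHIR tooling: `can_reach` is a hypothetical helper built on the standard `socket` module, and it only proves that a TCP connection opens, not that a proxy or firewall inspecting TLS will let SHIR traffic through. Take the actual host names from Microsoft's documentation on the domains and ports a self-hosted integration runtime requires.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it does not speak the SHIR protocol.
    """
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses
        return False
```

For example, `can_reach("<endpoint-from-the-docs>")` returning `False` points at DNS, firewall, or routing rather than at the SHIR service itself.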

Apart from network and configuration issues, other causes can produce this error, such as high resource usage on the SHIR server or expired or invalid authentication credentials or certificates.
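To rule out the certificate angle, you can check how many days remain before an endpoint's TLS certificate expires. This is a minimal sketch using Python's standard `ssl` module; `days_until_expiry` and `peer_not_after` are hypothetical helper names, and which certificate is relevant depends on your SHIR setup.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative if already expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def peer_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate presented by host:port."""
    context = ssl.create_default_context()  # verifies the chain and host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Because the default context verifies certificates, an already expired certificate makes `peer_not_after` raise `ssl.SSLCertVerificationError`, which is itself a sign of the problem.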

Here is the error:

Self-hosted node is reconnecting to the cloud service

Reviewing the Applications and Services Log

When the error occurs, the first thing to check is the event log. In Event Viewer, under Applications and Services Logs, the Integration Runtime subfolder contains all the events logged while the error was occurring.
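If you prefer the command line, or need to collect events from several servers, the same log can be read with the built-in Windows `wevtutil` tool. This Python sketch assumes the log is named `Integration Runtime` (confirm the exact name with `wevtutil el` on your SHIR machine); `build_query` and `recent_events` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List

# Assumption: log name as shown under Applications and Services Logs; verify with `wevtutil el`.
LOG_NAME = "Integration Runtime"


def build_query(log_name: str = LOG_NAME, count: int = 20) -> List[str]:
    """Build a wevtutil command that prints the newest `count` events as text."""
    # qe = query events, /c = max count, /rd:true = newest first, /f:text = readable output
    return ["wevtutil", "qe", log_name, f"/c:{count}", "/rd:true", "/f:text"]


def recent_events(log_name: str = LOG_NAME, count: int = 20) -> str:
    """Run the query (Windows only) and return wevtutil's text output."""
    if sys.platform != "win32":
        raise OSError("wevtutil is only available on Windows")
    result = subprocess.run(build_query(log_name, count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```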

Event logs can give you important information about the error, as the screenshot below shows.

You can see the event name, ID, and time of the error. The first line of the description also says ‘Error occurred while uploading the heartbeat for the node…’ A heartbeat is a periodic signal the SHIR service sends to the Azure cloud to indicate that it is operational and connected to the cloud service. Because of a network hiccup, the node could not upload its heartbeat to Azure.
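To make the heartbeat idea concrete, here is a toy Python model of a node that reports `Online` while heartbeat uploads succeed, switches to `Reconnecting` when one fails, and returns to `Online` by itself once an upload succeeds again. It is an illustration only: the `upload` callable is a hypothetical stand-in, and the real SHIR heartbeat interval, retry logic, and status handling are not modeled.

```python
from typing import Callable


class HeartbeatMonitor:
    """Toy model of heartbeat-driven node status; not the real SHIR implementation."""

    def __init__(self, upload: Callable[[], None]):
        self.upload = upload  # hypothetical: sends one heartbeat, raises OSError on failure
        self.status = "Online"
        self.failures = 0  # consecutive failed uploads

    def beat(self) -> str:
        """Attempt one heartbeat upload and return the resulting status."""
        try:
            self.upload()
        except OSError as exc:
            self.failures += 1
            self.status = "Reconnecting"
            print(f"Error occurred while uploading the heartbeat: {exc}")
        else:
            # Recovery needs no intervention: the next successful upload restores the status
            self.failures = 0
            self.status = "Online"
        return self.status
```

A real agent would call `beat()` on a timer; the hung state described later in this tip is exactly the case this simple model does not capture, where the network is back but the uploads never resume.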

SHIR Server Issue

We already discussed some reasons why the SHIR service may be unable to communicate with Azure. In our case, a network hiccup caused the issue. In most cases, the SHIR resumes communicating with the Azure cloud automatically, without any intervention, once the network issue is resolved.

That did not happen in our case. The network hiccup occurred after a patching activity that required a server reboot. After the reboot, all other services were up and running, so we checked the SHIR service status in Services. The SHIR service showed that it was up and operational, but Microsoft Integration Runtime Configuration Manager still showed the same error. A root cause analysis found that the patching had affected network connectivity and left the SHIR service in a hung state. Restarting the SHIR service from Configuration Manager resolved the issue.
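The triage in this scenario can be written down as a small decision function. The sketch below assumes the SHIR Windows service is named `DIAHostService` (check in services.msc) and parses the output of the built-in `sc query` command; `parse_sc_state`, `query_service`, and `recommend` are hypothetical helpers, and `node_online` is whatever you observe in Configuration Manager or the Azure portal.

```python
import re
import subprocess
from typing import Optional

# Assumption: SHIR Windows service name; verify it in services.msc.
SERVICE_NAME = "DIAHostService"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state keyword (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s+:\s+\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service(name: str = SERVICE_NAME) -> Optional[str]:
    """Run `sc query <name>` (Windows only) and return the parsed state."""
    out = subprocess.run(["sc.exe", "query", name], capture_output=True, text=True).stdout
    return parse_sc_state(out)


def recommend(state: Optional[str], cloud_reachable: bool, node_online: bool) -> str:
    """Post-patching triage: a running service with a reachable network that
    still shows the node offline matches the hung state described in this tip."""
    if state != "RUNNING":
        return "start the service"
    if not cloud_reachable:
        return "fix network connectivity first"
    if not node_online:
        return "restart the service (likely hung)"
    return "healthy"
```

When the recommendation is a restart, Configuration Manager is the route used in this tip; PowerShell's `Restart-Service -Name DIAHostService` should be equivalent, assuming the service name above is correct.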

Conclusion

To sum up, the Self-Hosted Integration Runtime (SHIR) usually resumes communicating with the Azure cloud automatically once a network issue is resolved. In our case, however, a server patching activity impacted network connectivity and left the SHIR service in a hung state.

Even though the SHIR service appeared to be running, it could not connect to Azure, and Microsoft Integration Runtime Configuration Manager still showed the same error. The root cause analysis showed that the patching had affected network connectivity, and a simple restart of the SHIR service through Configuration Manager resolved the issue.

This experience highlights the need for careful checks after patching or network changes, especially when dealing with services like SHIR. While the service generally recovers automatically, it’s crucial to monitor the state of the SHIR service and be ready to manually intervene if required.
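One way to put this into practice after a patch window is to poll a health check for a limited time and escalate only if it never passes. The Python sketch below is generic: `wait_for_recovery` is a hypothetical helper, `check` is any function you supply (for example, a TCP test against the endpoints SHIR needs), and the default 10-minute timeout and 30-second interval are arbitrary placeholders, not SHIR values.

```python
import time
from typing import Callable


def wait_for_recovery(check: Callable[[], bool],
                      timeout_s: float = 600.0,
                      interval_s: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    True means the node recovered on its own; False means it is time to intervene manually.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the function testable without waiting in real time.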

Restarting the SHIR service was the key step to getting everything back on track.

Next Steps

Check out the following resources:
