How to resolve connectivity issues with SQL Server Availability Groups
After you have completed your AlwaysOn setup and configuration in a multi-subnet environment, you get a number of calls from the application team that they are having intermittent connectivity issues with the database server. You verify that the AlwaysOn IP address is pingable and you are able to connect via SSMS to the Availability Group (AG) name. You advise the application team to issue the command to flush their DNS cache via ipconfig in a command window. This solves the problem for a while and while you are on your next coffee break you get a call again from the application development team that they are having the same problem again.
This intermittent connectivity time out problem might be caused by the multiple IP addresses provided during the setup of the AG listener for the multi-subnet cluster.
During the configuration of your AG listener, you would have to define an IP address for each subnet in your environment. This will have ať dependency for each of the subnet IP addresses.
When your AG is up and running, it is registered in your DNS by the Windows Cluster Service for each subnet. Most organizations have 2 subnets, but there is also the possibility that there may be more. The Windows Cluster will submit all of the IP addresses for the subnet to the DNS server. The DNS server then creates a record for each subnet. I am referring to a Microsoft Windows DNS server, if you have another kind of server used in your organization, it may have another implementation or behavior.
Now from your application, when a client connection is made to the AG listener, the Windows Cluster needs to resolve the AG name to an IP address by searching the DNS server. The DNS server will then return multiple IP addresses registered previously during the configuration of your AG. Remember that there are multiple IP addresses in your DNS list, one for each subnet that you have. The IP address for your primary replica server is online, the rest of the IP addresses for your secondary replicas are offline. Now here lies the connectivity timeout issue. Because not all the IP addresses are online, the IP address that may be returned by the DNS server may be an offline one. Then the client application experiences the timeout problem.
Depending on your DNS server configuration, the client application may have to wait for some time before it returns another IP address that may or may not be online. The default TCP connection attempt is usually 21 seconds. So if the first IP address returned by the DNS server is offline, it will have to wait for another 21 seconds to return another IP address to try to connect. Again, the new IP address returned may or may not be online. Until it has exhausted all the IP addresses, it will be a trial and error approach, until you get that call again from the application team.
To correct and finally solve this intermittent connectivity timeout problem, you need to revisit your AG listener properties. In SSMS, open your AG properties window (as shown below).
From the properties window dialog box above, we zoom in at the RegisterAllProviderIP property. During the configuration of the AG listener via the Windows Cluster, this property is 0 by default. This property simply means that all IP addresses for each of your subnets will be registered in the DNS server.
When the property RegisterAllProvidersIP is set to 1, any client application whose connection string does not use the setting multi-subnetFailover = True, will experience high latency connections and subsequently experience the dreaded connection timeout problem. This occurs because these clients attempt to connect to all the registered IP addresses sequentially.
Note that to take advantage of the AG listener multi-subnet feature, the application will need a data provider that supports the multi-subnetFailover keyword in the connection string property.
In contrast, if RegisterAllProvidersIP is changed to 0, the active IP address is registered in the Client Access Point in the WSFC cluster, reducing latency for legacy applications.
The recommendation here is that, if you have legacy client applications that need to connect to an availability group listener and cannot use the multi-subnetFailover keyword, the property RegisterAllProvidersIP should be set to 0. If the property RegisterAllProvidersIP is set to 1, the client application connection string should add the multi-subnetFailover = True setting.
To view the same property in PowerShell, you may issue the following command (see figure below)
Then you will get the following result as below.
I hope the above recommendations solve the intermittent connectivity problem and makes your application failover test a success.
- Perform a more thorough client application test, by setting the RegisterAllProviderIP property for your scenario.
- For more help on the configuration of the AG listener, click here.
- For more articles on SQL Server 2012 AlwaysOn, click here.
About the author
View all my tips