SQL Server AlwaysOn Availability Groups Fail the Initial Failover Test


By:   |   Updated: 2014-08-18   |   Comments   |   Related: More > Availability Groups


Problem

In a previous tip, you have learned to setup AlwaysOn in SQL Server 2012. After a successful setup and everything seems to be in the right place, you are preparing to test your AlwaysOn to failover to your secondary replica for the first time. The first test is successful, but after testing to swing back again to your primary replica, you find out that automatic failover works only the first time.  How come?

Solution

This scenario, where in your automatic failover testing works only on the first time is an expected behavior and there is no cause for alarm.

One important setting that you need to check in this scenario is the setting of "Maximum Failures in the specified period." This value can be found in the Application property of your failover cluster settings. From your Server Manager console, go to Features > Failover Cluster Manager > Your Cluster > Services and Applications > Your Application > right click and select "Properties." You should be in the same menu as shown below.

Failover Cluster Manager Properties

In the property window, click on failover tab. You should see the following in the Failover part;

Failover Tab

By default the failover property settings for the resource group is:

  1. Maximum Failures in the specified period is set to (n-1), where n is the number of nodes participating in your cluster. If the number of nodes that you have is 2, subtracted by 1 is equal 1. So your automatic failover will work only the first time after your setup.
  2. Period (measured in hours) is set to 6.

With these default values, this means that if a failover happens and the resources does not come online, the cluster services tries to get the resources online only once and in 6 hours. These values you can play around during your testing only. When you go Live or roll out to production, these values must reflect your organization's RTO and RPO documented policies.

As for the failback settings, see bottom part of the image above, you can play around with the 2 options depending on the preferred owner for your cluster. In my case, in the General tab I had the 2 replicas selected as preferred owner, and I opted for the Allow failback with a specified time option.

Generally, these settings are also called the failover threshold. It determines if you have reached the limit of how many failovers you can have in a span of, in the setting above, 6 hours.

Other way that you can verify if this is really what is happening to your testing is to view the cluster logs. A quick way to view the cluster logs is open it in your Failover Cluster Manager > Your cluster > expand this and go to Cluster Events, right click and go to Query. Enter the query you want to filter (the failover that happened) and click OK.

Cluster Events Query

To summarize, what happened in your initial failover test is that you have reached the threshold setting. This threshold setting is to prevent the ping pong scenario of your cluster failing back and forth frequently. If it does happen too frequent for you, then it should be investigated deeply as there could be a more serious cause as to why is this happening. For my case, it was a hardware problem and I had to call the vendor again. It is a good thing that you are still in the testing stage.

Next Steps
  • If you are still having issues with your automatic failover testing, you might want to double check if your WSFC is configured correctly. For more help on WSFC configuration, you may click here
  • For more tips about AlwaysOn and Availability Groups, click here.


Last Updated: 2014-08-18


get scripts

next tip button



About the author
MSSQLTips author Carla Abanes Carla Abanes works for a private bank in Singapore as a SQL Server DBA.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Connect to SQL Server Availability Group replica with SSMS when Readable Secondary is Read-intent only

Configure SQL Server AlwaysOn Availability Group on a Multi-Subnet Cluster

Fix SQL Server AlwaysOn Availability Group Error: 1408 Joining database on secondary replica resulted in an error

What is SQL Server AlwaysOn?

Add a SQL Server Database to an Existing Always On Availability Group








get free sql tips
agree to terms


Learn more about SQL Server tools