Understand How Dynamic Witness Works for a SQL Server WSFC
Dynamic Witness was introduced with Windows Server 2012 R2 for Windows Server Failover Cluster (WSFC). Dynamic Witness allows the cluster to determine whether the witness actually has a vote depending on the number of cluster nodes up and available at a given point in time. This tip will explain the concept of Dynamic Witness in WSFC.
A WSFC built on Windows Server 2012 R2 uses dynamic quorum, and Dynamic Witness allows the witness vote to be toggled dynamically to maintain an odd number of quorum votes. If the witness is unavailable, the cluster instead adjusts the node votes based on the available nodes. The recommendation for a Windows Server 2012 R2 cluster is to always configure a witness.
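The odd-vote rule can be modeled in a few lines of PowerShell. This is only a sketch of the arithmetic, not a query against a cluster, and the function name Get-ExpectedWitnessWeight is hypothetical. It assumes at least two node votes remain.

```powershell
# Hypothetical helper: models the dynamic witness rule described above.
# It does not talk to a cluster; it only reproduces the arithmetic.
function Get-ExpectedWitnessWeight {
    param(
        [Parameter(Mandatory)]
        [int]$NodeVotes   # number of nodes that currently hold a quorum vote
    )
    # Even number of node votes: the witness votes (1) so the total is odd.
    # Odd number of node votes: the witness vote is dropped (0).
    if ($NodeVotes % 2 -eq 0) { 1 } else { 0 }
}

foreach ($n in 4, 3, 2) {
    $w = Get-ExpectedWitnessWeight -NodeVotes $n
    "{0} node votes + witness weight {1} = {2} total votes" -f $n, $w, ($n + $w)
}
```

In each of the three cases the total number of votes comes out odd.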
Note: "windows cluster" or "cluster" in this tip refers to WSFC.
There are two types of witness, and only one can be configured in a cluster at a time: a File Share Witness (FSW) or a Disk Witness. If all cluster nodes are on the same subnet, you may configure either type. If your cluster spans subnets, an FSW is recommended because the witness is a voting element and must be reachable by all nodes.
An FSW is a file share on a server. It is recommended that this be a separate server from the cluster nodes and, if possible, in a different data center. This allows any cluster node to reach the file share server if the site-to-site network fails. Otherwise, if the FSW resides in the secondary site when a site-to-site network failure occurs, the secondary site might become the primary site, because it would hold the quorum majority together with the witness in the same site.
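The site-failure reasoning above can be checked with simple vote arithmetic. The sketch below is a model, not a cluster query. The function name Test-SiteKeepsQuorum and the even split of 2 nodes per site are assumptions made for illustration.

```powershell
# Hypothetical model of a two-site cluster during a site-to-site link failure.
function Test-SiteKeepsQuorum {
    param(
        [int]$SiteNodes,        # voting nodes in the site being evaluated
        [int]$TotalNodes,       # voting nodes in the whole cluster
        [bool]$CanReachWitness  # can this site still reach the FSW?
    )
    $totalVotes = $TotalNodes + 1                   # all nodes plus the witness
    $siteVotes  = $SiteNodes + [int]$CanReachWitness
    # Quorum requires a strict majority: more than half of all votes.
    return ($siteVotes -gt [math]::Floor($totalVotes / 2))
}

# FSW hosted in the secondary site, link between the sites is down.
Test-SiteKeepsQuorum -SiteNodes 2 -TotalNodes 4 -CanReachWitness $false   # primary site
Test-SiteKeepsQuorum -SiteNodes 2 -TotalNodes 4 -CanReachWitness $true    # secondary site
```

With the witness in the secondary site, the primary site holds 2 of 5 votes and the call returns False, while the secondary site holds 3 of 5 and returns True.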
Configure File Share Witness (FSW) in Windows Cluster
This tip is a continuation of Dynamic Quorum for Windows Server 2012 R2 Cluster to Support SQL Server AlwaysOn. As per the previous tip, the cluster we are working on has these characteristics:
- Built on Windows Server 2012 R2 Standard Edition
- Contains 4 nodes in the cluster
- Quorum type of Node Majority
- Executing a continuous workload on the Availability Group Listener (AGL)
We will introduce an FSW into the 4-node cluster since this cluster spans subnets. The account that requires permissions on the file share is the cluster name object (CNO), which is the computer account of the cluster itself.
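A minimal sketch of creating the share and granting the cluster account access is shown here. The folder path and the CONTOSO domain name are assumptions and do not come from this tip. Full Control is used for simplicity, so treat the permission level as an assumption as well and adjust it to your own policy.

```powershell
# Run on the file server that will host the witness share.
# Assumptions (not from the tip): the folder path and the CONTOSO domain.
$path = 'D:\Shares\CLUSQL01_FSW'
$cno  = 'CONTOSO\CLUSQL1$'   # computer accounts end with a $ sign

# Create the folder if it does not exist yet
New-Item -Path $path -ItemType Directory -Force | Out-Null

# Share-level permission for the cluster name object
New-SmbShare -Name 'CLUSQL01_FSW' -Path $path -FullAccess $cno

# Matching NTFS permission on the folder, inherited by files and subfolders
icacls $path /grant "${cno}:(OI)(CI)F"
```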
The PowerShell command below will add the FSW to the cluster. This will now set the cluster quorum type to be Node and File Share Majority.
Get-ClusterNode | ft ID, NodeName, NodeWeight, DynamicWeight, State -AutoSize
Set-ClusterQuorum -Cluster CLUSQL1 -NodeAndFileShareMajority \\pl12\CLUSQL01_FSW
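To confirm the change, the quorum configuration can be read back with Get-ClusterQuorum. This is a small sketch that assumes the FailoverClusters module is available on the machine where it runs.

```powershell
Import-Module FailoverClusters

# Show the quorum type and the resource acting as the witness
Get-ClusterQuorum -Cluster CLUSQL1 | Format-List Cluster, QuorumResource, QuorumType
```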
If you want to view the witness quorum vote, you can use the Get-Cluster cmdlet to check WitnessDynamicWeight. In a cluster with 4 nodes, the dynamic witness weight will be 1 so that the cluster has an odd number of quorum votes.
Get-ClusterNode | ft ID, NodeName, NodeWeight, DynamicWeight, State -AutoSize
Get-Cluster | ft Name, WitnessDynamicWeight -AutoSize
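Because the same two commands are repeated after every step in this tip, they can be wrapped in a small helper that also adds up the votes. This is a sketch: the function name Show-QuorumVotes is hypothetical, and it assumes it is run on a cluster node with the FailoverClusters module loaded.

```powershell
Import-Module FailoverClusters

# Hypothetical helper: prints the node votes, the witness vote, and their sum
# for the local cluster.
function Show-QuorumVotes {
    $nodes   = Get-ClusterNode
    $cluster = Get-Cluster

    # Per-node view, same columns as used elsewhere in this tip
    $nodes | Format-Table Id, NodeName, NodeWeight, DynamicWeight, State -AutoSize | Out-Host

    # Current node votes plus the current witness vote
    $nodeVotes   = ($nodes | Measure-Object -Property DynamicWeight -Sum).Sum
    $witnessVote = $cluster.WitnessDynamicWeight
    "Node votes: $nodeVotes  Witness vote: $witnessVote  Total: $($nodeVotes + $witnessVote)"
}

Show-QuorumVotes
```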
Shutting Down Cluster Nodes to Test Behavior
Now we will start to shut down nodes in the cluster. The primary Availability Group and the cluster resources are hosted on PSQL09, so we will keep PSQL09 alive as the last node and shut down all other nodes.
After stopping the first node, PSQL10, the FSW WitnessDynamicWeight has a value of 0. We now have 3 nodes and the FSW still up. The 3 node votes are already an odd number, so the witness vote is not needed.
Stop-ClusterNode PSQL10
Get-ClusterNode | ft ID, NodeName, NodeWeight, DynamicWeight, State -AutoSize
Get-Cluster | ft Name, WitnessDynamicWeight -AutoSize
After stopping cluster node SSQL09, the AGL workload is still running fine. We are now down to 3 votes: 2 nodes plus the witness. The FSW toggles its own quorum vote (WitnessDynamicWeight) based on the state of the active nodes, rather than the cluster adjusting the votes of the remaining active nodes to reach an odd number of quorum votes.
When a cluster with a witness is down to 3 votes, it will behave differently from a cluster with a quorum type of Node Majority.
Stop-ClusterNode SSQL09
Get-ClusterNode | ft ID, NodeName, NodeWeight, DynamicWeight, State -AutoSize
Get-Cluster | ft Name, WitnessDynamicWeight -AutoSize
After stopping cluster node PTEMP1, look at the DynamicWeight on node PTEMP1: the vote is not adjusted even though the node is marked as down. We now have only 1 node and the witness up in the cluster. Nevertheless, the cluster is still up and the AGL workload is still executing.
Stop-ClusterNode PTEMP1
Get-ClusterNode | ft ID, NodeName, NodeWeight, DynamicWeight, State -AutoSize
Get-Cluster | ft Name, WitnessDynamicWeight -AutoSize
Let's disable the file share to simulate a failure of the FSW. After some time, the cluster service will shut itself down and we can no longer connect to the cluster.
When the cluster fails, the AGL workload fails and the AG database also becomes inaccessible.
It is not a common scenario for cluster nodes to go offline one by one until only the last node remains and the FSW is unavailable as well. But in the scenario outlined in this tip, the cluster will shut down because it can no longer achieve a quorum majority. In a cluster with a witness, quorum votes are not adjusted once the cluster is below 3 votes, unlike with the Node Majority quorum type. At least two quorum votes are required to continue running a cluster with a witness. This behavior is by design, so the cluster can recover from a partitioned scenario. This is why it makes sense to put the FSW in a third data center if possible. As with any implementation, understanding the fundamentals of a Windows cluster allows an administrator to design a better solution.
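The sequence of events can be summarized as vote arithmetic. The table below is a sketch built from one reading of the walkthrough in this tip, not output captured from the cluster. In particular, it assumes the total stays at 3 votes after PTEMP1 is stopped, because that node's vote is not adjusted.

```powershell
# Vote counts at each step, as read from the walkthrough (an interpretation).
$steps = @(
    [pscustomobject]@{ Step = 'All nodes up';   TotalVotes = 5; VotesPresent = 5 }
    [pscustomobject]@{ Step = 'PSQL10 stopped'; TotalVotes = 3; VotesPresent = 3 }
    [pscustomobject]@{ Step = 'SSQL09 stopped'; TotalVotes = 3; VotesPresent = 3 }
    [pscustomobject]@{ Step = 'PTEMP1 stopped'; TotalVotes = 3; VotesPresent = 2 }
    [pscustomobject]@{ Step = 'FSW disabled';   TotalVotes = 3; VotesPresent = 1 }
)

foreach ($s in $steps) {
    # A majority is more than half of the total votes
    $needed = [int][math]::Floor($s.TotalVotes / 2) + 1
    $s | Add-Member -NotePropertyName VotesNeeded -NotePropertyValue $needed
    $s | Add-Member -NotePropertyName HasQuorum -NotePropertyValue ($s.VotesPresent -ge $needed)
}

$steps | Format-Table -AutoSize
```

Only the last row falls below the 2 votes needed out of 3, which matches the point at which the cluster service shut itself down.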