By: Edwin Sarmiento | Comments (4) | Related: > Availability Groups
Problem
We have implemented SQL Server Always On Distributed Availability Group between our production data center and our disaster recovery (DR) data center as described in this tip. We are in the process of testing our DR plan and would like to manually failover our Distributed Availability Group and run the primary Availability Group in our DR data center. How do I manually failover a Distributed Availability Group?
Solution
Following the tip on Setup and Implement SQL Server 2016 Always On Distributed Availability Groups, ASYNCHRONOUS COMMIT availability (replication) mode is configured between the primary and secondary Availability Groups. That's because of the network latency between the two data centers. Configuring the primary and secondary Availability Groups with ASYNCHRONOUS COMMIT availability mode minimizes the performance impact on the database transactions in the primary databases. The availability mode between replicas within the same Availability Group can be set to SYNCHRONOUS COMMIT to achieve high availability. However, if you want to failover between ASYNCHRONOUS COMMIT replicas – be it for the regular Availability Groups or Distributed Availability Groups – with minimal data loss, you first need to change the availability (replication) mode from ASYNCHRONOUS to SYNCHRONOUS.
Changing the availability mode of the Availability Groups is just one of the steps to prepare the Distributed Availability Group for a failover. For the scenario described where the Distributed Availability Group will be manually failed over for DR testing, below are the details of the implementation.
Primary Availability Group (Production) | AG01 |
Replicas | TDPRD071, TDPRD072 |
Secondary Availability Group (DR) | AG02 |
Replicas | TDDR071, TDDR072 |
Distributed Availability Group | Dist_AG01_AG02 |
Here's a high-level overview of the steps for manually failing over a Distributed Availability Group for DR testing.
- Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the primary Availability Group (AG01)
- Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the secondary Availability Group (AG02)
- Set the Distributed Availability Group role on the primary Availability Group (AG01) to SECONDARY
- Failover the Distributed Availability Group to the secondary Availability Group (AG02)
Step #1: Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the primary Availability Group (AG01)
Use the T-SQL query below to modify the availability mode of the Distributed Availability Group in the primary Availability Group. The primary replica of the primary Availability Group is referred to as the global primary in a Distributed Availability Group. Since the primary and secondary Availability Groups are in ASYNCHRONOUS COMMIT availability mode, you need to switch them to SYNCHRONOUS COMMIT. Be sure to run the query in SQLCMD mode.
:CONNECT TDPRD071 --Modify availability mode of the primary and secondary AGs in a Distributed Availability Group --Run this on the primary replica of the primary AG – TDPRD071 ALTER AVAILABILITY GROUP [Dist_AG01_AG02] MODIFY AVAILABILITY GROUP ON 'AG01' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT ), 'AG02' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT );
Step #2: Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the secondary Availability Group (AG02)
Use the T-SQL query below to modify the availability mode of the Distributed Availability Group in the secondary Availability Group. The primary replica of the secondary Availability Group is referred to as the forwarder in a Distributed Availability Group. Since the primary and secondary Availability Groups are in ASYNCHRONOUS COMMIT availability mode, you need to switch them to SYNCHRONOUS COMMIT.
:CONNECT TDDR071 --Modify availability mode of the primary and secondary AGs in a Distributed Availability Group --Run this on the primary replica of the secondary AG – TDDR071 ALTER AVAILABILITY GROUP [Dist_AG01_AG02] MODIFY AVAILABILITY GROUP ON 'AG01' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT ), 'AG02' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT );
Don't be surprised if the T-SQL query on Step#1 is exactly the same as the one above - you're basically running the exact same query on two different Availability Groups, one on the global primary and another on the forwarder.
Before proceeding with Step #3, verify that both the primary and secondary Availability Groups are in SYNCHRONOUS COMMIT and synchronization_health_desc value of HEALTHY. Use the queries below to verify.
:CONNECT TDPRD071 SELECT r.replica_server_name, r.endpoint_url, rs.connected_state_desc, rs.role_desc, rs.operational_state_desc, rs.recovery_health_desc,rs.synchronization_health_desc, r.availability_mode_desc, r.failover_mode_desc FROM sys.dm_hadr_availability_replica_states rs INNER JOIN sys.availability_replicas r ON rs.replica_id=r.replica_id ORDER BY r.replica_server_name GO SELECT ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, ars.operational_state_desc, ars.synchronization_health_desc FROM sys.availability_groups ag JOIN sys.availability_replicas ar on ag.group_id=ar.group_id LEFT JOIN sys.dm_hadr_availability_replica_states ars ON ars.replica_id=ar.replica_id WHERE ag.is_distributed=1 GO SELECT ag.name , drs.database_id , drs.group_id , drs.replica_id , drs.synchronization_state_desc , drs.end_of_log_lsn FROM sys.dm_hadr_database_replica_states drs, sys.availability_groups ag WHERE drs.group_id = ag.group_id
Below is the result of running the queries on the primary replica of the primary Availability Group. Note the synchronization_state_desc value of the Distributed Availability Group. Make sure this is SYNCHRONIZED to minimize data loss when you perform a manual failover.
Below is the result of running the queries on the primary replica of the secondary Availability Group.
The Distributed Availability Group is ready for failover when the last_hardened_lsn value for all of the databases on both Availability Groups are the same. Use the T-SQL query below to verify if both Availability Groups are ready for failover.
:CONNECT TDPRD071 SELECT ag.name, drs.database_id, db_name(drs.database_id) as database_name, drs.group_id, drs.replica_id, drs.last_hardened_lsn FROM sys.dm_hadr_database_replica_states drs INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id; :CONNECT TDDR071 SELECT ag.name, drs.database_id, db_name(drs.database_id) as database_name, drs.group_id, drs.replica_id, drs.last_hardened_lsn FROM sys.dm_hadr_database_replica_states drs INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id;
Step #3: Set the Distributed Availability Group role on the primary Availability Group (AG01) to SECONDARY
Use the T-SQL query below to set the Distributed Availability Group role on the primary replica of the primary Availability Group to SECONDARY.
:CONNECT TDPRD071 ALTER AVAILABILITY GROUP [Dist_AG01_AG02] SET (ROLE = SECONDARY);
This takes the Distributed Availability Group offline, terminating client applications connected to the primary replica of the primary Availability Group. This also starts the countdown for your recovery time objective (RTO) to failover the Distributed Availability Group to the DR data center.
Step #4: Failover the Distributed Availability Group to the secondary Availability Group (AG02)
Use the T-SQL query below to manually failover the Distributed Availability Group to the secondary Availability Group. Run the query on the primary replica of the secondary Availability Group.
:CONNECT TDDR071 ALTER AVAILABILITY GROUP [Dist_AG01_AG02] FORCE_FAILOVER_ALLOW_DATA_LOSS;
This brings the Distributed Availability Group online on the old secondary Availability Group (AG02), making it the new primary Availability Group. Client applications can now connect to the new primary Availability Group thru its listener name.
NOTE: Don't be alarmed with the use of FORCE_FAILOVER_ALLOW_DATA_LOSS argument in the ALTER AVAILABILITY GROUP command - it is the only supported argument for failing over a Distributed Availability Group. This is not something you would use for regular Availability Groups unless in a DR situation. Make sure you update your DR documentation on the proper use of FORCE_FAILOVER_ALLOW_DATA_LOSS.
After failing over the Distributed Availability Group over to the DR site, you can re-run the queries from Step #1 and Step#2 to switch the AVAILABILITY_MODE back to ASYNCHRONOUS_COMMIT.
Next Steps
- Review the previous tip on Setup and Implement SQL Server 2016 Always On Distributed Availability Groups
- Read about Distributed Availability Groups from the Microsoft Documentation page
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips