Manual Failover of SQL Server Always On Distributed Availability Group for Disaster Recovery Testing

By: Edwin Sarmiento | Comments (4) | Related: > Availability Groups

Problem

We have implemented SQL Server Always On Distributed Availability Group between our production data center and our disaster recovery (DR) data center as described in this tip. We are in the process of testing our DR plan and would like to manually failover our Distributed Availability Group and run the primary Availability Group in our DR data center. How do I manually failover a Distributed Availability Group?

Solution

Following the tip on Setup and Implement SQL Server 2016 Always On Distributed Availability Groups, ASYNCHRONOUS COMMIT availability (replication) mode is configured between the primary and secondary Availability Groups. That's because of the network latency between the two data centers. Configuring the primary and secondary Availability Groups with ASYNCHRONOUS COMMIT availability mode minimizes the performance impact on the database transactions in the primary databases. The availability mode between replicas within the same Availability Group can be set to SYNCHRONOUS COMMIT to achieve high availability. However, if you want to failover between ASYNCHRONOUS COMMIT replicas – be it for the regular Availability Groups or Distributed Availability Groups – with minimal data loss, you first need to change the availability (replication) mode from ASYNCHRONOUS to SYNCHRONOUS.

Changing the availability mode of the Availability Groups is just one of the steps to prepare the Distributed Availability Group for a failover. For the scenario described where the Distributed Availability Group will be manually failed over for DR testing, below are the details of the implementation.

Primary Availability Group (Production)	AG01
Replicas	TDPRD071, TDPRD072
Secondary Availability Group (DR)	AG02
Replicas	TDDR071, TDDR072
Distributed Availability Group	Dist_AG01_AG02

Here's a high-level overview of the steps for manually failing over a Distributed Availability Group for DR testing.

Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the primary Availability Group (AG01)
Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the secondary Availability Group (AG02)
Set the Distributed Availability Group role on the primary Availability Group (AG01) to SECONDARY
Failover the Distributed Availability Group to the secondary Availability Group (AG02)

Step #1: Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the primary Availability Group (AG01)

Use the T-SQL query below to modify the availability mode of the Distributed Availability Group in the primary Availability Group. The primary replica of the primary Availability Group is referred to as the global primary in a Distributed Availability Group. Since the primary and secondary Availability Groups are in ASYNCHRONOUS COMMIT availability mode, you need to switch them to SYNCHRONOUS COMMIT. Be sure to run the query in SQLCMD mode.

:CONNECT TDPRD071
--Modify availability mode of the primary and secondary AGs in a Distributed Availability Group
--Run this on the primary replica of the primary AG – TDPRD071
 ALTER AVAILABILITY GROUP [Dist_AG01_AG02]     
 MODIFY 
 AVAILABILITY GROUP ON
'AG01' WITH 
( 
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
), 
'AG02' WITH     
( 
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
);

Step #2: Modify availability mode of the Distributed Availability Group (Dist_AG01_AG02) in the secondary Availability Group (AG02)

Use the T-SQL query below to modify the availability mode of the Distributed Availability Group in the secondary Availability Group. The primary replica of the secondary Availability Group is referred to as the forwarder in a Distributed Availability Group. Since the primary and secondary Availability Groups are in ASYNCHRONOUS COMMIT availability mode, you need to switch them to SYNCHRONOUS COMMIT.

:CONNECT TDDR071
--Modify availability mode of the primary and secondary AGs in a Distributed Availability Group
--Run this on the primary replica of the secondary AG – TDDR071
ALTER AVAILABILITY GROUP [Dist_AG01_AG02]     
 MODIFY 
 AVAILABILITY GROUP ON
'AG01' WITH 
( 
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
), 
'AG02' WITH     
( 
   AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
);

Don't be surprised if the T-SQL query on Step#1 is exactly the same as the one above - you're basically running the exact same query on two different Availability Groups, one on the global primary and another on the forwarder.

Before proceeding with Step #3, verify that both the primary and secondary Availability Groups are in SYNCHRONOUS COMMIT and synchronization_health_desc value of HEALTHY. Use the queries below to verify.

:CONNECT TDPRD071
SELECT r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.role_desc, rs.operational_state_desc,
rs.recovery_health_desc,rs.synchronization_health_desc,
r.availability_mode_desc, r.failover_mode_desc
FROM sys.dm_hadr_availability_replica_states rs 
INNER JOIN sys.availability_replicas r
ON rs.replica_id=r.replica_id
ORDER BY r.replica_server_name
GO
SELECT ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, 
ars.operational_state_desc, ars.synchronization_health_desc FROM sys.availability_groups ag     
JOIN sys.availability_replicas ar on ag.group_id=ar.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states ars
ON ars.replica_id=ar.replica_id
WHERE ag.is_distributed=1
GO
SELECT ag.name
       , drs.database_id
       , drs.group_id
       , drs.replica_id
       , drs.synchronization_state_desc
       , drs.end_of_log_lsn 
FROM sys.dm_hadr_database_replica_states drs,
     sys.availability_groups ag
WHERE drs.group_id = ag.group_id

Below is the result of running the queries on the primary replica of the primary Availability Group. Note the synchronization_state_desc value of the Distributed Availability Group. Make sure this is SYNCHRONIZED to minimize data loss when you perform a manual failover.

Below is the result of running the queries on the primary replica of the secondary Availability Group.

The Distributed Availability Group is ready for failover when the last_hardened_lsn value for all of the databases on both Availability Groups are the same. Use the T-SQL query below to verify if both Availability Groups are ready for failover.

:CONNECT TDPRD071
 SELECT ag.name, 
        drs.database_id, 
        db_name(drs.database_id) as database_name,
        drs.group_id, 
        drs.replica_id,
        drs.last_hardened_lsn
 FROM sys.dm_hadr_database_replica_states drs
 INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id; 
 
:CONNECT TDDR071
 SELECT ag.name, 
        drs.database_id, 
        db_name(drs.database_id) as database_name,
        drs.group_id, 
        drs.replica_id,
        drs.last_hardened_lsn
 FROM sys.dm_hadr_database_replica_states drs
 INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id;

Step #3: Set the Distributed Availability Group role on the primary Availability Group (AG01) to SECONDARY

Use the T-SQL query below to set the Distributed Availability Group role on the primary replica of the primary Availability Group to SECONDARY.

:CONNECT TDPRD071
ALTER AVAILABILITY GROUP [Dist_AG01_AG02] SET (ROLE = SECONDARY);

This takes the Distributed Availability Group offline, terminating client applications connected to the primary replica of the primary Availability Group. This also starts the countdown for your recovery time objective (RTO) to failover the Distributed Availability Group to the DR data center.

Step #4: Failover the Distributed Availability Group to the secondary Availability Group (AG02)

Use the T-SQL query below to manually failover the Distributed Availability Group to the secondary Availability Group. Run the query on the primary replica of the secondary Availability Group.

:CONNECT TDDR071
ALTER AVAILABILITY GROUP [Dist_AG01_AG02] FORCE_FAILOVER_ALLOW_DATA_LOSS;

This brings the Distributed Availability Group online on the old secondary Availability Group (AG02), making it the new primary Availability Group. Client applications can now connect to the new primary Availability Group thru its listener name.

NOTE: Don't be alarmed with the use of FORCE_FAILOVER_ALLOW_DATA_LOSS argument in the ALTER AVAILABILITY GROUP command - it is the only supported argument for failing over a Distributed Availability Group. This is not something you would use for regular Availability Groups unless in a DR situation. Make sure you update your DR documentation on the proper use of FORCE_FAILOVER_ALLOW_DATA_LOSS.

After failing over the Distributed Availability Group over to the DR site, you can re-run the queries from Step #1 and Step#2 to switch the AVAILABILITY_MODE back to ASYNCHRONOUS_COMMIT.

Next Steps

Review the previous tip on Setup and Implement SQL Server 2016 Always On Distributed Availability Groups
Read about Distributed Availability Groups from the Microsoft Documentation page

About the author

Edwin M Sarmiento is a Microsoft SQL Server MVP and Microsoft Certified Master from Ottawa, Canada specializing in high availability, disaster recovery and system infrastructures.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Sunday, July 24, 2022 - 2:31:09 AM - Kishore	Back To Top (90308)
Could you please confirm once the DR rehearsal is complete, what is the process to bring back to original state?

Friday, October 2, 2020 - 12:04:10 PM - bass_player	Back To Top (86587)
If the primary AG is gone, just do Step #4

Friday, October 2, 2020 - 5:36:16 AM - MRj	Back To Top (86585)
Great article! What would the steps be when AG01 would hit 'disaster' and unexpectedly is not available anymore? Thx!