Distributed Availability Groups for SQL Server Disaster Recovery

By: Edwin Sarmiento | Updated: 2021-10-01 | Comments | Related: > Availability Groups

Problem

In a previous tip on Setup and Implement SQL Server 2016 Always On Distributed Availability Groups, we have seen how to configure a Distributed Availability Group as a disaster recovery strategy for SQL Server 2016 and higher instances. However, the configuration is for standalone SQL Server instances. We already have existing Always On Availability Groups configured on failover clustered instances similar to the one described in this tip. How do we configure Distributed Availability Group when the replicas in the primary Availability Group are failover clustered instances?

Solution

There are cases when SQL Server failover clustered instances (FCI) are used as primary replicas in an Availability Group (AG) to achieve local high availability (HA) while a standalone SQL Server instance is used as a secondary replica for disaster recovery (DR). But when compliance requirements dictate having a second copy of the databases in the same data center in addition to achieving local HA, all replicas in an AG need to be SQL Server FCIs, similar to the diagram below.

failover cluster with availability groups

In the example provided, all four (4) servers are in the same Windows Server Failover Cluster (WSFC). The WSFC runs 2 SQL Server FCIs – a default instance (TDPRDSQLCLS026) and a named instance (TDPRDSQLCLS029\INST29). Since AG is built on the principle of shared nothing architecture, the shared storage subsystem is only accessible on the machines that the SQL Server FCIs run on.

You can configure a Distributed AG as a DR strategy for this setup where the secondary AG consists of standalone SQL Server instances, as shown in the diagram below.

failover clusters with availability groups

Configuring Distributed Availability Groups as a Disaster Recovery Strategy for SQL Server Failover Clustered Instances

Configuring Distributed AG as a DR strategy for SQL Server FCIs is no different from configuring a Distributed AG as described in this tip. However, due to the complexity, I do not recommend building this type of architecture unless it is really necessary and you have a team of highly skilled DBAs and sysadmins who can support it as part of day-to-day operations. A simple mistake can cost you days, if not weeks, of troubleshooting configuration issues. This tip will walk you through the process of configuring a Distributed AG as a DR strategy when you have SQL Server FCIs as replicas. I will also highlight the common mistakes DBAs and sysadmins make when configuring this setup.

Below are the details of the implementation for the scenario described:

	Production	DR
WSFC	OS: Windows Server 2019	OS: Windows Server 2019
	Nodes: TDPRD021, TDPRD022, TDPRD027, and TDPRD028	Nodes: TDPRD031 and TDPRD032
	Cluster Name Object (CNO): TDPRDCLS024	Cluster Name Object (CNO): TDPRDCLS034
	IP Subnet: 172.16.0.0/16	IP Subnet: 192.168.0.0/24
SQL Server Instance 1	Virtual Network Name (VNN): TDPRDSQLCLS026	TDPRD031
SQL Server Instance 2	Virtual Network Name (VNN) : TDPRDSQLCLS029\INST29	TDPRD032
SQL Server service account	TESTDOMAIN\sqlservice	TESTDOMAIN\sqlservice
Availability Group	Name: TDPRDSQLCLSAG25	Name: TDPRDSQLAG35
	Listener: TDPRDSQLCLSAG25	Listener: TDPRDSQLAG35
	Distributed Availability Group Name: DistAG_DC1_DC2

Here’s a high-level overview of the steps for your reference.

Create the primary AG (TDPRDSQLCLSAG25) with a corresponding listener name (TDPRDSQLCLSAG25)
Create the secondary AG (TDPRDSQLAG35) with a corresponding listener name (TDPRDSQLAG35)
Create the Distributed AG (DistAG_DC1_DC2) on the primary AG (TDPRDSQLCLSAG25)
Join the secondary AG (TDPRDSQLAG35) to the Distributed AG

Before proceeding with the configuration, make sure that each server can access the other servers – both via IP address and fully qualified domain name. Do this for the server hostnames as well as the SQL Server VNNs. Domain name resolution is one of the most common issue you will encounter. A simple PING and TELNET tests can be used for verification.

The primary AG (TDPRDSQLCLSAG25) with the corresponding listener name has already been created. Refer to this tip on how to configure a traditional AG on SQL Server FCIs. Do not proceed with Step #2 unless the primary AG is fully functional.

One thing to keep in mind when a SQL Server FCI is configured as a replica in an AG: you lose the ability to do automatic failover on the AG level. Automatic failover happens within the scope of the SQL Server FCI nodes. Only manual failover is available within AG replicas. This is to maintain the shared nothing architecture across AG replicas.

Step #2: Create the secondary AG (TDPRDSQLAG35) with a corresponding listener name (TDPRDSQLAG35)

Refer to this tip and this tip on how to configure a traditional AG for the secondary AG. For testing purposes, I usually create a dummy database (testDB) to make sure that the secondary AG is fully functional before I proceed with configuring the Distributed AG.

Don’t proceed with the next step if the secondary AG isn’t fully functional. This will save you a lot of troubleshooting headache down the road.

Step #2a: Remove the dummy database from the secondary AG

I only added the dummy database to make sure that the secondary AG is fully functional. In a Distributed AG, the primary AG will replicate the databases over to the secondary AG. So, there should not be any databases in the secondary AG.

To remove the dummy database from the secondary AG,

From within Object Explorer, expand AlwaysOn High Availability > Availability Groups.
Expand the AG (TDPRDSQLAG35) > Availability Databases.
Right-click on the dummy database and select Remove Database from Availability Group.

In the Remove Database from Availability Group dialog box, click OK.

Alternatively, you can use the T-SQL script below to remove the dummy database from the AG. Be sure to run this on the primary replica of the secondary AG.

NOTE: Use SQLCMD mode when running the T-SQL scripts to make sure you are connected to the correct SQL Server instance.

--Run this on the primary replica of the secondary Availability Group
:CONNECT TDPRD031
USE [master]
GO
ALTER AVAILABILITY GROUP [TDPRDSQLAG35]
REMOVE DATABASE [testDB];
GO

Don’t be alarmed when the AG dashboard reports an error. This is normal. It only means that the secondary AG does not have any replicated databases. Also, make sure that there are no databases with the same name on both the primary and secondary AG. This will prevent the databases from the primary AG from getting replicated to the secondary AG. When in doubt, remove all user databases in the secondary AG’s replicas.

Step #3: Create Distributed AG (DistAG_DC1_DC2) on the primary AG (TDPRDSQLCLSAG25)

Once the secondary AG has been created, you can now proceed to create the Distributed AG. Use the T-SQL script below to create the Distributed AG (DistAG_DC1_DC2). Be sure you are connected to the SQL Server instance that you want to configure as primary replica of the primary AG.

--Create Distributed Availability Group
--Run this on the primary replica of the primary Availability Group
:CONNECT TDPRDSQLCLS026
CREATE AVAILABILITY GROUP [DistAG_DC1_DC2]  
WITH (DISTRIBUTED)   
AVAILABILITY GROUP ON  
'TDPRDSQLCLSAG25' WITH    
(   
   LISTENER_URL = 'TCP://TDPRDSQLCLS026.TESTDOMAIN.COM:5022',    
   AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
   FAILOVER_MODE = MANUAL,   
   SEEDING_MODE = AUTOMATIC   
),   
'TDPRDSQLAG35' WITH    
(   
   LISTENER_URL = 'TCP://TDPRDSQLAG35.TESTDOMAIN.COM:5022',   
   AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
   FAILOVER_MODE = MANUAL,   
   SEEDING_MODE = AUTOMATIC   
);    
GO

Refer to this tip for the parameters and their corresponding values.

I wanted to emphasize the LISTENER_URL parameter values. This is a very common mistake that DBAs and sysadmins make when configuring Distributed AG involving SQL Server FCIs. In this previous tip, the values of the LISTENER_URL parameters point to the corresponding listener names of the primary and secondary AGs - TDPRDSQLCLSAG25 and TDPRDSQLAG35, respectively. Using the listener name guarantees that the Distributed AG will always get redirected to the primary replica of both the primary and secondary AGs in case an automatic failover occurs.

However, since SQL Server FCIs are involved in the primary AG, there is no automatic failover between replicas of the primary AG. Hence, using the listener name doesn’t make any sense. What we are using here is the SQL Server VNN of the primary replica of the primary AG – TDPRDSQLCLS026. For the secondary AG, since both replicas are standalone instances, the listener name is used.

Step #4: Join the secondary AG (TDPRDSQLAG35) to the Distributed AG (DistAG_DC1_DC2)

Once the Distributed AG has been created, you can now proceed to join the secondary AG. Use the T-SQL script below to join the AG (TDPRDSQLAG35) to the Distributed AG (DistAG_DC1_DC2). Be sure you are connected to the SQL Server instance that you want to configure as primary replica of the secondary AG.

--Create second availability group on second failover cluster with replicas and listener
--Run this on the primary replica of the secondary Availability Group
:CONNECT TDPRD031
ALTER AVAILABILITY GROUP [DistAG_DC1_DC2]   
JOIN   
AVAILABILITY GROUP ON  
'TDPRDSQLCLSAG25' WITH    
(   
   LISTENER_URL = 'TCP://TDPRDSQLCLS026.TESTDOMAIN.COM:5022',    
   AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
   FAILOVER_MODE = MANUAL,   
   SEEDING_MODE = AUTOMATIC   
),   
'TDPRDSQLAG35' WITH    
(   
   LISTENER_URL = 'TCP://TDPRDSQLAG35.TESTDOMAIN.COM:5022',   
   AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
   FAILOVER_MODE = MANUAL,   
   SEEDING_MODE = AUTOMATIC   
);    
GO

Similar to the previous step, the SQL Server VNN of the primary replica of the primary AG – TDPRDSQLCLS026 - is used for the LISTENER_URL parameter value instead of the listener name.

Review the Distributed AG by expanding the Availability Groups folder in SSMS. Notice the word Distributed appended to the Distributed AG.

If you refresh the AG dashboard of the secondary AG, you will now see the same databases you have in the primary AG, as described in Step #1.

You can also use the T-SQL script below to view the metadata and status of the Distributed AG.

--Run this on the primary replica of the primary Availability Group
:CONNECT TDPRDSQLCLS026
--View metadata and status of the Distributed Availability Group
SELECT r.replica_server_name, r.endpoint_url, r.failover_mode_desc,
rs.connected_state_desc, rs.role_desc, rs.operational_state_desc,
rs.recovery_health_desc,rs.synchronization_health_desc,
r.availability_mode_desc
FROM sys.dm_hadr_availability_replica_states rs 
INNER JOIN sys.availability_replicas r
ON rs.replica_id=r.replica_id
ORDER BY r.replica_server_name;

Notice the endpoint_url values of all replicas. Unlike when working with standalone instances, replicas involving SQL Server FCIs (TDPRDSQLCLS026 and TDPRDSQLCLS029\INST29) must be configured with the SQL Server VNN (TDPRDSQLCLS026 and TDPRDSQLCLS029, respectively). The same is true when a primary or a secondary AG involves a SQL Server FCI as a replica.

Also, notice how all of the failover modes are set to manual. Only the replicas in the secondary AG have a failover mode set to automatic. That’s because the replicas in the primary AG are all SQL Server FCIs.

Implementing Distributed AGs requires proper planning and thorough documentation. It’s not as simple as it is. And it is even more complicated when SQL Server FCIs are involved.

In the next tip, I will cover how to update the Distributed AG when failing over between replicas of the primary AG. Since the SQL Server VNN is used to describe the Distributed AG’s endpoint_url parameter values, it needs to be updated whenever a manual failover is performed in the primary AG.

Next Steps

Review the previous tips on SQL Server AlwaysOn Availability Group Configuration
Read about the following topics from the Microsoft Documentation
- Distributed availability groups
- Configure an Always On distributed availability group

About the author

Edwin M Sarmiento is a Microsoft SQL Server MVP and Microsoft Certified Master from Ottawa, Canada specializing in high availability, disaster recovery and system infrastructures.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2021-10-01

Configure Distributed Availability Groups for SQL Server Disaster Recovery with Failover Clustering