By: Tim Ford | Updated: 2008-08-04 | Comments (3) | Related: More > Clustering
Recently, I found myself rebuilding our production SQL Server 2005 cluster. Trust me when I say that a DBA does not take on a task like this because they want to. No, we encountered what Microsoft informed us was a "bug" in SQL Server 2005 that is to be resolved with Service Pack 3.
Why did I have to uninstall SQL Server on a perfectly good server to begin with? This server (also referred to as a node, when in a clustered environment) is dedicated to our payroll system. We are upgrading the product and this upgrade involved added functionality that would require replication. Unfortunately for me, Replication Components were not installed during the base install of SQL on this cluster node. Since this set of components is a child of the Database Services product you can not install it after the fact. SQL Server will check for installed components, see that Database Services are already installed, and will then not allow you to install Replication Components afterwards. The only option was a full uninstall and reinstall of the clustered SQL instance.
I followed the process from a previous tip on rebuilding the master database to script out all my SQL objects and detached the user databases. Unfortunately, unlike a previous process in our test environment, the production environment instance did not uninstall cleanly. (The aforementioned bug.) Since clustering leaves a footprint throughout the registry on all clustered nodes, and this was a clustered instance, we were left with all nodes in our environment having registry entries that prevented us from adding any new nodes back into the cluster. We went from a four-node, three-instance SQL cluster to a two-node, two-instance SQL cluster. The end result was an Active-Active cluster that does not provide true High-Availability.
So now, for the time being I am left with a two-node, two-instance (a.k.a. Active-Active) cluster that was our previous enterprise cluster, and a two-node, single-instance cluster (Active-Passive) for our payroll system. These two clusters are about to be integrated into a new one once again quite soon, but in the meantime it had me thinking about the nature of Active-Active versus Active-Passive.
Active/Active or Active/Passive Clustering
SQL Server clusters are often referred to as either being Active/Active or Active/Passive. When the terminology is used correctly this simply means that either all nodes (servers) participating in a Microsoft SQL Server cluster are dedicated to running at least a single SQL instance (Active-Active) or at least one of these nodes is reserved as a standby to accept failover of a SQL instance if one occurs. You could very well have an Active-Active-Active-Passive cluster, in the case of my production environment before the very long day I spoke of above; it's really still just classified as Active-Passive. That's it; quite simple in today's world of convoluted technical terminology. I do hope you're not disappointed. Thank you for reading this tip and goodbye...
Ah, but now for the more challenging question of "When to use Active-Active or Active-Passive architecture in your SQL Server environment." Knowing the terminology only takes you so far. To answer this question, we need to look at why one would not always use Active-Passive architecture. Sadly, the answer is typically going to be cost.
Active-Active Clustering Considerations
When we created our first Microsoft SQL Server 2005 cluster we did so with two nodes; an instance of SQL Server 2005 Enterprise Edition running on each node. This was also a 64-bit environment and had substantial amounts of RAM (though not nearly enough we would come to find out.) There was a terabyte of SAN dedicated for the cluster. This was a pricey consumer of space in our data center. The purpose of this cluster was not for High Availability (HA) as a Microsoft SQL Server cluster is typically designed, but rather for a mass consolidation effort. At the time I supported over 60 instances of SQL 2000 and about 10 SQL Server 2005 instances. We were able to tap into our capital fund for this new cluster in order to hopefully consolidate fifteen of these instances into the two instances that were to be hosted on the cluster. The purposes of the consolidation effort: reduced administrative overhead at both the hardware and database layers, upgrade of hardware that had reached the end of their support agreements, reduced licensing costs, and increased productivity for the Database Administrator.
However, as many of you have experienced in your organizations, many of the vendor-created databases we purchased were not only slow to adopt Microsoft SQL Server 2005, but also slow or unwilling to apply resources in their companies to certify that their databases and applications could run in 8.0 compatibility mode on a SQL 2005 instance. Furthermore, we even had select vendors tell us that they would not support a database running in a clustered environment. We ended up with our clustered environment for SQL 2005 hosting only a fraction of our SQL databases from those we targeted for consolidation. Maximum CPU never exceeded 20% and memory requirements never reached half of the 16gb of RAM installed for each server (node). Our situation was the same as for many others who choose to cluster in an Active-Active fashion: we had limited funds available for the clustering initiative, and/or our actual or estimated load on the SQL servers would not consume more than half the resources for each node.
I think this is the right time to make an important note that has not been discussed to this point: just because a node in a clustered environment is already hosting a SQL instance (acting as an Active node, in other words) does not preclude it from hosting another SQL instance. In fact, the server behaves no differently in this respect as it does in a non-clustered environment. A server can host multiple instances of SQL Server: a single default instance, and multiple additional named instances. Your only limits are the edition of Microsoft SQL Server, and of course, system resources.
In our Active-Active configuration we were able to achieve two important goals: consolidation, to some extent was achieved as was high-availability. Though we had no passive node in our cluster, we could safely fail over either instance so that both were running on the same physical server and not over-taxing the available resources of the server. Our SQL instances were configured to use a minimum of 4gb of RAM, and a maximum of 6gb. These settings reserved memory for the O/S and other non-SQL processes and also ensured that if a failover occurred the server could continue to handle the load of both instances at the levels of utilization we had monitored to date.
If you have the resources available to support the SQL instances in your cluster in this manner, without the need to maintain a passive node, then this architecture may be affective for you. However, this means that you are most-likely not utilizing the SQL instances to their full potential, or if you are that you've over-purchased hardware or memory for the workload involved. Essentially, it means you've not spent your money efficiently (unless this was your intention.) Most organizations end up with Active-Active SQL Server clusters not out of design, but out of circumstance. They may start out with a SQL Server cluster with a single SQL instance and a passive node ready for failover, but find that they are directed to add a new instance to the cluster for some "hot" project that did not have funding for hardware (been there, bought the refrigerator magnet) or have under-estimated their consolidation effort and are forced to add a new instance to the cluster with the promise of a new passive node the next time funds are made available. (Hey look, a t-shirt to go with that fridge magnet!) I do hope I've made my point that Active-Active clustering may have it's role, but it usually means you're forgoing better utilization of your hardware during the majority of time when the SQL instance(s) are running properly.
Active-Passive Clustering Considerations
High-Availability technically means that your SQL instances are available for user consumption of data. I would prefer to refine this definition and state that the data should not just be readily available, but also responding to user requests at a rate that are consistent with regular performance benchmarks. This means that your cluster failover process needs to allow for consistency in hardware between the nodes in your cluster.
Best practices state that all nodes in a cluster should be identical in terms of hardware, memory, and SQL Server editions. In the event of a failover in a cluster with a passive node, the SQL instance's ownership is switched to the passive node. After a brief period of downtime, typically around 10 seconds, the SQL instance should be running in an identical state of performance as it did on the original owner node (if the physical specifications for each node are identical.) You would configure the Microsoft SQL Server 2005 instance to utilize more memory then in an Active-Active architecture since you will only be running a single instance of SQL on a node at any given time.
I still strongly suggest reserving 4gb of RAM for the O/S and non-SQL operations of the server. You can add more load to the SQL instance on the active node in an Active-Passive cluster versus a similarly-designed active node in an Active-Active cluster. Your utilization of resources on the active node will be much more economical as a result. In our environment we also utilize our passive node to host the resources for the backup volume on our clustered environment. We utilize SQL native tools to backup to the SAN volume and have a third-party tool that writes these disk backups to tape on a nightly basis. The overhead associated with this process occurs on the passive node. In the event of a failover we have the ability to reschedule this process to a low point in SQL activity if we find contention for resources.
There is an important trade-off for the benefits of high-availability and consistency of performance. You must purchase and support a server that will hopefully never be used in production. A final item of note in regards to passive nodes: Microsoft does not require you to license the passive node of SQL Server 2005 in an Active-Passive clustered environment.
You must understand that no matter which architectural plan is used in your environment, you still end up with only a single instance of your data and the SQL Server objects (logins, Agent jobs, etc.) on your clustered instance. Unlike other forms of high-availability design afforded to the Microsoft SQL Server DBA, clustering only provides the means to ensure you are protected from server hardware failure. It is strongly recommended that you incorporate some form of redundancy into your disaster recovery plan and not rely on clustering alone. Such options include SAN redundancy products, database mirroring, log shipping, or replication to protect you against media failure on your SAN for which clustering fails to protect.
- Review related tips in this series:
- Look for other tips to come on clustering Microsoft SQL Server 2005
- Review your memory specifications for your SQL instances. Are you allowing SQL Server to dynamically configure the memory or are you setting hard limits?
Last Updated: 2008-08-04
About the author
View all my tips