Dismantling a SQL Server Cluster After a P2V (Physical to Virtual) Migration

By: Scott Shaw   |   Comments (12)   |   Related: > Virtualization


Problem

Our organization has decided to no longer use Windows Clustering and instead rely on the hardware redundancy inherent in a virtual environment. My problem is that I have numerous SQL Server clusters. How should I handle these clusters during the virtualization process?

Solution

Luckily, there is a way to reconfigure a SQL Server cluster after a P2V (Physical to Virtual) migration without having to reinstall SQL Server. I want to warn you, though, that this method probably wouldn't be sanctioned by the powers at Microsoft, and I can only recommend it to those who are comfortable living on the edge. I will say that I've performed this process on over a dozen production clusters, and they have been running without error for over 6 months. The process was really one of necessity, born of an exceedingly aggressive P2V schedule. Are there risks? Certainly, but then again, isn't this why we became DBAs?

First Steps

In a large organization you'll need to work closely with the team responsible for converting the server to a virtual server. If you are the DBA and you also happen to be the one responsible for the conversion, then these steps still apply, and you're lucky enough not to have to schedule as many meetings.

Assuming we are only dealing with a 2-node cluster, your configuration consists of 2 physical node names and 2 virtual names: one for SQL Server and one for the server itself. The only name you'll want to retain post-P2V is the SQL Network Name. This is the name your application(s) should be connecting to, and it is the same name as the default instance. You could choose a different name, but you'd be causing yourself much more trouble than it's worth.

In the image below the SQL Network Name is called STLPDBCCMSCLUS and it contains nodes STLPDBCCMSP01 and STLPDBCCMSP02. Both of the physical node names will be removed after the P2V and the only name you'll use is the original SQL Network Name.

[Screenshot: SQL Network Name STLPDBCCMSCLUS with nodes STLPDBCCMSP01 and STLPDBCCMSP02]
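If you want to inventory these names before the migration, the legacy cluster.exe tool can list them. Here is a minimal sketch, assuming Windows Server 2003/2008-era tooling and the example names above; substitute your own:

    # Run on one of the cluster nodes before the P2V (elevated PowerShell)
    cluster node              # lists the physical node names
    cluster res               # lists the cluster resources, including the SQL Network Name
    nslookup STLPDBCCMSCLUS   # note the IP address applications resolve today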

Finally, you'll need to ensure the same drive letters are used. For example, if the cluster has LUNs F and G, then the new VM should also have LUNs F and G. Again, you could change this, but it wouldn't be worth the effort. What will be removed is the quorum drive, usually Q. This will cause you problems with MSDTC, but we'll discuss that later in the tip.
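To capture the current letters so the new VM can be checked against them, a quick WMI query works on both the old nodes and the new VM. This is just a sketch; run it in PowerShell on each machine and compare the output:

    # List fixed disks (DriveType = 3) with their letters, labels, and sizes
    Get-WmiObject Win32_LogicalDisk -Filter "DriveType = 3" |
        Select-Object DeviceID, VolumeName, Size |
        Format-Table -AutoSize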

Now for the Tricky Part

Everything you do in this tip will need to be completed during the P2V downtime. You or another team will shut down production and begin the conversion. Once the conversion is complete, SQL Server will try to start but will fail. This is where you come in to save the day.

The first order of business is to remove all the unused IP addresses. If you remember, the cluster had 4 server names (2 nodes, the SQL Network Name, and the server name); it now has only one. This means only one IP address is being used, but SQL Server still thinks there are several. If you had looked in the network settings on the cluster, you would have seen something similar to this:

[Screenshot: TCP/IP entries in the SQL Server network configuration]

This screenshot shows 3 of the 4 entries; there is also a loopback entry of 127.0.0.1. These TCP/IP entries simply refer to values in the registry, and that is where we will make the changes. For SQL Server 2005 you should be able to find them under HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib. Determine the IP address of the new server and then remove all of the IP key entries except for the loopback entry, the server IP, and IPAll. I make IP1 the IP address of the server and IP2 the loopback IP.
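As a concrete sketch of that cleanup: the IPn keys live under the Tcp subkey of the path above, and reg.exe can back them up, prune the extras, and set the two that remain. The IP3 key and the 10.0.0.50 address below are placeholders for whatever leftover keys and new VM address you actually have:

    # SQL Server 2005 default instance; adjust MSSQL.1 if your instance ID differs
    $tcp = 'HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp'

    reg export "$tcp" C:\Temp\SuperSocketNetLib-Tcp.reg   # back up first (C:\Temp must exist)
    reg query  "$tcp"                                     # see which IPn keys the cluster left behind

    reg delete "$tcp\IP3" /f   # remove each leftover IPn key; IP3 is a placeholder

    # IP1 = the new VM's address (10.0.0.50 is a made-up example), IP2 = loopback
    reg add "$tcp\IP1" /v IpAddress /t REG_SZ /d 10.0.0.50 /f
    reg add "$tcp\IP2" /v IpAddress /t REG_SZ /d 127.0.0.1 /f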

You will also see a key called "Cluster" under HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1. The entire key can be deleted.
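Same idea here: export the key before deleting it so there's a way back. A sketch, again assuming the SQL 2005 default-instance path:

    $inst = 'HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1'
    reg export "$inst\Cluster" C:\Temp\MSSQL1-Cluster.reg   # keep a backup of the key
    reg delete "$inst\Cluster" /f                           # then delete it entirely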

Only one more thing needs to be done before starting SQL Server. If your cluster was correctly configured, then SQL Server would have had a dependency on MSDTC, because you want MSDTC to be running before SQL Server starts. The problem now is that the MSDTC log file is normally (but not always) stored on the cluster quorum drive. You'll need to change this location in order for the MSDTC service to start.

Begin by searching through the registry for "MSDtc". You should find an entry under HKCR\CID\<some large CLSID>\CustomProperties\Log\Path. You'll want to change the path from Q:\ to another local drive.

[Screenshot: MSDTC log path entry in the registry]
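Searching HKCR by hand is tedious, so here is a small PowerShell sketch that walks the CID keys, shows each MSDTC log path it finds, and repoints any Q:\ path. It assumes Path is a value under the Log key, as the registry path above suggests, and D:\MSDTC is a hypothetical new location; pick a folder that actually exists on the VM:

    # HKCR is not a default PowerShell drive, so map it first
    New-PSDrive -Name HKCR -PSProvider Registry -Root HKEY_CLASSES_ROOT | Out-Null

    Get-ChildItem HKCR:\CID | ForEach-Object {
        $log = Join-Path $_.PSPath 'CustomProperties\Log'
        if (Test-Path $log) {
            $current = (Get-ItemProperty -Path $log).Path
            Write-Host "$($_.PSChildName) -> $current"
            if ($current -like 'Q:*') {
                # Repoint the MSDTC log off the old quorum drive; D:\MSDTC is a placeholder
                Set-ItemProperty -Path $log -Name Path -Value 'D:\MSDTC\'
            }
        }
    }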

All that is left now is to cross your fingers and start both the Distributed Transaction Coordinator service and the SQL Server service. Voila!
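From an elevated prompt that sequence looks roughly like this; the service name and errorlog location assume a default SQL 2005 instance installed to the default path, so adjust both if yours differ:

    net start msdtc          # MSDTC first, since SQL Server expects it to be running
    net start mssqlserver    # then the database engine

    # Check the end of the errorlog to confirm a clean startup
    Get-Content 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG' | Select-Object -Last 40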

Summary

This method of reconfiguring a cluster that will break due to a P2V certainly isn't a best practice, but then again we sometimes have to make exceptions in order to move forward at the pace of change. There may still be some lingering cluster information in the registry, but your SQL Server instance will run smoothly. The steps are similar for SQL 2008, though some registry paths may differ from SQL 2005. You also shouldn't see much difference in a P2V of active/active nodes: you'll still retain the SQL Network Names, but you'll end up with 2 VM instances instead of one.




About the author
Scott Shaw is a Lead SQL Server DBA with extensive experience running SQL Server in a virtualized environment.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Tuesday, April 10, 2018 - 5:51:15 AM - Mark Back To Top (75654)

Thank You!

MSDTC log file on the quorum disk. Nice tip!!!

Thanks,

Mark


Thursday, May 8, 2014 - 1:15:11 PM - Glenn Back To Top (30675)

Jose,

Thanks for your steps. I am still confused about which IPs and hostname I use for the virtual server I just P2V'd. Do I use the original node name and IP address of that node, or can the hostname be anything while I use the cluster IP address?


Saturday, March 23, 2013 - 7:06:54 AM - jose antonio silva Back To Top (22982)

Langus, I did try that, but as our data drives were converted to VHD using the "normal" Hyper-V SCSI controller, it did not support the "shared" mode, and the Cluster Service simply took the disks into an "offline" persistent-reserved mode.

My steps were:

1. Disk2Vhd all drives (including Q:)

2. Build the new VM using a PCI controller for the system disk, putting all cluster disks on a new SCSI controller.

3. Boot (and repair the boot with bootrec, rebuildbcd, etc.) - have a Windows ISO around for boot recovery procedures

4. Clear the cluster configuration using cluster node /forcecleanup http://blogs.technet.com/b/canitpro/archive/2012/02/14/how-to-clean-up-cluster-nodes-from-destroyed-clusters.aspx

5. Remove the "Failover Clustering" and MPIO features from Windows

6. Clear HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster  http://tarajana.wordpress.com/2011/10/14/could-not-find-any-ip-address-that-this-sql-server-instance-depends-upon/

7. Add the clustered IP addresses to the local network adapter, remove the "virtual" computer from AD, and add a static entry to DNS

8. Reboot

9. Clear out lots of unused drivers using set devmgr_show_nonpresent_devices=1


Wednesday, November 21, 2012 - 7:24:34 PM - James Ford Back To Top (20465)

Langus, I think disk signatures/IDs might prevent that.


Tuesday, November 20, 2012 - 1:28:23 PM - Langus Back To Top (20434)

Good post!

But I have a question: why not just virtualize one node and keep it running as if the other one is simply turned off? (and it will never come back online)

Cheers! 


Tuesday, November 20, 2012 - 11:12:37 AM - James Ford Back To Top (20432)

I just went through this. Thanks to Chris for more specifics. The only problems I've had following the conversion we did a week ago are twofold:

First (a minor issue), SQL Mgmt Studio shows a 'white' icon instead of green. From what I can find online, it's because SQL Mgmt Studio was probably installed when this was a 'REAL' cluster; now that it isn't (with the cluster service/drivers disabled), it just shows a 'white' icon for the database instance and SQL Agent. You can stop/start the SQL service and agent service all day long with no problem, and functionally all is just fine, but it's a nuisance if, say, you're a database guy (I'm not) and you like to use SQL Mgmt Studio to 'pause' or stop/restart the SQL database instance (all those options are now grayed out).

Second (perhaps a bit more of a concern; working on this), a particular app that used to depend on MS DTC no longer functions. It claims that MS DTC is not up/running on the server, yet you can start/stop DTC all day long without fail; it's seemingly fine - locally. Both the server node name and SQL cluster name IP addresses had been changed by me to the sole IP address of the single P2V'd node of the former cluster so all database connections would work, etc. I also removed all IPs (IP1, IP2, etc. in the 'tricky part') except for those specified by the original author. But now, having looked at my registry backup, I can see that one of the IP* settings, 'IP3' in particular, had been removed (again, per the author's note, I just kept IP1 (server IP), IP2 (loopback), and IPAll). I removed it myself. Will the SQL service become MS DTC 'aware' again, perhaps, if I just put back the original IP address value as it was (which was the IP address of the originally clustered DTC) and ensure DNS points it back to the server in the end? There is not much mention of DTC in this article, so I'm looking for a bit of clarity if you guys are still watching this thread - thank you.


Thursday, July 26, 2012 - 9:25:50 AM - Chris LoVerme Back To Top (18812)

For others looking at this, one thing that wasn't fully clear in the article was the order of operations for doing this. Here are some tips for pulling it off; I've done it for a SQL 2005 cluster moved to VMware 5, and it does work.

To be clear, you are doing a P2V process on the physical nodes of the cluster, and you are also going to "break apart" the cluster after the P2V process. First, fail over all the cluster resources to a single physical node (say SERVER1) and P2V the live node (SERVER1) with the 2nd node (SERVER2) holding no cluster resources, or better yet, power off the second node (SERVER2). After you migrate SERVER1 from a physical server to a virtual server, you don't want SERVER2 coming back online, because if SERVER1 goes offline at any point (say, when you're installing the VMware Tools or doing any post-P2V work), all the cluster resources will fail over to SERVER2 when the cluster service thinks something bad happened to node 1 (SERVER1). I would advise just powering off SERVER2 and keeping it off.

If (for some reason) you want to capture the physical SERVER2 node, I would advise doing so as a cold clone with the Windows server offline. You can use the VMware cold clone boot CD to do this (from versions of VMware prior to v5), although it is hard to find. After you P2V SERVER1, go ahead at this point and do the "Now for the Tricky Part" steps.


There are two key steps missing from this procedure which can cause some panic if you skip them and then reboot the virtualized computer after doing the "tricky part" steps in the article. What will happen is that you will see the drives, but they will appear to be 100% free space, and if you click on them, they will say "device not ready". First, you need to disable the cluster disk driver (not just the cluster service); steps on how to do this are in Microsoft KB article 280425 (step 3c). Next, you have to clear down the MSDTC log; steps to do this are in Microsoft KB article 916926. After that the system will be fully stable. Great article, and thanks to Scott for the steps.


Wednesday, April 6, 2011 - 3:24:22 PM - Derek Moore Back To Top (13453)

Yeah - I used to do a lot of this. I wrote this procedure in 2008. It's outdated now (VMware 3.5, MS SQL 2005, Windows Server 2008), but it may help someone with the basic procedure for building VMware VMs of Microsoft SQL clusters on Windows Server 2008 with Windows 2008 Active Directory. It is based on lots of input and research from this site, Microsoft, VMware, and other sites. Both the VMware and Microsoft documentation, especially on the Group Policies for Microsoft SQL clusters, often leave out one rule/policy in one document and a different one in another (VMware leaves out some needed disk configuration settings), so I had to read a BUNCH of docs to get all the GPOs right (for me, anyway, on a 40,000+ user network) and the other settings right. It may help someone. I do understand and appreciate your advice above. Links and references are included at the top of the document - please read them and the warning - it only works with the software listed; things have changed and there are newer versions now. Somewhere I have the version using multiple VMware physical boxes and also a document on how to recompile the VMware 3.5 kernel to support cheap IDE/SATA drives instead of SCSI VMFS.

Building a Virtual MSCS Server with VMWare ESX 3.5 Server, Microsoft Server Cluster Services (MSCS),  SQL Server 2005 Cluster on same physical box With Windows 2008 Server Active Directory Domain Controllers and Group Policy and Lowest User Access Privileges Service Accounts for Audit Compliance

http://intensewebsites.com/html/VMWARE_SQL_CLUSTERING.htm



Wednesday, April 6, 2011 - 1:58:01 PM - Scott Shaw Back To Top (13452)

Derek,

Ha!  Sometimes sarcasm doesn't come through well in writing. I'm glad you noticed!


Wednesday, April 6, 2011 - 1:56:28 PM - Scott Shaw Back To Top (13451)

Rad,

I'm glad you posted. We never even tried clustering on VM so I'm glad to hear about your experience.  VMWare has a lot of documentation talking about VM clustering but then there is the sometimes not so subtle difference between theory and practice.

Scott


Wednesday, April 6, 2011 - 11:09:43 AM - Derek Moore Back To Top (13450)

Excellent advice - lol @ 'exceedingly aggressive P2V schedule'. Been there, done that - hahaha - and on your production SQL servers.


Wednesday, April 6, 2011 - 7:00:53 AM - Rad Back To Top (13446)

If you ever want to do clustered SQL Server in a virtual environment, my only advice is: don't. With VMware we had many issues, from memory handling to database corruption. We even had a 2-node cluster environment with a good node1 and a node2 whose database was 2 months out of date; somewhere, somehow, the two sides didn't synchronize, without any sign of a problem.














