Registry Check Pointing in a Windows Cluster to Bring SQL Server Online

By:   |   Comments (2)   |   Related: > Clustering


Problem

The SQL Server service for our cluster could not be brought online during a recent failover due to registry not found error(s). How can we correct this issue?

Solution

Sometimes the SQL Server service cannot be brought online when there is a failover to another node in the cluster. When you check the error logs, you may have noticed that the SQL Server service could not be brought online due to certain registry entries.  Here is an example of such error messages:

[sqsrvres] GetRegKeyAccessMask: Could not get registry access mask for registry key Software\Microsoft\Microsoft SQL Server\MSSQL10.<InstanceName>\Replication (status 2)).
[sqsrvres] Worker Thread (358930): Failed to retrieve the Replication registry key (last error = 0).
[sqsrvres] StartResourceService: Failed to start MSSQL$I03 service.  CurrentState: 1
[sqsrvres] OnlineThread: ResUtilsStartResourceService failed (status 435)
[sqsrvres] OnlineThread: Error 435 bringing resource online.

This issue can be resolved and the technique to resolve this issue is known as registry check-pointing.

SQL Server is a highly registry aware application. If any registry information (keys/values) changed for some unknown reason, these changes can have serious impacts to SQL Server running on that cluster node.

SQL Server maintains registry information of all its cluster resources and a sample is shown in Figure 1.

To review the registry, click on Start |  Run and type “regedit” and then navigate to the path shown in Figure 1.

1

Figure 1 - Cluster Resources in the Registry

Underneath the “Resources” folder in “regedit”, there are many folders with some sort of GUID as the name. These are the resources in the cluster and each resource has its own GUID for self identification purpose. If you expand those folders you can see other parameters related to that resource.

How Do You Know the Issue is With Cluster Check-Points?

This is a very valid question. Before you start anything with cluster check-pointing, you need to make sure no registry keys are missing. You can run the command below on any node of the cluster at the command prompt with administrator privileges.

2
Cluster res /checkpoints > c:\checkpoints_output.txt

Figure 2 – List all the check-points which are enabled in the cluster

Alternatively you can send the output to a text file for later analysis.

Sample output of the above command.

3

Figure 3 – List of enabled check-points in a cluster

This output file lists all the check-points which are enabled on every instance of the cluster. For example, if you have three SQL Server instances installed on the cluster, the above output would list all of the check-points for all three instances. If you identify check-points for an instance and it is not included or has missing check-points, this is the issue you need to fix.

How to Enable Missing Check-Points for Clustered SQL Server Instances

To fix the missing check-points, you need to enable the missing check-points for a particular SQL Server instance. It is a simple process, but you do need downtime because you need to failover the cluster to test it.

Assume, in the cluster you have three SQL Server instances running;

  1. SQLInstance00 (Default)
  2. SQLInstance01\I02
  3. SQLInstance02\I03

As per Figure 3, it is clear that there are no check-points enabled for SQLInstance01 because it is not listed in the output. If you want to fix that, follow the below steps:

1. Open command prompt with Administrator permissions on the active node of the SQL Server instance currently having issues (i.e. SQLInstance01)

2. Take the SQL Server service offline

3. Type the following commands in command prompt and press enter. You can execute all these at once or one by one.

cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\Cluster"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\MSSQLServer"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\Replication"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\Providers"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\SQLServerSCP"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\CPE"
cluster res "SQL Network Name (SQLInstance01)" /addcheckpoints:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.SQLInstance01\SQLServerAgent"

Note: Each command line has the following identical syntax. This syntax is different based on the SQL Server version. Below is an example for SQL Server 2008.

 cluster . resource "SQL Network Name(<InstanceName>)" /addcheckpoints:"HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\MSSQL.x\MSSQLSERVER"

Words in bold are the places you need to replace with the appropriate values for your cluster. (SQL Network Name(<InstanceName>), MSSQL.x)

The first value in the command above is the SQL Server resource name which you need to find. To find the resource name of any SQL Server instance in a cluster, follow the below steps:

3.1 Open Failover Cluster Manager (FCM)

3.2 Expand the cluster and select the particular cluster in left pane of FCM. Then it will show the resources of the selected SQL Server instance in the right pane of FCM.

3.2 Right click on the SQL Server instance name and click Properties. Refer to Figure 4.

4

Figure 4 - Get Properties for SQL Server Name

3.3 In the Properties window of the SQL Server instance, copy the Resource Name. Refer to Figure 5.

5

Figure 5 - Properties of SQL Server Instance

Note: This is the resource name you need to paste in as the first value of the command in step 3 (i.e. Resource Name).

4. If you do not get any errors and enabling the registry check-pointing is successful, bring the SQL Server online. If you done this while SQL Server is online, then ignore step #4.

5. To confirm, run "Cluster res /checkpoints" to see that check-points are enabled for all three instances.

6. Failover the instance to a different node, and this time you should be able to failover without any issues.

Conclusion

I had an opportunity to troubleshoot such an issue with the help of Microsoft Engineers and they identified the issue is due to registry check-pointing. I am sharing the knowledge I gathered with the community. It is always better to troubleshoot and do some homework yourself before attempting to do anything on production system. It is critical to backup the registry of the clustered node that you are enabling the check-points before making any changes. If something goes wrong you have a backup to restore the registry to its original state. Please read the references I mention in this tip to get a full understanding of the problem. If you have any doubt about this process and testing did not yield the required output, please be cautious about proceeding in production.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Susantha Bathige Susantha Bathige currently works at Pearson North America as a Production DBA. He has over ten years of experience in SQL Server as a Database Engineer, Developer, Analyst and Production DBA.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Monday, January 19, 2015 - 10:57:25 AM - Jeremy Kadlec Back To Top (35986)

Susantha,

Congrats on your first tip!

Thank you,
Jeremy Kadlec
Community Co-Leader


Monday, January 19, 2015 - 9:42:55 AM - Balakrishna.B Back To Top (35983)

Good one Susantha...















get free sql tips
agree to terms