Monitor SQL Instance Balance across Windows Cluster Nodes Using PowerShell

By: Lin Liu | Comments (6) | Related: > Clustering

Problem

While hardware is getting cheap these days and people install no more than one SQL Failover Cluster Instance (FCI) per node in a Windows Server Failover Cluster (WSFC), there are still organizations that want to save hardware, space or license cost by stacking multiple SQL FCI over a single WSFC where the number of FCI's is much more than the number of WSFC nodes. However, stacked FCIs could introduce resource contention if they are unevenly distributed across cluster nodes. This tip will demonstrate how to monitor SQL FCIs if they become unbalanced using Windows PowerShell.

Solution

A WSFC system can have multiple physical nodes, and you can install multiple SQL FCIs on each of them. In most cases, you would want these FCIs evenly distributed across the nodes. For example, in a 4-node WSFC has 12 SQL FCIs, you would want to keep 3 FCIs running on each node. This balanced condition allows you to maximize resource usage for each node. If these FCIs are similar in function, you can configure "Maximum SQL Server Memory" to 1/3 of physical memory for each instance. To maintain such balance, you can set instance to "Allow failover" to its preferred node in the instance's properties. However you might not want to set auto failover under certain event such as patching, which may cause over stacking to happen. When an excessive number of FCIs fall onto a single node, these FCI will compete for resources like memory, CPU, network, resulting in performance degradation.

In the following discussion we will show you how to design a monitor system to check the FCI balance and fire an alert when the FCIs are unbalanced, using PowerShell along with Windows Task Manager.

Step 1: Understand the logic to balance the SQL Server Failover Cluster Instances

We need to set the threshold, or the maximum number of FCIs per node for a given WSFC. We want these FCIs spread evenly across the WSFC. So the threshold can be figured by formula below:

Max # FCI per node = CEILING (# total FCI / # physical nodes)

Here CEILING () is a math function to round up to the nearest integer. The alert is fired when number of FCI in any node exceeds this number.

Example: if we have a 3-node WSFC stacked with 10 FCIs. The threshold, or maximum number of FCI per node, will be CEILING(10/3) = 4. The balanced FCI distribution would be like 3/3/4, 3/4/3, etc. If it happened to be 3/2/5 or 2/2/6 then the alert will be fired from the node that has more FCIs than the threshold.

Step 2: PowerShell Code for Calculating the SQL Server Failover Cluster Instances and Alerting

Now let's put above logic into PowerShell code. The code block is divided into 2 sections: sendmail() function and main logic. We use email as alert delivery method, so in below code you need to change SMTP mail server name, email address to your values.

# sendmail function
function sendmail{
param($msgText)
Write-Host "Sending Email"

# ***Change to your SMTP server name
$smtpServer = "your.smtp.server.com"
$msg = new-object Net.Mail.MailMessage
$smtp = new-object Net.Mail.SmtpClient($smtpServer)

# ***Change "From", "To" and "Subject" fields to your own
$msg.From = "[email protected]"
$msg.To.Add("[email protected]")
$msg.subject = "Too many SQL instances on $env:COMPUTERNAME"
$msg.body = $msgText 

# Sending email
$smtp.Send($msg)
}

#  main logic
$nTotSQLInst = Get-WMIObject Win32_Service | where {($_.Name -LIKE "MSSQL$*") -or ($_.Name -eq "MSSQLSERVER") } | Select -ExpandProperty Name
$nWSFCNode = get-clusterNode | select-object Name
$nSQLInst = Get-WMIObject Win32_Service -Filter {Name LIKE '%MSSQL$%' and State LIKE 'Running'} | Select -ExpandProperty Name
$maxSQLInst = [math]::ceiling($nTotSQLInst.count/$nWSFCNode.count)
If ( $nSQLInst.count -gt $maxSQLInst ) {
 $CR = "`r`n"
 $InstList = $null
 for ($i=0; $i -lt $nSQLInst.count; $i++) {$InstList = $InstList + $nSQLInst[$i] + $CR}
 $msgText = "There are " + $nSQLInst.count + " SQL instances in node $env:COMPUTERNAME :" + 
  $CR + $CR + $InstList + $CR + "The max # of instances allowed is $maxSQLInst" 
 sendmail($msgText)
}

Once you customized your code, save above script in each WSFC node's local drive with PowerShell extension ".ps1", in say the c:\apps\SQLInstBalMonitor.ps1 directory.

Step 3: Testing the PowerShell Code

You will test the script in each node, to make sure the server has the correct PowerShell components to allow execution of the script. Now go to one of the nodes and change directory to c:\apps and type this command from prompt as highlighted below:

C:\apps>powershell .\SQLInstBalMonitor.ps1

Depending on your own cluster configuration, you should expect the result of the script before the test. For example, if you have a 3-node WSFC and 22 FCI, your current distribution is 7/7/8 for node 1/2/3, and you are in node1. According to our formula the threshold is CEILING(22/3) = 8. Because 7 < 8, you won't get any email alert. Now if you failover 2 FCI from other nodes to node1, the distribution may become 9/6/7, you would expect an email after executing the script similar to the one below:

This alert prompts you to failover at least one FCI from IIS2008QADB to another node.

Step 4: Deploy to other nodes

Once the script is fully tested in each node and it yields the expected result, the last step is to deploy this PowerShell script to Windows Scheduler in each node of the WSFC so that you have an automated alert.

With the script in same location (c:\apps for example) of each node, you are ready to deploy. From any one node, you create a Windows scheduled task from Administrative Tools, Task Scheduler (call it "PowerShell- SQL instance balance monitor"), create a basic task. For the trigger (or schedule) set it to daily every 1 hour. For Action, you select "Start a Program" and specify how the script would be executed, typically: PowerShell c:\apps\SQLInstBalMonitor.ps1, then finish the task creation.

Once the task is deployed in one node, you manually run it as a final test. If there are no errors, you can export the task in XML format, and import the task's XML file to the Windows Task Schedule of all other nodes, as illustrated below.

Export the Windows Task to an XML file

Import the Windows task from XML file to other nodes

Since the script is generic, you can even deploy it to other WSFC in your network using the same methodology, with no change to the script.

Now you have set up the SQL instance balance monitor in all your SQL cluster systems, and it will monitor the platform for you. If some system administrators patch a node and forget to fail the SQL instances back to the original node, or for whatever reason the node gets rebooted and all FCI fall to a single node, the monitor will alert you that you need to re-balance these instances between Windows nodes.

Next Steps

This tip is intended to help DBAs detect SQL instances over stacking at an early stage, to avoid performance degradation. This code and job is easy to deploy to SQL cluster systems in organization. The PowerShell script is tested in Windows 2008 R2 and Windows 2012 server cluster, with SQL 2008/R2/2012 SQL server clusters. For Windows 2008 server you may need to preload some PowerShell cluster objects so the code will work.
Check out these additional resources:
- Failover Cluster cmdlets - http://technet.microsoft.com/en-us/library/ee461009.aspx
- Task Scheduler - http://technet.microsoft.com/en-us/library/cc721871.aspx
- PowerShell tips
- Clustering tips

About the author

Lin Liu is a SQL server DBA and consultant in the San Francisco area. He is specializes in SQL administration and architecture solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Friday, August 16, 2019 - 10:28:23 AM - SQLLearner	Back To Top (82081)
How can we make this script work for SQL Server Availability Group Instances?

Saturday, May 4, 2013 - 4:44:23 PM - Alberto Santiago	Back To Top (23723)
Excelent! information, thanks

Thursday, April 18, 2013 - 2:17:11 PM - Jason	Back To Top (23427)
Nice script, wondering why you would want to deploy this to every server, rather than just use a list of servers, and the -computer flag on the GWMI call. I'd parse through a database table or a text file that has a list of server names, and the "max instances per node".