Monitor SQL Instance Balance across Windows Cluster Nodes Using PowerShell

By:   |   Comments (6)   |   Related: > Clustering


Problem

While hardware is getting cheap these days and people install no more than one SQL Failover Cluster Instance (FCI) per node in a Windows Server Failover Cluster (WSFC), there are still organizations that want to save hardware, space or license cost by stacking multiple SQL FCI over a single WSFC where the number of FCI's is much more than the number of WSFC nodes. However, stacked FCIs could introduce resource contention if they are unevenly distributed across cluster nodes. This tip will demonstrate how to monitor SQL FCIs if they become unbalanced using Windows PowerShell.

Solution

A WSFC system can have multiple physical nodes, and you can install multiple SQL FCIs on each of them. In most cases, you would want these FCIs evenly distributed across the nodes. For example, in a 4-node WSFC has 12 SQL FCIs, you would want to keep 3 FCIs running on each node. This balanced condition allows you to maximize resource usage for each node. If these FCIs are similar in function, you can configure "Maximum SQL Server Memory" to 1/3 of physical memory for each instance. To maintain such balance, you can set instance to "Allow failover" to its preferred node in the instance's properties. However you might not want to set auto failover under certain event such as patching, which may cause over stacking to happen. When an excessive number of FCIs fall onto a single node, these FCI will compete for resources like memory, CPU, network, resulting in performance degradation.

In the following discussion we will show you how to design a monitor system to check the FCI balance and fire an alert when the FCIs are unbalanced, using PowerShell along with Windows Task Manager.

Step 1: Understand the logic to balance the SQL Server Failover Cluster Instances

We need to set the threshold, or the maximum number of FCIs per node for a given WSFC. We want these FCIs spread evenly across the WSFC. So the threshold can be figured by formula below:

Max # FCI per node = CEILING (# total FCI / # physical nodes)

Here CEILING () is a math function to round up to the nearest integer. The alert is fired when number of FCI in any node exceeds this number.

Example: if we have a 3-node WSFC stacked with 10 FCIs. The threshold, or maximum number of FCI per node, will be CEILING(10/3) = 4. The balanced FCI distribution would be like 3/3/4, 3/4/3, etc. If it happened to be 3/2/5 or 2/2/6 then the alert will be fired from the node that has more FCIs than the threshold.

Step 2: PowerShell Code for Calculating the SQL Server Failover Cluster Instances and Alerting

Now let's put above logic into PowerShell code. The code block is divided into 2 sections: sendmail() function and main logic. We use email as alert delivery method, so in below code you need to change SMTP mail server name, email address to your values.

# sendmail function
function sendmail{
param($msgText)
Write-Host "Sending Email"

# ***Change to your SMTP server name
$smtpServer = "your.smtp.server.com"
$msg = new-object Net.Mail.MailMessage
$smtp = new-object Net.Mail.SmtpClient($smtpServer)

# ***Change "From", "To" and "Subject" fields to your own
$msg.From = "[email protected]"
$msg.To.Add("[email protected]")
$msg.subject = "Too many SQL instances on $env:COMPUTERNAME"
$msg.body = $msgText 

# Sending email
$smtp.Send($msg)
}

#  main logic
$nTotSQLInst = Get-WMIObject Win32_Service | where {($_.Name -LIKE "MSSQL$*") -or ($_.Name -eq "MSSQLSERVER") } | Select -ExpandProperty Name
$nWSFCNode = get-clusterNode | select-object Name
$nSQLInst = Get-WMIObject Win32_Service -Filter {Name LIKE '%MSSQL$%' and State LIKE 'Running'} | Select -ExpandProperty Name
$maxSQLInst = [math]::ceiling($nTotSQLInst.count/$nWSFCNode.count)
If ( $nSQLInst.count -gt $maxSQLInst ) {
 $CR = "`r`n"
 $InstList = $null
 for ($i=0; $i -lt $nSQLInst.count; $i++) {$InstList = $InstList + $nSQLInst[$i] + $CR}
 $msgText = "There are " + $nSQLInst.count + " SQL instances in node $env:COMPUTERNAME :" + 
  $CR + $CR + $InstList + $CR + "The max # of instances allowed is $maxSQLInst" 
 sendmail($msgText)
}

Once you customized your code, save above script in each WSFC node's local drive with PowerShell extension ".ps1", in say the c:\apps\SQLInstBalMonitor.ps1 directory.

Step 3: Testing the PowerShell Code

You will test the script in each node, to make sure the server has the correct PowerShell components to allow execution of the script. Now go to one of the nodes and change directory to c:\apps and type this command from prompt as highlighted below:

C:\apps>powershell .\SQLInstBalMonitor.ps1

Depending on your own cluster configuration, you should expect the result of the script before the test. For example, if you have a 3-node WSFC and 22 FCI, your current distribution is 7/7/8 for node 1/2/3, and you are in node1. According to our formula the threshold is CEILING(22/3) = 8. Because 7 < 8, you won't get any email alert. Now if you failover 2 FCI from other nodes to node1, the distribution may become 9/6/7, you would expect an email after executing the script similar to the one below:

email

This alert prompts you to failover at least one FCI from IIS2008QADB to another node.

Step 4: Deploy to other nodes

Once the script is fully tested in each node and it yields the expected result, the last step is to deploy this PowerShell script to Windows Scheduler in each node of the WSFC so that you have an automated alert.

With the script in same location (c:\apps for example) of each node, you are ready to deploy. From any one node, you create a Windows scheduled task from Administrative Tools, Task Scheduler (call it "PowerShell- SQL instance balance monitor"), create a basic task. For the trigger (or schedule) set it to daily every 1 hour. For Action, you select "Start a Program" and specify how the script would be executed, typically: PowerShell c:\apps\SQLInstBalMonitor.ps1, then finish the task creation.

Once the task is deployed in one node, you manually run it as a final test. If there are no errors, you can export the task in XML format, and import the task's XML file to the Windows Task Schedule of all other nodes, as illustrated below.

Export the Windows Task to an XML file

task scheduler - export

Import the Windows task from XML file to other nodes

task scheduler - import

Since the script is generic, you can even deploy it to other WSFC in your network using the same methodology, with no change to the script.

Now you have set up the SQL instance balance monitor in all your SQL cluster systems, and it will monitor the platform for you. If some system administrators patch a node and forget to fail the SQL instances back to the original node, or for whatever reason the node gets rebooted and all FCI fall to a single node, the monitor will alert you that you need to re-balance these instances between Windows nodes.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Lin Liu Lin Liu is a SQL server DBA and consultant in the San Francisco area. He is specializes in SQL administration and architecture solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, August 16, 2019 - 10:28:23 AM - SQLLearner Back To Top (82081)

How can we make this script work for SQL Server Availability Group Instances?


Tuesday, May 7, 2013 - 5:26:52 PM - Lin Back To Top (23781)

Jason, to answer your first question, deploying script to 100 servers is as simple as a batch file like below, if you have a server list:

copy /Y SQLInstBalMonitor.ps1 \\ServerName1\c$\apps

........

copy /Y SQLInstBalMonitor.ps1 \\ServerName100\c$\apps

Secondly, the quoted statement really means you shouldn't run monitor on a single or a small group of SQL instances as they could be all down.  By distributing it to every box, the probability of your system being unmonitored is nearly zero.


Saturday, May 4, 2013 - 4:44:23 PM - Alberto Santiago Back To Top (23723)

Excelent! information, thanks


Friday, May 3, 2013 - 3:27:55 PM - Jason Back To Top (23710)

Lin, your response makes sense from one point of view.  However, I would counter with, what if you want to make a change to your powershell script, and you have it deployed to 100 servers? 

 

And I hope I don't sound too much of an ass here, but I never expected to see a DBA say this... "I wouldn't query/store anything from a SQL database because any of DB/instance could be down."  :)


Thursday, April 18, 2013 - 3:47:36 PM - Lin Back To Top (23428)

Hi Jason, thanks for the comments.  The reason to deploy to every node of the cluster is to make sure the monitor is always on in case some nodes are offline.  Also, when some nodes are offline, the "max instances per node" will be recalculated based on current online nodes.  Secondly, I wouldn't query/store anything from a SQL database because any of DB/instance could be down.  By deploying to each node you are rest assured that monitor is running somewhere.


Thursday, April 18, 2013 - 2:17:11 PM - Jason Back To Top (23427)

Nice script, wondering why you would want to deploy this to every server, rather than just use a list of servers, and the -computer flag on the GWMI call.  I'd parse through a database table or a text file that has a list of server names, and the "max instances per node".















get free sql tips
agree to terms