We generally find out that a
SQL Server Cluster has failed over to another node in the cluster after the
fact when looking at logs for some other reason. This tip will look at using
to send out an email notification when a failover occurs.
There are many different methods we could use to find out that a
SQL Server Cluster has failed over. Since the process of failover is really
just the windows SQL Service stopping on one node and starting on another we could
check the windows event log. Alternatively we could scan the
Log for these actions. Since we are using
for this tip and with this comes an easy way to generate the windows cluster log,
let's use this log file to check for any
Cluster failovers. When writing any
scripts I usually like to write each command individually to make sure they run
successfully before putting them together. This process makes troubleshooting a
little easier as anyone who has done anything in PowerShell before knows that even
single commands can get pretty complicated.
In order to generate this notification
we'll need to write something to perform the following four tasks:
Generate the Windows Cluster log
Scan the Cluster log file for a SQL Cluster failover
Send an email alert noting the SQL Cluster failover
Schedule the PowerShell script
Step 1. Generate the Windows Cluster log
Before we can use any of the Failover Cluster cmdlets we need to import the FailoverClusters
module into our
PowerShell session. To do this we can run the following command.
Once the FailoverClusters module is loaded, all we need to do to generate the
cluster log file is to run the
Get-ClusterLog cmdlet. One thing to note is most
commands do not need to be run from the local machine and can be called from another
machine by specifying the node/computername parameter if required. For this tip
we will assume that all commands are running on the local machine however we will
include the node parameter so it could be run from any machine in your domain without
The below example will output the cluster log file in the C:\SQL directory
for the last 24 hours.
After running the above command you should see the following file in your C:\SQL
Step 2. Scan the Cluster log file for a SQL Cluster failover
Next we have to scan this log file looking for a failover. The easiest way I've
found to do this is to search the file for the "OnlinePending-->Online" message
for your SQL cluster resource. We can accomplish this by reading in the cluster
log file using the
cmdlet and then pipe this through the
cmdlet searching for this pattern. You can see the complete command below as well
as a sample output. Note that we've added another pipe to Out-String. This is so
we can use this output as the body of our email in the next step.
[RCM] TransitionToState(SQL Server (INST1)) OnlinePending-->Online.
[RCM] TransitionToState(SQL Server Agent (INST1)) OnlinePending-->Online.
Step 3. Send email alert noting the SQL Cluster failover
All that's left to do now is send the email notification. In order to do this
we will first store the result of the command above into a variable and test if
this variable is empty. If it does in fact contain data then we send out an email
Send-MailMessage cmdlet with the details that we pulled from the cluster log
file. Below is the command to send the email. Note: The "`" character tells PowerShell
the command continues onto the next line.
Putting all of the above steps together we now have our
which we can deploy to our cluster. One thing to note with the completed script
is I've added some variables for the
Send-MailMessage parameters so the email can be configured a little bit easier.
I also added a step at the very end of the script to delete the log file that we
Step 4. Schedule the PowerShell script
The last thing we need to do is schedule our script which we will do using "Task
Scheduler". Since my
PowerShell command retrieves the last 24 hours of log data from the cluster
log file I will schedule my task to run every 24 hours. This can be changed to suit
your needs and you could run it every 10 minutes if that's what is required. After
starting Task Scheduler from the Administrative Tools menu select "Create Task"
and fill out the tabs as follows.
Switch to the Triggers tab and click "New".
Switch to the Actions tab and click "New".
Once done we simply click "OK" to create the task. Note: You will have to input
the password for the account you have configured to run this task.
In my case the above script was enough to test for failover as I know which machine
an instance should be running on so when I see it come online on another node I
know it's failed over. If your environment does not have a convention similar to
this you may also want to add another job which alerts you when an instance goes
offline so you're able to distinguish failovers from simple restarts. To do this
you would use the same commands but search for the string "OfflineSavingCheckpoints-->Offline".
I prefer to see all restarts as I like to know when my SQL instances have gone offline,
whatever the reason.
As you said-- there are lots of ways to skin this cat. Consider using SP_PROCOPTION to create an "autoexec" stored procedure in the MASTER db that gets executed each time the SQL instance restarts.
Send the email from the SP. I have this set up on all my SQL boxes-- and use the values below to tell me the name of the server instance, whether it's a cluster-- and if it IS, I list the nodes and indicate which is the active node this instance is running on.
I love powershell but sometimes things are easier with basic T-SQL code. This is a pretty simple script that will detect if your sql instance is not on the right node and you can add a simp[le DbMail block of code or use custom code to send an email or page your cell.
Create proc sp_dba_checkclusterbalance as Declare @srvname varchar(55)
select @srvname = Convert(varchar,SERVERPROPERTY('ComputerNamePhysicalNetBIOS')) --Print @srvname if @srvname = 'DefaultPhyNodeNamehere' begin Print 'All is good' return end else Begin Print 'Cluster not balanced on ' +@srvname Print 'Pager Code goes below here' end
Thursday, January 16, 2014 - 2:33:50 PM - Edward Pochinski
And yet another way to skin a cat, seems mine has left my office for now...He usually sits next to me...It opens doors maybe it can read now, har har...
Another way I found long ago was this old DOS utility called findstr that can seach txt file for strings when we had to alarm off of a .eml file for a disk subsystem in a unmanned data center and decided to use T-SQL to do it..
Friday, January 17, 2014 - 12:22:42 AM - Graham Okely
The best way is indeed (as mentioned by all others) is to use pure t-sql in a start-up procedure, so whenever a failover happens, dbas can get real-time alerts. With this powershell solution, you only get the alert when the schedule task runs, and if the failover happens a few times during the day, the start-up procedure solution will report each failover event while the current powershell solution will not. PowerShell is best used when t-sql cannot or is difficult to solve the problem.
Monday, May 05, 2014 - 5:28:36 PM - Andrew Nakamura
Yes, but it got me thinking about not only cluster events but all errors in windows logs. I used this as a basis for an example to loop through all my SQL Servers pull all the event errors and email like a mini SCCM for SQL and then poll it every x minutes. This way I can stay on top of those crazy I/O error that sometimes correlate why my log backups fail!