By: Mike Eastland | Comments (3) | Related: > Monitoring
Problem
Have you ever had a SQL Server crash and not found out about it until you received a call from one of your end-users? As a DBA, one of my goals is to know about problems in my environment before anyone else does. Since not all DBAs have access to a commercial monitoring tool, the code in this tip uses native SQL Server functionality to monitor the availability and status of remote SQL Server instances.
Solution
Prerequisite Considerations
- If possible, the remote instance monitoring jobs should be created on a
SQL Server instance that is not heavily taxed by a critical production workload.
- Be sure to
configure database
mail and SQL Agent notifications as a prerequisite for proper functionality
of these monitoring jobs.
- This solution assumes that custom error message 64004 is not currently in
use by the monitoring instance. If this is not the case, code changes will be
necessary.
- The code has been tested against SQL Server 2012, but should function on
version 2005 and greater.
Custom SQL Server Error Message
Listing 1 creates the custom error message used by the remote instance monitoring
jobs. If error message 64004 is already in use by the instance, it will be overwritten.
-- Listing 1 USE [master] GO -- Create the custom error message EXEC dbo.sp_addmessage @msgnum = 64004, @msgtext = 'Remote Instance Monitor Alert - %s', @severity = 8, @with_log = 'FALSE', @replace = 'REPLACE'; GO
SQL Server Agent Operator, Alert, and Jobs
The code in Listing 2 targets the SQL Server Agent subsystem to verify and/or create the remaining components required to automate remote instance monitoring. Specifically, the code performs the following steps:
1. Create the metadata table in the administrative database that will contain
the list of remote instances to be monitored.
2. Check for the existence of a
SQL Server Agent Operator.
3. Create a custom SQL Server Agent alert for the
custom error message created in Listing 1.
4. Create two SQL Server Agent Jobs
responsible for remote instance monitoring. Each job is described in greater detail
within the next section.
Before executing the script in Listing 2, be sure to edit the USE statement and
@jobOper variable to reflect the dedicated administrative database and desired local
SQL Server Agent Operator, respectively.
-- Listing 2 USE [MSSQLTips] -- Use dedicated administrative database where meta-data table should be created GO SET NOCOUNT ON; /**************************************************************************************** *Author: Mike Eastland * * * *Purpose: The purpose of this script is to create SQL Agent jobs on the local * * instance that will monitor connectivity to remote SQL Server instances. * * * *PreReqs: The code in the Remote Monitoring jobs depends on a table in the * * administrative database identified by the USE statement at the beginning * * of this script. This table (dbo.CustomSqlMon) contains a list of the remote * * instances to be monitored by the local SQL Agent jobs. The table will be * * created by this script. * ****************************************************************************************/ DECLARE @cmd NVARCHAR(MAX), @db VARCHAR(128), @jobAlert VARCHAR(128), @jobCat VARCHAR(128), @jobDB VARCHAR(128), @jobDescr VARCHAR(256), @jobId BINARY(16), @jobName VARCHAR(128), @jobOper VARCHAR(32), @jobStep VARCHAR(256), @jobType VARCHAR(128), @msg VARCHAR(MAX), @returnCode INT, @svr VARCHAR(128), @ver DECIMAL, @waitSec VARCHAR(3); -- Edit variable values below where applicable SELECT @db = DB_NAME(), @jobAlert = 'Remote Instance Monitor', @jobCat = 'Instance Monitor', @jobOper = 'sqlDBA', /* Name of preferred SQL Server Agent Operator on the local instance */ @svr = @@SERVERNAME, @ver = CONVERT(FLOAT, SUBSTRING(@@VERSION, (CHARINDEX('-', @@VERSION) + 1), 5)), @waitSec = '300'; /* Retry wait interval in seconds for remote connectivity monitoring job */ -- End variable initialization SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] started at ' + CONVERT(VARCHAR, GETDATE()) + '.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; BEGIN TRANSACTION SET @msg = ' Backup and/or create metadata table to hold list of remote instances to be monitored.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; IF OBJECT_ID('dbo.CustomSqlMon', 'U') IS NOT NULL BEGIN IF EXISTS (SELECT * FROM dbo.CustomSqlMon) BEGIN SET @cmd = 'SELECT * INTO dbo.CustomSqlMon_' + CONVERT(VARCHAR, GETDATE(), 112) + REPLACE(CONVERT(VARCHAR, GETDATE(), 108), ':', '') + ' FROM [dbo].[CustomSqlMon]'; EXEC sp_ExecuteSQL @cmd; END DROP TABLE [dbo].[CustomSqlMon]; END IF OBJECT_ID(N'[dbo].[CustomSqlMon]', 'U') IS NULL BEGIN CREATE TABLE [dbo].[CustomSqlMon] ( InstanceID INT IDENTITY(1,1) NOT NULL, InstanceName VARCHAR(128) NOT NULL, Active BIT NOT NULL, CheckSvcs BIT NOT NULL, ErrorLevel TINYINT NOT NULL, LastUpdate DATETIME NOT NULL, UpdatedBy VARCHAR(128) NOT NULL ); ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [PK_CustomSqlMon] PRIMARY KEY CLUSTERED ([InstanceID]); ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [UQ_CustomSqlMon_InstanceName] UNIQUE ([InstanceName]); ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_Active] DEFAULT (1) FOR [Active]; ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_CheckSvcs] DEFAULT (0) FOR [CheckSvcs]; ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_LastUpdate] DEFAULT (GETDATE()) FOR [LastUpdate]; ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_UpdatedBy] DEFAULT (SUSER_SNAME()) FOR [UpdatedBy]; SET @cmd = 'CREATE TRIGGER [dbo].[utr_CustomSqlMon] ON [dbo].[CustomSqlMon] AFTER UPDATE AS SET NOCOUNT ON UPDATE l SET l.LastUpdate = GETDATE(), l.UpdatedBy = SUSER_SNAME() FROM [dbo].[CustomSqlMon] l INNER JOIN INSERTED i ON l.InstanceID = i.InstanceID;' EXEC sp_ExecuteSQL @cmd; END -- Check for existence of specified operator; use failsafe operator if it doesn't IF NOT EXISTS (SELECT * FROM [msdb].dbo.sysoperators WHERE [name] = @jobOper) BEGIN SET @msg = 'Operator [' + @jobOper + '] not found; checking for failsafe operator.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; EXEC [master].dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\SQLServerAgent', N'AlertFailSafeOperator', @jobOper OUTPUT; END IF @jobOper IS NULL BEGIN SET @msg = 'No failsafe operator found; Job [' + @jobName + '] will not be created without notification functionality.'; RAISERROR(@msg, 8, 0) WITH LOG, NOWAIT; GOTO QuitWithRollback; END -- Create alert associated with custom error message IF EXISTS (SELECT * FROM [msdb].dbo.sysalerts WHERE [name] = @jobAlert) EXEC [msdb].dbo.sp_delete_alert @name = @jobAlert; EXEC [msdb].dbo.sp_add_alert @name = @jobAlert, @message_id = 64004, @severity = 0, @enabled = 1, @delay_between_responses = 0, @include_event_description_in = 3; EXEC [msdb].dbo.sp_add_notification @alert_name = @jobAlert, @operator_name = @jobOper, @notification_method = 1; -- Create job category if it doesn't already exist IF NOT EXISTS (SELECT [name] FROM [msdb].dbo.syscategories WHERE [name] = @jobCat AND category_class = 1) BEGIN EXEC @returnCode = [msdb].dbo.sp_add_category @class = N'JOB', @type = N'LOCAL', @name = @jobCat; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; END -- Remote instance monitor job SELECT @jobDescr = N'Check connectivity to remote SQL instances', @jobId = NULL, @jobName = N'Monitor - Remote SQL Connectivity', @jobStep = @jobDescr; IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName) BEGIN EXEC @returnCode = [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; END IF @ver < 10 BEGIN SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next set oConn = CreateObject("ADODB.Connection") set oRS = CreateObject("ADODB.RecordSet") sSvr = "' + @svr + '" sDB = "' + @db + '" iWait = ' + @waitSec + ' sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI" oConn.Open sConStr sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 ORDER BY InstanceName" oRS.CursorLocation = 3 oRS.Open sSQL, oConn if Err.Number <> 0 then sOut = "Error querying dbo.CustomSqlMon." sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT" oConn.Execute sSQL else Err.Clear if oRS.RecordCount = 0 then sOut = "The source table (dbo.CustomSqlMon) appears to be empty." sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT" oConn.Execute sSQL else do while not oRS.EOF sSvr = oRS("InstanceName").Value iError = oRS("ErrorLevel").Value iRC = fRemoteConn(sSvr, iWait) select case iRC case -1 sOut = "ERROR: Unable to connect to " & sSvr case -2 sOut = "ERROR: Unable to query master.sys.databases on " & sSvr case else sOut = "Successfully connected to " & sSvr & ": " & cStr(iRC) iError = 0 end select if iRC < 0 then sSQL = "RAISERROR(64004, " & CStr(iError) & ", 3, ''" & sOut & "'') WITH LOG, NOWAIT" oConn.Execute sSQL end if oRS.MoveNext loop end if end if set oRS = Nothing oConn.Close set oConn = Nothing Private Function fRemoteConn (fRemoteInst, fWait) On Error Resume Next Set fConn = CreateObject("ADODB.Connection") Set fRS = CreateObject("ADODB.RecordSet") Err.Clear fCount = 0 fMax = 2 fConStr = "Provider=SQLOLEDB;Server=" & fRemoteInst & ";Database=master;Integrated Security=SSPI" fSQL = "SELECT COUNT(*) as dbCount FROM sys.databases" Do While fCount < fMax Err.Clear fConn.Open fConStr If Err.Number <> 0 Then fCount = fCount + 1 fStart = Time() fEnd = DateAdd("s", fWait, fStart) If fCount = fMax Then fRemoteConn = -1 Exit Function set fRS = Nothing set fConn = Nothing Else While fEnd >= Time() Wend End If Else Err.Clear Exit Do End If Loop fRS.Open fSQL, fConn If Err.Number <> 0 Then fRemoteConn = -2 Exit Function set fRS = Nothing set fConn = Nothing Else fRemoteConn = fRS("dbCount").Value End If Err.Clear Set fRS = Nothing fConn.Close Set fConn = Nothing End Function' END ELSE BEGIN SELECT @jobDB = 'PowerShell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '"; $ErrorActionPreference = "Stop"; $retMsg = ""; $sqlSvr = "' + @svr + '"; $waitSec = ' + @waitSec + '; $monTbl = New-Object System.Data.DataTable; $monTbl.Columns.Add("InstanceName") | Out-Null; $monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null; $sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI"; $sqlCmd = $sqlConn.CreateCommand(); $sqlCmd.CommandText = @" SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE [Active] = 1 ORDER BY ErrorLevel DESC, InstanceName "@; try { $sqlConn.Open(); $sqlRdr = $sqlCmd.ExecuteReader(); } catch { $retCode = -1; $retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr; } while ($sqlRdr.Read()) { $monRow = $monTbl.NewRow(); $monRow["InstanceName"] = $sqlRdr["InstanceName"]; $monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"]; $monTbl.Rows.Add($monRow); } $sqlRdr.Close(); foreach ($monRow in $monTbl.Rows) { $sqlInst = $monRow["InstanceName"]; $errLev = $monRow["ErrorLevel"]; $alertCmd = $sqlConn.CreateCommand(); $remoteConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlInst;Database = master;Integrated Security = SSPI"; $remoteSQL = $remoteConn.CreateCommand(); $remoteSQL.CommandText = "SELECT COUNT(*) AS [dbCount] FROM sys.databases"; $retry = 0; $max = 2; $retCode = 0; $remoteDbCount = 0; while ($retry -le $max) { if ($retCode -eq 0 -or $retCode -eq -2) { try { $remoteConn.Open(); $retCode = 1; } catch { $retCode = -2; $retMsg = "Error connecting to SQL Server instance " + $sqlInst; $retry = $retry + 1; } } if ($retCode -eq 1 -or $retCode -eq -3) { try { $remoteDbCount = $remoteSQL.ExecuteScalar(); $remoteSQL.Dispose(); $remoteConn.Close(); $remoteConn.Dispose(); $retCode = 0; break; } catch { $retCode = -3; $retMsg = "Error querying master database on " + $sqlInst; $retry = $retry + 1; } } Start-Sleep -Seconds $waitSec; } if ($retCode -eq 0 -and $remoteDbCount -gt 3) { $SQL = "RAISERROR(''$retMsg'', 0, 0) WITH NOWAIT"; $retMsg = "Successfully connected to " + $sqlInst + "."; $alertCmd.CommandText = $SQL; } else { $SQL = "RAISERROR(64004, " + $errLev + ", 0, ''$retMsg'') WITH LOG, NOWAIT"; $alertCmd.CommandText = $SQL; } $alertCmd.ExecuteNonQuery() | Out-Null; $alertCmd.Dispose(); } $sqlConn.Close();' END EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 3200, @active_end_time = 235959; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; SET @msg = ' Job [' + @jobName + '] created successfully.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; -- Remote service monitor job SELECT @jobDescr = N'Check status of services for remote SQL instances', @jobId = NULL, @jobName = N'Monitor - Remote SQL Services', @jobStep = @jobDescr; IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName) BEGIN EXEC [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; END IF @ver < 10 BEGIN SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next set oConn = CreateObject("ADODB.Connection") set oRS = CreateObject("ADODB.RecordSet") sSvr = "' + @svr + '" sDB = "' + @db + '" sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI" oConn.Open sConStr sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 AND CheckSvcs = 1" oRS.CursorLocation = 3 oRS.Open sSQL, oConn if Err.Number <> 0 then sOut = "Error querying dbo.CustomSqlMon." sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT" oConn.Execute sSQL else Err.Clear if oRS.RecordCount = 0 then sOut = "The source table (dbo.CustomSqlMon) appears to be empty." sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT" oConn.Execute sSQL else do while not oRS.EOF iError = oRS("ErrorLevel").Value sSQL = "" sSvr = oRS("InstanceName").Value iRC = fSvcCheck(sSvr) select case iRC case -4 sOut = "ERROR: At least one SQL service failed to start for " & sSvr case -3 sOut = "ERROR: Empty service list returned for " & sSvr case -2 sOut = "ERROR: Could not execute WMI query against " & sSvr case -1 sOut = "ERROR: Could not make WMI connection to " & sSvr case 0 iError = 0 sOut = "All SQL services appear to be running on " & sSvr & ": " & cStr(iRC) sSQL = "RAISERROR(''" & sOut & "'', " & CStr(iError) & ", 3) WITH NOWAIT" case Else iError = 8 sOut = "There were " & CStr(iRC) & " SQL service(s) that were restarted on " & sSvr end select if sSQL = "" then sSQL = "RAISERROR(64004, " & CStr(iError) & ", 4, ''" & sOut & "'') WITH LOG, NOWAIT" end if oConn.Execute sSQL oRS.MoveNext loop end if end if set oRS = Nothing oConn.Close set oConn = Nothing Private Function fSvcCheck (fRemoteInst) On Error Resume Next If InStr(fRemoteInst, "\") <> 0 Then fSvr = Mid(fRemoteInst, 1, (InStr(fRemoteInst, "\") - 1)) fInst = Mid(fRemoteInst, (InStr(fRemoteInst, "\") + 1), (Len(fRemoteInst) - (InStr(fRemoteInst, "\")))) Else fSvr = fRemoteInst fInst = "default" End If Err.Clear fCount = 0 Set fWMI = GetObject("winmgmts:{impersonationLevel = impersonate}//" & fSvr & "/root/cimv2") If Err.Number <> 0 Then fSvcCheck = -1 Exit Function End If Err.Clear Set fSvcList = fWMI.ExecQuery("SELECT * FROM Win32_Service WHERE PathName LIKE ''%SQL%''") If Err.Number <> 0 Then fSvcCheck = -2 Exit Function End If Err.Clear For Each fSvc In fSvcList fCount = fCount + 1 Next If fCount < 2 Then fSvcCheck = -3 Exit Function Else fCount = 0 End If Err.Clear For Each fSvc In fSvcList fCode = 0 If ( (InStr(fSvc.Name, "$") <> 0 And InStr(fSvc.Name, fInst) <> 0) Or (fInst = "default" And InStr(fSvc.Name, "$") = 0) ) Then If fSvc.StartMode = "Auto" And fSvc.State <> "Running" Then fCount = fCount + 1 Select Case fSvc.State Case "Stopped" fCode = fSvc.StartService Case "Paused" fCode = fSvc.ResumeService End Select If fCode <> 0 Then fSvcCheck = -4 Exit Function End If End If End If Next Set fSvc = Nothing Set fSvcList = Nothing Set fWMI = Nothing Err.Clear fSvcCheck = fCount End Function' END ELSE BEGIN SELECT @jobDB = 'Powershell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '"; $ErrorActionPreference = "Stop"; $retCode = 0; $retMsg = ""; $sqlSvr = "' + @svr + '"; $monTbl = New-Object System.Data.DataTable; $monTbl.Columns.Add("InstanceName") | Out-Null; $monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null; $sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI"; $sqlCmd = $sqlConn.CreateCommand(); $sqlCmd.CommandText = @" SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE [Active] = 1 AND CheckSvcs = 1 ORDER BY ErrorLevel DESC, InstanceName "@; try { $sqlConn.Open(); $sqlRdr = $sqlCmd.ExecuteReader(); } catch { $retCode = -1; $retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr; } while ($sqlRdr.Read() -and $retCode -eq 0) { $monRow = $monTbl.NewRow(); $monRow["InstanceName"] = $sqlRdr["InstanceName"]; $monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"]; $monTbl.Rows.Add($monRow); } $sqlRdr.Close(); foreach ($monRow in $monTbl.Rows) { $svcList = ""; $svcCount = 0; $sqlInst = $monRow["InstanceName"]; $errLev = $monRow["ErrorLevel"]; if ($sqlInst.Contains("\")) { $l = $sqlInst.IndexOf("\"); $Svr = $sqlInst.Substring(0, $l); $i = ($sqlInst.IndexOf("\") + 1); $l = ($sqlInst.Length - $i); $Inst = "$" + $sqlInst.Substring($i, $l); } else { $Svr = $sqlInst; $Inst = "default"; } try { $svcCont = Get-WmiObject -Computer $Svr -Class Win32_Service -Filter "Name LIKE ''%SQL%''"; } catch { $retCode = -2; $retMsg = "Error executing WMI service query against " + $Svr; break; } foreach ($svc in $svcCont) { $svcName = $svc.Name; $alertCmd = $sqlConn.CreateCommand(); if (($Inst.Contains("$") -and $svcName.Contains($Inst)) -or ($Inst -eq "default" -and -not ($svcName.Contains("$")))) { if ($svc.StartMode -eq "Auto" -and -not ($svc.State -eq "Running")) { $svcCount = $svcCount + 1; if ($svcList -eq "") { $svcList = $svcName; } else { $svcList = $svcList + ", " + $svcName; } try { if ($svc.State -eq "Paused") { $svc.ResumeService() } else { $svc.StartService() } } catch { $retCode = -3; $retMsg = "The " + $svcName + " service on " + $Svr + " is set to AutoStart but is not currently running. Attempt to restart it failed."; break; } } } } if ($retCode -eq 0) { if ($svcCount -eq 0) { $retMsg = "All SQL-related services set to AutoStart for " + $sqlInst + " are currently running."; } else { $retMsg = "The following SQL-related services were successfully restarted for " + $sqlInst + ": " + $svcList; $SQL = "RAISERROR(''$retMsg'', 0, " + $errLev + ") WITH LOG, NOWAIT"; $alertCmd.CommandText = $SQL; $alertCmd.ExecuteNonQuery()| Out-Null; $alertCmd.Dispose(); } } } $sqlConn.Close();' END EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 2800, @active_end_time = 235959; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'; IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback; SET @msg = ' Job [' + @jobName + '] created successfully.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; IF @@TRANCOUNT > 0 COMMIT TRANSACTION; SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] completed at ' + CONVERT(VARCHAR, GETDATE()) + '.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; RETURN; QuitWithRollback: BEGIN IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] failed at ' + CONVERT(VARCHAR, GETDATE()) + '.'; RAISERROR(@msg, 0, 0) WITH NOWAIT; END GO
Overview of objects created in Listing 2
The code in Listing 2 will create two SQL Server Agent Jobs to monitor the accessibility and state of services for remote SQL Server instances. While each job performs a different type of remote instance monitoring, both jobs have a few common characteristics. First, they both depend on at least one row of data in the dbo.CustomSqlMon table with the [Active] column set to a value of 1. Second, the step type and command are both determined by the SQL Server version of the instance on which the jobs are created. Starting with SQL Server 2008, SQL Server Agent job steps can be created using Powershell code. So, if the monitoring instance is running SQL Server 2008 or higher, Powershell will be used for the job step commands. On the other (and hopefully much less likely) hand, if the instance is running SQL Server version 2005 or earlier, the job steps will be created using VBScript under the ActiveScripting subsystem. Finally, both jobs make use of the SQL Server Agent service account to perform actions at either the SQL Server or Windows OS level. A detailed description of the metadata table and each SQL Server Agent job is listed below.
Metadata Table: dbo.CustomSqlMon
The dbo.CustomSqlMon table is created by the code in Listing 2. It is used by both of the SQL Server Agent jobs (also created by the code in Listing 2) to determine which remote instances will be monitored and to what degree each remote instance will be monitored. Below is a listing of each column in the dbo.CustomSqlMon table and its respective purpose.
Column | Datatype | Purpose |
---|---|---|
InstanceID | INT (Identity) | Primary Key (surrogate). |
InstanceName | VARCHAR(128) | Name of the remote instance to be monitored. |
Active | BIT | Flag indicating if the remote instance should be actively monitored (1) or not (0). |
CheckSvcs | BIT | Flag indicating if the SQL services related to the remote instance should be monitored (1) or not (0). |
ErrorLevel | TINYINT | Value used by the severity parameter of any RAISERROR commands executed by the SQL Server Agent jobs while monitoring the InstanceName. |
LastUpdate | DATETIME | Last time the row was updated. |
UpdatedBy | VARCHAR(128) | Name of the user that last updated the row. |
SQL Agent Job: Monitor - Remote SQL Connectivity
The Monitor - Remote SQL Connectivity job starts by querying dbo.CustomSqlMon table in the administrative database identified in the USE statement when the code in Listing 2 was executed. This query identifies the remote instances that should be monitored by the rows in the CustomSqlMon table with the [Active] column set to a value of 1. The job then attempts to connect to and execute a query against the remote instance. Therefore, the Windows Active Directory account used to start the SQL Agent service on the monitoring instance should at least have permissions to query the [master].sys.databases table on the remote instances it needs to monitor. If the connection or the query fails, the job will wait a default of 5 minutes (defined by the @waitSec variable in Listing 2) and will retry twice. The purpose of this retry functionality is to prevent false alarms due to a brief loss of communication between the monitoring instance and the monitored instance(s). If the connection still fails at this point, a RAISERROR command will be executed using the custom error message created in Listing 1 and subsequently, will fire the custom alert created in Listing 2.
SQL Agent Job: Monitor - Remote SQL Services
The Monitor - Remote SQL Services job also queries the dbo.CustomSqlMon table. It compiles a list of remote instances to be monitored by looking for rows in the CustomSqlMon table with the [Active] column set to a value of 1 and the [CheckSvcs] column set to a value of 1. The job then makes a WMI connection to the remote server and checks for any SQL services related to the monitored instance that are set to auto-start but are not currently running. If any services are found in this state, the job will attempt to start the service(s) on the remote server. In order to accomplish these tasks, the logon account for the SQL Agent service on the monitoring instance must have sufficient permissions to interrogate and start services on the remote host. The job will execute a RAISERROR command with custom error message 64004 if any error is encountered. Please note that this job has the potential to restart services that have been intentionally stopped for whatever reason. Manual steps must be taken to prevent this from happening, such as setting the monitored services to Manual start mode, temporarily setting the Active flag to 0 in the dbo.CustomSqlMon table on the monitoring instance, or disabling the job completely.
Conclusion
The code in this tip provides a way to automate the monitoring of remote SQL Server instances using tools and functionality native to SQL Server. As with most things in the DBA world, your mileage may vary. Be sure to test the jobs in your environment.
Next Steps
- Populate the dbo.CustomSqlMon table with the list of remote instances that should be monitored by the monitoring instance.
- Review the job schedule of each job and update according to your requirements.
- Enable the job(s) when you are ready for them to run unattended. The jobs are created as disabled by default.
- Consider additional customizations to enable different alert types, such as email or paging, for different remote instances being monitored. For example, an email could be sent if a development instance is inaccessible and a page could be sent for a production instance.
- Check out more tips related to monitoring SQL Server.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips