Problem
Have you ever had a SQL Server crash and not found out about it until you received a call from one of your end-users? As a DBA, one of my goals is to know about problems in my environment before anyone else does. Since not all DBAs have access to a commercial monitoring tool, the code in this tip uses native SQL Server functionality to monitor the availability and status of remote SQL Server instances.
Solution
Prerequisite Considerations
- If possible, the remote instance monitoring jobs should be created on a SQL Server instance that is not heavily taxed by a critical production workload.
- Be sure to configure database mail and SQL Agent notifications as a prerequisite for proper functionality of these monitoring jobs.
- This solution assumes that custom error message 64004 is not currently in use by the monitoring instance. If this is not the case, code changes will be necessary.
- The code has been tested against SQL Server 2012, but should function on version 2005 and greater.
Custom SQL Server Error Message
Listing 1 creates the custom error message used by the remote instance monitoring jobs. If error message 64004 is already in use by the instance, it will be overwritten.
-- Listing 1 USE [master] GO -- Create the custom error message EXEC dbo.sp_addmessage @msgnum = 64004, @msgtext = 'Remote Instance Monitor Alert - %s', @severity = 8, @with_log = 'FALSE', @replace = 'REPLACE'; GO
SQL Server Agent Operator, Alert, and Jobs
The code in Listing 2 targets the SQL Server Agent subsystem to verify and/or create the remaining components required to automate remote instance monitoring. Specifically, the code performs the following steps:
1. Create the metadata table in the administrative database that will contain the list of remote instances to be monitored.
2. Check for the existence of a SQL Server Agent Operator.
3. Create a custom SQL Server Agent alert for the custom error message created in Listing 1.
4. Create two SQL Server Agent Jobs responsible for remote instance monitoring. Each job is described in greater detail within the next section.
Before executing the script in Listing 2, be sure to edit the USE statement and @jobOper variable to reflect the dedicated administrative database and desired local SQL Server Agent Operator, respectively.
-- Listing 2
USE [MSSQLTips] -- Use dedicated administrative database where meta-data table should be created
GO
SET NOCOUNT ON;
/****************************************************************************************
*Author: Mike Eastland *
* *
*Purpose: The purpose of this script is to create SQL Agent jobs on the local *
* instance that will monitor connectivity to remote SQL Server instances. *
* *
*PreReqs: The code in the Remote Monitoring jobs depends on a table in the *
* administrative database identified by the USE statement at the beginning *
* of this script. This table (dbo.CustomSqlMon) contains a list of the remote *
* instances to be monitored by the local SQL Agent jobs. The table will be *
* created by this script. *
****************************************************************************************/
DECLARE @cmd NVARCHAR(MAX),
@db VARCHAR(128),
@jobAlert VARCHAR(128),
@jobCat VARCHAR(128),
@jobDB VARCHAR(128),
@jobDescr VARCHAR(256),
@jobId BINARY(16),
@jobName VARCHAR(128),
@jobOper VARCHAR(32),
@jobStep VARCHAR(256),
@jobType VARCHAR(128),
@msg VARCHAR(MAX),
@returnCode INT,
@svr VARCHAR(128),
@ver DECIMAL,
@waitSec VARCHAR(3);
-- Edit variable values below where applicable
SELECT @db = DB_NAME(),
@jobAlert = 'Remote Instance Monitor',
@jobCat = 'Instance Monitor',
@jobOper = 'sqlDBA', /* Name of preferred SQL Server Agent Operator on the local instance */
@svr = @@SERVERNAME,
@ver = CONVERT(FLOAT, SUBSTRING(@@VERSION, (CHARINDEX('-', @@VERSION) + 1), 5)),
@waitSec = '300'; /* Retry wait interval in seconds for remote connectivity monitoring job */
-- End variable initialization
SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] started at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
BEGIN TRANSACTION
SET @msg = ' Backup and/or create metadata table to hold list of remote instances to be monitored.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
IF OBJECT_ID('dbo.CustomSqlMon', 'U') IS NOT NULL
BEGIN
IF EXISTS (SELECT * FROM dbo.CustomSqlMon)
BEGIN
SET @cmd = 'SELECT * INTO dbo.CustomSqlMon_' + CONVERT(VARCHAR, GETDATE(), 112) + REPLACE(CONVERT(VARCHAR, GETDATE(), 108), ':', '') + ' FROM [dbo].[CustomSqlMon]';
EXEC sp_ExecuteSQL @cmd;
END
DROP TABLE [dbo].[CustomSqlMon];
END
IF OBJECT_ID(N'[dbo].[CustomSqlMon]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[CustomSqlMon] (
InstanceID INT IDENTITY(1,1) NOT NULL,
InstanceName VARCHAR(128) NOT NULL,
Active BIT NOT NULL,
CheckSvcs BIT NOT NULL,
ErrorLevel TINYINT NOT NULL,
LastUpdate DATETIME NOT NULL,
UpdatedBy VARCHAR(128) NOT NULL
);
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [PK_CustomSqlMon] PRIMARY KEY CLUSTERED ([InstanceID]);
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [UQ_CustomSqlMon_InstanceName] UNIQUE ([InstanceName]);
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_Active] DEFAULT (1) FOR [Active];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_CheckSvcs] DEFAULT (0) FOR [CheckSvcs];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_LastUpdate] DEFAULT (GETDATE()) FOR [LastUpdate];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_UpdatedBy] DEFAULT (SUSER_SNAME()) FOR [UpdatedBy];
SET @cmd = 'CREATE TRIGGER [dbo].[utr_CustomSqlMon] ON [dbo].[CustomSqlMon]
AFTER UPDATE
AS
SET NOCOUNT ON
UPDATE l
SET l.LastUpdate = GETDATE(), l.UpdatedBy = SUSER_SNAME()
FROM [dbo].[CustomSqlMon] l
INNER JOIN INSERTED i ON l.InstanceID = i.InstanceID;'
EXEC sp_ExecuteSQL @cmd;
END
-- Check for existence of specified operator; use failsafe operator if it doesn't
IF NOT EXISTS (SELECT * FROM [msdb].dbo.sysoperators WHERE [name] = @jobOper)
BEGIN
SET @msg = 'Operator [' + @jobOper + '] not found; checking for failsafe operator.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
EXEC [master].dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\SQLServerAgent',
N'AlertFailSafeOperator', @jobOper OUTPUT;
END
IF @jobOper IS NULL
BEGIN
SET @msg = 'No failsafe operator found; Job [' + @jobName +
'] will not be created without notification functionality.';
RAISERROR(@msg, 8, 0) WITH LOG, NOWAIT;
GOTO QuitWithRollback;
END
-- Create alert associated with custom error message
IF EXISTS (SELECT * FROM [msdb].dbo.sysalerts WHERE [name] = @jobAlert)
EXEC [msdb].dbo.sp_delete_alert @name = @jobAlert;
EXEC [msdb].dbo.sp_add_alert @name = @jobAlert, @message_id = 64004, @severity = 0, @enabled = 1,
@delay_between_responses = 0, @include_event_description_in = 3;
EXEC [msdb].dbo.sp_add_notification @alert_name = @jobAlert, @operator_name = @jobOper,
@notification_method = 1;
-- Create job category if it doesn't already exist
IF NOT EXISTS (SELECT [name] FROM [msdb].dbo.syscategories WHERE [name] = @jobCat AND category_class = 1)
BEGIN
EXEC @returnCode = [msdb].dbo.sp_add_category @class = N'JOB', @type = N'LOCAL', @name = @jobCat;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END
-- Remote instance monitor job
SELECT @jobDescr = N'Check connectivity to remote SQL instances', @jobId = NULL,
@jobName = N'Monitor - Remote SQL Connectivity', @jobStep = @jobDescr;
IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName)
BEGIN
EXEC @returnCode = [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END
IF @ver < 10
BEGIN
SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next
set oConn = CreateObject("ADODB.Connection")
set oRS = CreateObject("ADODB.RecordSet")
sSvr = "' + @svr + '"
sDB = "' + @db + '"
iWait = ' + @waitSec + '
sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI"
oConn.Open sConStr
sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 ORDER BY InstanceName"
oRS.CursorLocation = 3
oRS.Open sSQL, oConn
if Err.Number <> 0 then
sOut = "Error querying dbo.CustomSqlMon."
sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
Err.Clear
if oRS.RecordCount = 0 then
sOut = "The source table (dbo.CustomSqlMon) appears to be empty."
sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
do while not oRS.EOF
sSvr = oRS("InstanceName").Value
iError = oRS("ErrorLevel").Value
iRC = fRemoteConn(sSvr, iWait)
select case iRC
case -1
sOut = "ERROR: Unable to connect to " & sSvr
case -2
sOut = "ERROR: Unable to query master.sys.databases on " & sSvr
case else
sOut = "Successfully connected to " & sSvr & ": " & cStr(iRC)
iError = 0
end select
if iRC < 0 then
sSQL = "RAISERROR(64004, " & CStr(iError) & ", 3, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
end if
oRS.MoveNext
loop
end if
end if
set oRS = Nothing
oConn.Close
set oConn = Nothing
Private Function fRemoteConn (fRemoteInst, fWait)
On Error Resume Next
Set fConn = CreateObject("ADODB.Connection")
Set fRS = CreateObject("ADODB.RecordSet")
Err.Clear
fCount = 0
fMax = 2
fConStr = "Provider=SQLOLEDB;Server=" & fRemoteInst & ";Database=master;Integrated Security=SSPI"
fSQL = "SELECT COUNT(*) as dbCount FROM sys.databases"
Do While fCount < fMax
Err.Clear
fConn.Open fConStr
If Err.Number <> 0 Then
fCount = fCount + 1
fStart = Time()
fEnd = DateAdd("s", fWait, fStart)
If fCount = fMax Then
fRemoteConn = -1
Exit Function
set fRS = Nothing
set fConn = Nothing
Else
While fEnd >= Time()
Wend
End If
Else
Err.Clear
Exit Do
End If
Loop
fRS.Open fSQL, fConn
If Err.Number <> 0 Then
fRemoteConn = -2
Exit Function
set fRS = Nothing
set fConn = Nothing
Else
fRemoteConn = fRS("dbCount").Value
End If
Err.Clear
Set fRS = Nothing
fConn.Close
Set fConn = Nothing
End Function'
END
ELSE
BEGIN
SELECT @jobDB = 'PowerShell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '";
$ErrorActionPreference = "Stop";
$retMsg = "";
$sqlSvr = "' + @svr + '";
$waitSec = ' + @waitSec + ';
$monTbl = New-Object System.Data.DataTable;
$monTbl.Columns.Add("InstanceName") | Out-Null;
$monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null;
$sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI";
$sqlCmd = $sqlConn.CreateCommand();
$sqlCmd.CommandText = @"
SELECT InstanceName, ErrorLevel
FROM dbo.CustomSqlMon
WHERE [Active] = 1
ORDER BY ErrorLevel DESC, InstanceName
"@;
try
{
$sqlConn.Open();
$sqlRdr = $sqlCmd.ExecuteReader();
}
catch
{
$retCode = -1;
$retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr;
}
while ($sqlRdr.Read())
{
$monRow = $monTbl.NewRow();
$monRow["InstanceName"] = $sqlRdr["InstanceName"];
$monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"];
$monTbl.Rows.Add($monRow);
}
$sqlRdr.Close();
foreach ($monRow in $monTbl.Rows)
{
$sqlInst = $monRow["InstanceName"];
$errLev = $monRow["ErrorLevel"];
$alertCmd = $sqlConn.CreateCommand();
$remoteConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlInst;Database = master;Integrated Security = SSPI";
$remoteSQL = $remoteConn.CreateCommand();
$remoteSQL.CommandText = "SELECT COUNT(*) AS [dbCount] FROM sys.databases";
$retry = 0;
$max = 2;
$retCode = 0;
$remoteDbCount = 0;
while ($retry -le $max)
{
if ($retCode -eq 0 -or $retCode -eq -2)
{
try
{
$remoteConn.Open();
$retCode = 1;
}
catch
{
$retCode = -2;
$retMsg = "Error connecting to SQL Server instance " + $sqlInst;
$retry = $retry + 1;
}
}
if ($retCode -eq 1 -or $retCode -eq -3)
{
try
{
$remoteDbCount = $remoteSQL.ExecuteScalar();
$remoteSQL.Dispose();
$remoteConn.Close();
$remoteConn.Dispose();
$retCode = 0;
break;
}
catch
{
$retCode = -3;
$retMsg = "Error querying master database on " + $sqlInst;
$retry = $retry + 1;
}
}
Start-Sleep -Seconds $waitSec;
}
if ($retCode -eq 0 -and $remoteDbCount -gt 3)
{
$SQL = "RAISERROR(''$retMsg'', 0, 0) WITH NOWAIT";
$retMsg = "Successfully connected to " + $sqlInst + ".";
$alertCmd.CommandText = $SQL;
}
else
{
$SQL = "RAISERROR(64004, " + $errLev + ", 0, ''$retMsg'') WITH LOG, NOWAIT";
$alertCmd.CommandText = $SQL;
}
$alertCmd.ExecuteNonQuery() | Out-Null;
$alertCmd.Dispose();
}
$sqlConn.Close();'
END
EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 3200, @active_end_time = 235959;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)';
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
SET @msg = ' Job [' + @jobName + '] created successfully.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
-- Remote service monitor job
SELECT @jobDescr = N'Check status of services for remote SQL instances', @jobId = NULL, @jobName = N'Monitor - Remote SQL Services', @jobStep = @jobDescr;
IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName)
BEGIN
EXEC [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END
IF @ver < 10
BEGIN
SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next
set oConn = CreateObject("ADODB.Connection")
set oRS = CreateObject("ADODB.RecordSet")
sSvr = "' + @svr + '"
sDB = "' + @db + '"
sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI"
oConn.Open sConStr
sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 AND CheckSvcs = 1"
oRS.CursorLocation = 3
oRS.Open sSQL, oConn
if Err.Number <> 0 then
sOut = "Error querying dbo.CustomSqlMon."
sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
Err.Clear
if oRS.RecordCount = 0 then
sOut = "The source table (dbo.CustomSqlMon) appears to be empty."
sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
do while not oRS.EOF
iError = oRS("ErrorLevel").Value
sSQL = ""
sSvr = oRS("InstanceName").Value
iRC = fSvcCheck(sSvr)
select case iRC
case -4
sOut = "ERROR: At least one SQL service failed to start for " & sSvr
case -3
sOut = "ERROR: Empty service list returned for " & sSvr
case -2
sOut = "ERROR: Could not execute WMI query against " & sSvr
case -1
sOut = "ERROR: Could not make WMI connection to " & sSvr
case 0
iError = 0
sOut = "All SQL services appear to be running on " & sSvr & ": " & cStr(iRC)
sSQL = "RAISERROR(''" & sOut & "'', " & CStr(iError) & ", 3) WITH NOWAIT"
case Else
iError = 8
sOut = "There were " & CStr(iRC) & " SQL service(s) that were restarted on " & sSvr
end select
if sSQL = "" then
sSQL = "RAISERROR(64004, " & CStr(iError) & ", 4, ''" & sOut & "'') WITH LOG, NOWAIT"
end if
oConn.Execute sSQL
oRS.MoveNext
loop
end if
end if
set oRS = Nothing
oConn.Close
set oConn = Nothing
Private Function fSvcCheck (fRemoteInst)
On Error Resume Next
If InStr(fRemoteInst, "\") <> 0 Then
fSvr = Mid(fRemoteInst, 1, (InStr(fRemoteInst, "\") - 1))
fInst = Mid(fRemoteInst, (InStr(fRemoteInst, "\") + 1), (Len(fRemoteInst) - (InStr(fRemoteInst, "\"))))
Else
fSvr = fRemoteInst
fInst = "default"
End If
Err.Clear
fCount = 0
Set fWMI = GetObject("winmgmts:{impersonationLevel = impersonate}//" & fSvr & "/root/cimv2")
If Err.Number <> 0 Then
fSvcCheck = -1
Exit Function
End If
Err.Clear
Set fSvcList = fWMI.ExecQuery("SELECT * FROM Win32_Service WHERE PathName LIKE ''%SQL%''")
If Err.Number <> 0 Then
fSvcCheck = -2
Exit Function
End If
Err.Clear
For Each fSvc In fSvcList
fCount = fCount + 1
Next
If fCount < 2 Then
fSvcCheck = -3
Exit Function
Else
fCount = 0
End If
Err.Clear
For Each fSvc In fSvcList
fCode = 0
If ( (InStr(fSvc.Name, "$") <> 0 And InStr(fSvc.Name, fInst) <> 0) Or (fInst = "default" And InStr(fSvc.Name, "$") = 0) ) Then
If fSvc.StartMode = "Auto" And fSvc.State <> "Running" Then
fCount = fCount + 1
Select Case fSvc.State
Case "Stopped" fCode = fSvc.StartService
Case "Paused" fCode = fSvc.ResumeService
End Select
If fCode <> 0 Then
fSvcCheck = -4
Exit Function
End If
End If
End If
Next
Set fSvc = Nothing
Set fSvcList = Nothing
Set fWMI = Nothing
Err.Clear
fSvcCheck = fCount
End Function'
END
ELSE
BEGIN
SELECT @jobDB = 'Powershell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '";
$ErrorActionPreference = "Stop";
$retCode = 0;
$retMsg = "";
$sqlSvr = "' + @svr + '";
$monTbl = New-Object System.Data.DataTable;
$monTbl.Columns.Add("InstanceName") | Out-Null;
$monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null;
$sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI";
$sqlCmd = $sqlConn.CreateCommand();
$sqlCmd.CommandText = @"
SELECT InstanceName, ErrorLevel
FROM dbo.CustomSqlMon
WHERE [Active] = 1
AND CheckSvcs = 1
ORDER BY ErrorLevel DESC, InstanceName
"@;
try
{
$sqlConn.Open();
$sqlRdr = $sqlCmd.ExecuteReader();
}
catch
{
$retCode = -1;
$retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr;
}
while ($sqlRdr.Read() -and $retCode -eq 0)
{
$monRow = $monTbl.NewRow();
$monRow["InstanceName"] = $sqlRdr["InstanceName"];
$monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"];
$monTbl.Rows.Add($monRow);
}
$sqlRdr.Close();
foreach ($monRow in $monTbl.Rows)
{
$svcList = "";
$svcCount = 0;
$sqlInst = $monRow["InstanceName"];
$errLev = $monRow["ErrorLevel"];
if ($sqlInst.Contains("\"))
{
$l = $sqlInst.IndexOf("\");
$Svr = $sqlInst.Substring(0, $l);
$i = ($sqlInst.IndexOf("\") + 1);
$l = ($sqlInst.Length - $i);
$Inst = "$" + $sqlInst.Substring($i, $l);
}
else
{
$Svr = $sqlInst;
$Inst = "default";
}
try
{
$svcCont = Get-WmiObject -Computer $Svr -Class Win32_Service -Filter "Name LIKE ''%SQL%''";
}
catch
{
$retCode = -2;
$retMsg = "Error executing WMI service query against " + $Svr;
break;
}
foreach ($svc in $svcCont)
{
$svcName = $svc.Name;
$alertCmd = $sqlConn.CreateCommand();
if (($Inst.Contains("$") -and $svcName.Contains($Inst)) -or ($Inst -eq "default" -and -not ($svcName.Contains("$"))))
{
if ($svc.StartMode -eq "Auto" -and -not ($svc.State -eq "Running"))
{
$svcCount = $svcCount + 1;
if ($svcList -eq "")
{ $svcList = $svcName; }
else
{ $svcList = $svcList + ", " + $svcName; }
try
{
if ($svc.State -eq "Paused")
{ $svc.ResumeService() }
else
{ $svc.StartService() }
}
catch
{
$retCode = -3;
$retMsg = "The " + $svcName + " service on " + $Svr + " is set to AutoStart but is not currently running. Attempt to restart it failed.";
break;
}
}
}
}
if ($retCode -eq 0)
{
if ($svcCount -eq 0)
{
$retMsg = "All SQL-related services set to AutoStart for " + $sqlInst + " are currently running.";
}
else
{
$retMsg = "The following SQL-related services were successfully restarted for " + $sqlInst + ": " + $svcList;
$SQL = "RAISERROR(''$retMsg'', 0, " + $errLev + ") WITH LOG, NOWAIT";
$alertCmd.CommandText = $SQL;
$alertCmd.ExecuteNonQuery()| Out-Null;
$alertCmd.Dispose();
}
}
}
$sqlConn.Close();'
END
EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 2800, @active_end_time = 235959;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)';
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
SET @msg = ' Job [' + @jobName + '] created successfully.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
IF @@TRANCOUNT > 0
COMMIT TRANSACTION;
SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] completed at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
RETURN;
QuitWithRollback:
BEGIN
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] failed at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
END
GOOverview of objects created in Listing 2
The code in Listing 2 will create two SQL Server Agent Jobs to monitor the accessibility and state of services for remote SQL Server instances. While each job performs a different type of remote instance monitoring, both jobs have a few common characteristics. First, they both depend on at least one row of data in the dbo.CustomSqlMon table with the [Active] column set to a value of 1. Second, the step type and command are both determined by the SQL Server version of the instance on which the jobs are created. Starting with SQL Server 2008, SQL Server Agent job steps can be created using Powershell code. So, if the monitoring instance is running SQL Server 2008 or higher, Powershell will be used for the job step commands. On the other (and hopefully much less likely) hand, if the instance is running SQL Server version 2005 or earlier, the job steps will be created using VBScript under the ActiveScripting subsystem. Finally, both jobs make use of the SQL Server Agent service account to perform actions at either the SQL Server or Windows OS level. A detailed description of the metadata table and each SQL Server Agent job is listed below.
Metadata Table: dbo.CustomSqlMon
The dbo.CustomSqlMon table is created by the code in Listing 2. It is used by both of the SQL Server Agent jobs (also created by the code in Listing 2) to determine which remote instances will be monitored and to what degree each remote instance will be monitored. Below is a listing of each column in the dbo.CustomSqlMon table and its respective purpose.
| Column | Datatype | Purpose |
|---|---|---|
| InstanceID | INT (Identity) | Primary Key (surrogate). |
| InstanceName | VARCHAR(128) | Name of the remote instance to be monitored. |
| Active | BIT | Flag indicating if the remote instance should be actively monitored (1) or not (0). |
| CheckSvcs | BIT | Flag indicating if the SQL services related to the remote instance should be monitored (1) or not (0). |
| ErrorLevel | TINYINT | Value used by the severity parameter of any RAISERROR commands executed by the SQL Server Agent jobs while monitoring the InstanceName. |
| LastUpdate | DATETIME | Last time the row was updated. |
| UpdatedBy | VARCHAR(128) | Name of the user that last updated the row. |
SQL Agent Job: Monitor – Remote SQL Connectivity
The Monitor – Remote SQL Connectivity job starts by querying dbo.CustomSqlMon table in the administrative database identified in the USE statement when the code in Listing 2 was executed. This query identifies the remote instances that should be monitored by the rows in the CustomSqlMon table with the [Active] column set to a value of 1. The job then attempts to connect to and execute a query against the remote instance. Therefore, the Windows Active Directory account used to start the SQL Agent service on the monitoring instance should at least have permissions to query the [master].sys.databases table on the remote instances it needs to monitor. If the connection or the query fails, the job will wait a default of 5 minutes (defined by the @waitSec variable in Listing 2) and will retry twice. The purpose of this retry functionality is to prevent false alarms due to a brief loss of communication between the monitoring instance and the monitored instance(s). If the connection still fails at this point, a RAISERROR command will be executed using the custom error message created in Listing 1 and subsequently, will fire the custom alert created in Listing 2.
SQL Agent Job: Monitor – Remote SQL Services
The Monitor – Remote SQL Services job also queries the dbo.CustomSqlMon table. It compiles a list of remote instances to be monitored by looking for rows in the CustomSqlMon table with the [Active] column set to a value of 1 and the [CheckSvcs] column set to a value of 1. The job then makes a WMI connection to the remote server and checks for any SQL services related to the monitored instance that are set to auto-start but are not currently running. If any services are found in this state, the job will attempt to start the service(s) on the remote server. In order to accomplish these tasks, the logon account for the SQL Agent service on the monitoring instance must have sufficient permissions to interrogate and start services on the remote host. The job will execute a RAISERROR command with custom error message 64004 if any error is encountered. Please note that this job has the potential to restart services that have been intentionally stopped for whatever reason. Manual steps must be taken to prevent this from happening, such as setting the monitored services to Manual start mode, temporarily setting the Active flag to 0 in the dbo.CustomSqlMon table on the monitoring instance, or disabling the job completely.
Conclusion
The code in this tip provides a way to automate the monitoring of remote SQL Server instances using tools and functionality native to SQL Server. As with most things in the DBA world, your mileage may vary. Be sure to test the jobs in your environment.
Next Steps
- Populate the dbo.CustomSqlMon table with the list of remote instances that should be monitored by the monitoring instance.
- Review the job schedule of each job and update according to your requirements.
- Enable the job(s) when you are ready for them to run unattended. The jobs are created as disabled by default.
- Consider additional customizations to enable different alert types, such as email or paging, for different remote instances being monitored. For example, an email could be sent if a development instance is inaccessible and a page could be sent for a production instance.
- Check out more tips related to monitoring SQL Server.

Mike has been a SQL Server DBA since 1999. His main areas of interest are monitoring and the automation of administrative tasks.


