Monitor Remote SQL Server Instance Availability

By:   |   Comments (3)   |   Related: > Monitoring


Problem

Have you ever had a SQL Server crash and not found out about it until you received a call from one of your end-users? As a DBA, one of my goals is to know about problems in my environment before anyone else does. Since not all DBAs have access to a commercial monitoring tool, the code in this tip uses native SQL Server functionality to monitor the availability and status of remote SQL Server instances.

Solution

Prerequisite Considerations

  • If possible, the remote instance monitoring jobs should be created on a SQL Server instance that is not heavily taxed by a critical production workload.
  • Be sure to configure database mail and SQL Agent notifications as a prerequisite for proper functionality of these monitoring jobs.
  • This solution assumes that custom error message 64004 is not currently in use by the monitoring instance. If this is not the case, code changes will be necessary.
  • The code has been tested against SQL Server 2012, but should function on version 2005 and greater.

Custom SQL Server Error Message

Listing 1 creates the custom error message used by the remote instance monitoring jobs. If error message 64004 is already in use by the instance, it will be overwritten.

-- Listing 1
USE [master]
GO
 
-- Create the custom error message
EXEC dbo.sp_addmessage @msgnum = 64004,
@msgtext = 'Remote Instance Monitor Alert - %s',
@severity = 8, @with_log = 'FALSE', @replace =
'REPLACE';
GO

SQL Server Agent Operator, Alert, and Jobs

The code in Listing 2 targets the SQL Server Agent subsystem to verify and/or create the remaining components required to automate remote instance monitoring. Specifically, the code performs the following steps:

1. Create the metadata table in the administrative database that will contain the list of remote instances to be monitored.
2. Check for the existence of a SQL Server Agent Operator.
3. Create a custom SQL Server Agent alert for the custom error message created in Listing 1.
4. Create two SQL Server Agent Jobs responsible for remote instance monitoring. Each job is described in greater detail within the next section.

Before executing the script in Listing 2, be sure to edit the USE statement and @jobOper variable to reflect the dedicated administrative database and desired local SQL Server Agent Operator, respectively.

-- Listing 2
USE [MSSQLTips] -- Use dedicated administrative database where meta-data table should be created
GO

SET NOCOUNT ON;

/****************************************************************************************
*Author: Mike Eastland *
* *
*Purpose: The purpose of this script is to create SQL Agent jobs on the local *
* instance that will monitor connectivity to remote SQL Server instances. *
* *
*PreReqs: The code in the Remote Monitoring jobs depends on a table in the *
* administrative database identified by the USE statement at the beginning *
* of this script. This table (dbo.CustomSqlMon) contains a list of the remote *
* instances to be monitored by the local SQL Agent jobs. The table will be *
* created by this script. *
****************************************************************************************/
DECLARE @cmd NVARCHAR(MAX),
@db VARCHAR(128),
@jobAlert VARCHAR(128),
@jobCat VARCHAR(128),
@jobDB VARCHAR(128),
@jobDescr VARCHAR(256),
@jobId BINARY(16),
@jobName VARCHAR(128),
@jobOper VARCHAR(32),
@jobStep VARCHAR(256),
@jobType VARCHAR(128),
@msg VARCHAR(MAX),
@returnCode INT,
@svr VARCHAR(128),
@ver DECIMAL,
@waitSec VARCHAR(3);

-- Edit variable values below where applicable
SELECT @db = DB_NAME(),
@jobAlert = 'Remote Instance Monitor',
@jobCat = 'Instance Monitor',
@jobOper = 'sqlDBA', /* Name of preferred SQL Server Agent Operator on the local instance */ 
@svr = @@SERVERNAME, 
@ver = CONVERT(FLOAT, SUBSTRING(@@VERSION, (CHARINDEX('-', @@VERSION) + 1), 5)),
@waitSec = '300'; /* Retry wait interval in seconds for remote connectivity monitoring job */
-- End variable initialization

SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] started at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

BEGIN TRANSACTION

SET @msg = ' Backup and/or create metadata table to hold list of remote instances to be monitored.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

IF OBJECT_ID('dbo.CustomSqlMon', 'U') IS NOT NULL
BEGIN
IF EXISTS (SELECT * FROM dbo.CustomSqlMon)
BEGIN
SET @cmd = 'SELECT * INTO dbo.CustomSqlMon_' + CONVERT(VARCHAR, GETDATE(), 112) + REPLACE(CONVERT(VARCHAR, GETDATE(), 108), ':', '') + ' FROM [dbo].[CustomSqlMon]';
EXEC sp_ExecuteSQL @cmd;
END

DROP TABLE [dbo].[CustomSqlMon]; 
END

IF OBJECT_ID(N'[dbo].[CustomSqlMon]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[CustomSqlMon] (
InstanceID INT IDENTITY(1,1) NOT NULL,
InstanceName VARCHAR(128) NOT NULL,
Active BIT NOT NULL,
CheckSvcs BIT NOT NULL,
ErrorLevel TINYINT NOT NULL,
LastUpdate DATETIME NOT NULL,
UpdatedBy VARCHAR(128) NOT NULL
);

ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [PK_CustomSqlMon] PRIMARY KEY CLUSTERED ([InstanceID]);
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [UQ_CustomSqlMon_InstanceName] UNIQUE ([InstanceName]);
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_Active] DEFAULT (1) FOR [Active];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_CheckSvcs] DEFAULT (0) FOR [CheckSvcs];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_LastUpdate] DEFAULT (GETDATE()) FOR [LastUpdate];
ALTER TABLE [dbo].[CustomSqlMon] ADD CONSTRAINT [DF_CustomSqlMon_UpdatedBy] DEFAULT (SUSER_SNAME()) FOR [UpdatedBy];

SET @cmd = 'CREATE TRIGGER [dbo].[utr_CustomSqlMon] ON [dbo].[CustomSqlMon]
AFTER UPDATE
AS
SET NOCOUNT ON

UPDATE l
SET l.LastUpdate = GETDATE(), l.UpdatedBy = SUSER_SNAME()
FROM [dbo].[CustomSqlMon] l
INNER JOIN INSERTED i ON l.InstanceID = i.InstanceID;'
EXEC sp_ExecuteSQL @cmd;
END

-- Check for existence of specified operator; use failsafe operator if it doesn't
IF NOT EXISTS (SELECT * FROM [msdb].dbo.sysoperators WHERE [name] = @jobOper)
BEGIN
SET @msg = 'Operator [' + @jobOper + '] not found; checking for failsafe operator.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

EXEC [master].dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', 
N'Software\Microsoft\MSSQLServer\SQLServerAgent',
N'AlertFailSafeOperator', @jobOper OUTPUT;
END

IF @jobOper IS NULL
BEGIN
SET @msg = 'No failsafe operator found; Job [' + @jobName + 
'] will not be created without notification functionality.';
RAISERROR(@msg, 8, 0) WITH LOG, NOWAIT;
GOTO QuitWithRollback;
END

-- Create alert associated with custom error message
IF EXISTS (SELECT * FROM [msdb].dbo.sysalerts WHERE [name] = @jobAlert)
EXEC [msdb].dbo.sp_delete_alert @name = @jobAlert;

EXEC [msdb].dbo.sp_add_alert @name = @jobAlert, @message_id = 64004, @severity = 0, @enabled = 1,
@delay_between_responses = 0, @include_event_description_in = 3;

EXEC [msdb].dbo.sp_add_notification @alert_name = @jobAlert, @operator_name = @jobOper,
@notification_method = 1;

-- Create job category if it doesn't already exist
IF NOT EXISTS (SELECT [name] FROM [msdb].dbo.syscategories WHERE [name] = @jobCat AND category_class = 1)
BEGIN
EXEC @returnCode = [msdb].dbo.sp_add_category @class = N'JOB', @type = N'LOCAL', @name = @jobCat;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END

-- Remote instance monitor job
SELECT @jobDescr = N'Check connectivity to remote SQL instances', @jobId = NULL, 
@jobName = N'Monitor - Remote SQL Connectivity', @jobStep = @jobDescr;

IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName)
BEGIN
EXEC @returnCode = [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END

IF @ver < 10
BEGIN
SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next

set oConn = CreateObject("ADODB.Connection")
set oRS = CreateObject("ADODB.RecordSet")

sSvr = "' + @svr + '"
sDB = "' + @db + '"
iWait = ' + @waitSec + '

sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI"
oConn.Open sConStr

sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 ORDER BY InstanceName"
oRS.CursorLocation = 3
oRS.Open sSQL, oConn

if Err.Number <> 0 then
sOut = "Error querying dbo.CustomSqlMon."

sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
Err.Clear
if oRS.RecordCount = 0 then
sOut = "The source table (dbo.CustomSqlMon) appears to be empty."

sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
do while not oRS.EOF 
sSvr = oRS("InstanceName").Value
iError = oRS("ErrorLevel").Value

iRC = fRemoteConn(sSvr, iWait)

select case iRC
case -1
sOut = "ERROR: Unable to connect to " & sSvr
case -2
sOut = "ERROR: Unable to query master.sys.databases on " & sSvr
case else
sOut = "Successfully connected to " & sSvr & ": " & cStr(iRC)
iError = 0
end select

if iRC < 0 then
sSQL = "RAISERROR(64004, " & CStr(iError) & ", 3, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
end if

oRS.MoveNext
loop 
end if
end if

set oRS = Nothing
oConn.Close
set oConn = Nothing


Private Function fRemoteConn (fRemoteInst, fWait)

On Error Resume Next

Set fConn = CreateObject("ADODB.Connection")
Set fRS = CreateObject("ADODB.RecordSet")
Err.Clear

fCount = 0
fMax = 2

fConStr = "Provider=SQLOLEDB;Server=" & fRemoteInst & ";Database=master;Integrated Security=SSPI"
fSQL = "SELECT COUNT(*) as dbCount FROM sys.databases"

Do While fCount < fMax
Err.Clear
fConn.Open fConStr
If Err.Number <> 0 Then
fCount = fCount + 1
fStart = Time()
fEnd = DateAdd("s", fWait, fStart)

If fCount = fMax Then
fRemoteConn = -1
Exit Function
set fRS = Nothing
set fConn = Nothing
Else
While fEnd >= Time()
Wend
End If
Else
Err.Clear
Exit Do
End If
Loop

fRS.Open fSQL, fConn
If Err.Number <> 0 Then
fRemoteConn = -2
Exit Function
set fRS = Nothing
set fConn = Nothing
Else
fRemoteConn = fRS("dbCount").Value
End If

Err.Clear
Set fRS = Nothing
fConn.Close
Set fConn = Nothing

End Function'
END
ELSE
BEGIN
SELECT @jobDB = 'PowerShell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '";
$ErrorActionPreference = "Stop";
$retMsg = "";
$sqlSvr = "' + @svr + '";
$waitSec = ' + @waitSec + ';

$monTbl = New-Object System.Data.DataTable;
$monTbl.Columns.Add("InstanceName") | Out-Null;
$monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null;

$sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI";

$sqlCmd = $sqlConn.CreateCommand();

$sqlCmd.CommandText = @"
SELECT InstanceName, ErrorLevel
FROM dbo.CustomSqlMon
WHERE [Active] = 1
ORDER BY ErrorLevel DESC, InstanceName
"@;

try
{
$sqlConn.Open();
$sqlRdr = $sqlCmd.ExecuteReader();
}
catch
{
$retCode = -1;
$retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr;
}

while ($sqlRdr.Read())
{
$monRow = $monTbl.NewRow();

$monRow["InstanceName"] = $sqlRdr["InstanceName"];
$monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"];

$monTbl.Rows.Add($monRow);
}

$sqlRdr.Close();

foreach ($monRow in $monTbl.Rows)
{
$sqlInst = $monRow["InstanceName"];
$errLev = $monRow["ErrorLevel"];

$alertCmd = $sqlConn.CreateCommand();

$remoteConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlInst;Database = master;Integrated Security = SSPI";
$remoteSQL = $remoteConn.CreateCommand();
$remoteSQL.CommandText = "SELECT COUNT(*) AS [dbCount] FROM sys.databases";

$retry = 0;
$max = 2;

$retCode = 0;
$remoteDbCount = 0;

while ($retry -le $max)
{ 
if ($retCode -eq 0 -or $retCode -eq -2)
{
try
{
$remoteConn.Open();
$retCode = 1;
}
catch
{
$retCode = -2;
$retMsg = "Error connecting to SQL Server instance " + $sqlInst;
$retry = $retry + 1;
}
}

if ($retCode -eq 1 -or $retCode -eq -3)
{
try
{
$remoteDbCount = $remoteSQL.ExecuteScalar();
$remoteSQL.Dispose();
$remoteConn.Close();
$remoteConn.Dispose();
$retCode = 0;
break;
}
catch
{
$retCode = -3;
$retMsg = "Error querying master database on " + $sqlInst;
$retry = $retry + 1;
}
}

Start-Sleep -Seconds $waitSec;
}

if ($retCode -eq 0 -and $remoteDbCount -gt 3)
{
$SQL = "RAISERROR(''$retMsg'', 0, 0) WITH NOWAIT";
$retMsg = "Successfully connected to " + $sqlInst + ".";
$alertCmd.CommandText = $SQL;
}
else
{
$SQL = "RAISERROR(64004, " + $errLev + ", 0, ''$retMsg'') WITH LOG, NOWAIT";
$alertCmd.CommandText = $SQL;
}

$alertCmd.ExecuteNonQuery() | Out-Null;

$alertCmd.Dispose();
}

$sqlConn.Close();'
END

EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 3200, @active_end_time = 235959;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)';
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

SET @msg = ' Job [' + @jobName + '] created successfully.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

-- Remote service monitor job
SELECT @jobDescr = N'Check status of services for remote SQL instances', @jobId = NULL, @jobName = N'Monitor - Remote SQL Services', @jobStep = @jobDescr;

IF EXISTS (SELECT * FROM [msdb].dbo.sysjobs WHERE [name] = @jobName)
BEGIN
EXEC [msdb].dbo.sp_delete_job @job_name = @jobName, @delete_unused_schedule = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;
END

IF @ver < 10
BEGIN
SELECT @jobDB = 'VBScript', @jobType = 'ActiveScripting', @cmd = N'On Error Resume Next

set oConn = CreateObject("ADODB.Connection")
set oRS = CreateObject("ADODB.RecordSet")

sSvr = "' + @svr + '"
sDB = "' + @db + '"

sConStr = "Provider=SQLOLEDB;Server=" & sSvr & ";Database=" & sDB & ";Integrated Security=SSPI"
oConn.Open sConStr

sSQL = "SELECT InstanceName, ErrorLevel FROM dbo.CustomSqlMon WHERE Active = 1 AND CheckSvcs = 1"
oRS.CursorLocation = 3
oRS.Open sSQL, oConn

if Err.Number <> 0 then
sOut = "Error querying dbo.CustomSqlMon."

sSQL = "RAISERROR(64004, 8, 1, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
Err.Clear
if oRS.RecordCount = 0 then
sOut = "The source table (dbo.CustomSqlMon) appears to be empty."

sSQL = "RAISERROR(64004, 8, 2, ''" & sOut & "'') WITH LOG, NOWAIT"
oConn.Execute sSQL
else
do while not oRS.EOF
iError = oRS("ErrorLevel").Value
sSQL = ""
sSvr = oRS("InstanceName").Value

iRC = fSvcCheck(sSvr)

select case iRC
case -4
sOut = "ERROR: At least one SQL service failed to start for " & sSvr
case -3
sOut = "ERROR: Empty service list returned for " & sSvr
case -2
sOut = "ERROR: Could not execute WMI query against " & sSvr
case -1
sOut = "ERROR: Could not make WMI connection to " & sSvr
case 0
iError = 0
sOut = "All SQL services appear to be running on " & sSvr & ": " & cStr(iRC)
sSQL = "RAISERROR(''" & sOut & "'', " & CStr(iError) & ", 3) WITH NOWAIT"
case Else
iError = 8
sOut = "There were " & CStr(iRC) & " SQL service(s) that were restarted on " & sSvr
end select

if sSQL = "" then
sSQL = "RAISERROR(64004, " & CStr(iError) & ", 4, ''" & sOut & "'') WITH LOG, NOWAIT"
end if

oConn.Execute sSQL

oRS.MoveNext
loop 
end if
end if

set oRS = Nothing
oConn.Close
set oConn = Nothing


Private Function fSvcCheck (fRemoteInst)

On Error Resume Next

If InStr(fRemoteInst, "\") <> 0 Then
fSvr = Mid(fRemoteInst, 1, (InStr(fRemoteInst, "\") - 1))
fInst = Mid(fRemoteInst, (InStr(fRemoteInst, "\") + 1), (Len(fRemoteInst) - (InStr(fRemoteInst, "\"))))
Else
fSvr = fRemoteInst
fInst = "default"
End If

Err.Clear
fCount = 0
Set fWMI = GetObject("winmgmts:{impersonationLevel = impersonate}//" & fSvr & "/root/cimv2")
If Err.Number <> 0 Then
fSvcCheck = -1
Exit Function
End If

Err.Clear
Set fSvcList = fWMI.ExecQuery("SELECT * FROM Win32_Service WHERE PathName LIKE ''%SQL%''")
If Err.Number <> 0 Then
fSvcCheck = -2
Exit Function
End If

Err.Clear
For Each fSvc In fSvcList
fCount = fCount + 1
Next

If fCount < 2 Then
fSvcCheck = -3
Exit Function
Else
fCount = 0
End If

Err.Clear
For Each fSvc In fSvcList
fCode = 0

If ( (InStr(fSvc.Name, "$") <> 0 And InStr(fSvc.Name, fInst) <> 0) Or (fInst = "default" And InStr(fSvc.Name, "$") = 0) ) Then

If fSvc.StartMode = "Auto" And fSvc.State <> "Running" Then
fCount = fCount + 1

Select Case fSvc.State
Case "Stopped" fCode = fSvc.StartService
Case "Paused" fCode = fSvc.ResumeService
End Select

If fCode <> 0 Then
fSvcCheck = -4
Exit Function
End If
End If
End If
Next

Set fSvc = Nothing
Set fSvcList = Nothing
Set fWMI = Nothing
Err.Clear
fSvcCheck = fCount

End Function'
END
ELSE
BEGIN
SELECT @jobDB = 'Powershell', @jobType = @jobDB, @cmd = '$dbName = "' + @db + '";
$ErrorActionPreference = "Stop";
$retCode = 0;
$retMsg = "";
$sqlSvr = "' + @svr + '";

$monTbl = New-Object System.Data.DataTable;
$monTbl.Columns.Add("InstanceName") | Out-Null;
$monTbl.Columns.Add("ErrorLevel", [int]) | Out-Null;

$sqlConn = New-Object System.Data.SqlClient.SqlConnection "Server = $sqlSvr;Database = $dbName;Integrated Security = SSPI";

$sqlCmd = $sqlConn.CreateCommand();

$sqlCmd.CommandText = @"
SELECT InstanceName, ErrorLevel
FROM dbo.CustomSqlMon
WHERE [Active] = 1
AND CheckSvcs = 1
ORDER BY ErrorLevel DESC, InstanceName
"@;

try
{
$sqlConn.Open();
$sqlRdr = $sqlCmd.ExecuteReader();
}
catch
{
$retCode = -1;
$retMsg = "Error querying the dbo.CustomSqlMon table in the " + $dbName + " database on " + $sqlSvr;
}

while ($sqlRdr.Read() -and $retCode -eq 0)
{
$monRow = $monTbl.NewRow();

$monRow["InstanceName"] = $sqlRdr["InstanceName"];
$monRow["ErrorLevel"] = $sqlRdr["ErrorLevel"];

$monTbl.Rows.Add($monRow);
}

$sqlRdr.Close();

foreach ($monRow in $monTbl.Rows)
{
$svcList = "";
$svcCount = 0;
$sqlInst = $monRow["InstanceName"];
$errLev = $monRow["ErrorLevel"];

if ($sqlInst.Contains("\"))
{
$l = $sqlInst.IndexOf("\");
$Svr = $sqlInst.Substring(0, $l);

$i = ($sqlInst.IndexOf("\") + 1);
$l = ($sqlInst.Length - $i);
$Inst = "$" + $sqlInst.Substring($i, $l);
}
else
{
$Svr = $sqlInst;
$Inst = "default";
}

try
{
$svcCont = Get-WmiObject -Computer $Svr -Class Win32_Service -Filter "Name LIKE ''%SQL%''";
}
catch
{
$retCode = -2;
$retMsg = "Error executing WMI service query against " + $Svr;
break;
}

foreach ($svc in $svcCont)
{
$svcName = $svc.Name;

$alertCmd = $sqlConn.CreateCommand();

if (($Inst.Contains("$") -and $svcName.Contains($Inst)) -or ($Inst -eq "default" -and -not ($svcName.Contains("$"))))
{
if ($svc.StartMode -eq "Auto" -and -not ($svc.State -eq "Running"))
{
$svcCount = $svcCount + 1;

if ($svcList -eq "")
{ $svcList = $svcName; }
else
{ $svcList = $svcList + ", " + $svcName; }

try
{
if ($svc.State -eq "Paused")
{ $svc.ResumeService() }
else
{ $svc.StartService() }
}
catch
{
$retCode = -3;
$retMsg = "The " + $svcName + " service on " + $Svr + " is set to AutoStart but is not currently running. Attempt to restart it failed.";
break;
}
}
}
}

if ($retCode -eq 0)
{
if ($svcCount -eq 0)
{
$retMsg = "All SQL-related services set to AutoStart for " + $sqlInst + " are currently running."; 
}
else
{
$retMsg = "The following SQL-related services were successfully restarted for " + $sqlInst + ": " + $svcList;

$SQL = "RAISERROR(''$retMsg'', 0, " + $errLev + ") WITH LOG, NOWAIT";
$alertCmd.CommandText = $SQL;

$alertCmd.ExecuteNonQuery()| Out-Null;

$alertCmd.Dispose();
}
}
}

$sqlConn.Close();'
END

EXEC @returnCode = [msdb].dbo.sp_add_job @job_name = @jobName, @enabled = 0, @notify_level_eventlog = 2, @notify_level_email = 0, @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0, @description = @jobDescr, @category_name = @jobCat, @owner_login_name = N'sa', @job_id = @jobId OUTPUT;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobstep @job_id = @jobId, @step_name = @jobStep, @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0, @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0, @retry_interval = 0, @os_run_priority = 0, @subsystem = @jobType, @command = @cmd, @database_name = @jobDB, @flags = 0;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_update_job @job_id = @jobId, @start_step_id = 1;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobschedule @job_id = @jobId, @name = @jobStep, @enabled = 1, @freq_type = 4, @freq_interval = 1, @freq_subday_type = 4, @freq_subday_interval = 15, @freq_relative_interval = 0, @freq_recurrence_factor = 0, @active_start_date = 20000101, @active_end_date = 99991231, @active_start_time = 2800, @active_end_time = 235959;
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

EXEC @returnCode = [msdb].dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)';
IF (@@ERROR <> 0 OR @returnCode <> 0) GOTO QuitWithRollback;

SET @msg = ' Job [' + @jobName + '] created successfully.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

IF @@TRANCOUNT > 0
COMMIT TRANSACTION;

SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] completed at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;

RETURN;

QuitWithRollback:
BEGIN
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;

SET @msg = 'Creation of Remote Instance Monitoring Jobs on [' + @@SERVERNAME + '] failed at ' + CONVERT(VARCHAR, GETDATE()) + '.';
RAISERROR(@msg, 0, 0) WITH NOWAIT;
END

GO

Overview of objects created in Listing 2

The code in Listing 2 will create two SQL Server Agent Jobs to monitor the accessibility and state of services for remote SQL Server instances. While each job performs a different type of remote instance monitoring, both jobs have a few common characteristics. First, they both depend on at least one row of data in the dbo.CustomSqlMon table with the [Active] column set to a value of 1. Second, the step type and command are both determined by the SQL Server version of the instance on which the jobs are created. Starting with SQL Server 2008, SQL Server Agent job steps can be created using Powershell code. So, if the monitoring instance is running SQL Server 2008 or higher, Powershell will be used for the job step commands. On the other (and hopefully much less likely) hand, if the instance is running SQL Server version 2005 or earlier, the job steps will be created using VBScript under the ActiveScripting subsystem. Finally, both jobs make use of the SQL Server Agent service account to perform actions at either the SQL Server or Windows OS level. A detailed description of the metadata table and each SQL Server Agent job is listed below.

Metadata Table: dbo.CustomSqlMon

The dbo.CustomSqlMon table is created by the code in Listing 2. It is used by both of the SQL Server Agent jobs (also created by the code in Listing 2) to determine which remote instances will be monitored and to what degree each remote instance will be monitored. Below is a listing of each column in the dbo.CustomSqlMon table and its respective purpose.

Column Datatype Purpose
InstanceID INT (Identity) Primary Key (surrogate).
InstanceName VARCHAR(128) Name of the remote instance to be monitored.
Active BIT Flag indicating if the remote instance should be actively monitored (1) or not (0).
CheckSvcs BIT Flag indicating if the SQL services related to the remote instance should be monitored (1) or not (0).
ErrorLevel TINYINT Value used by the severity parameter of any RAISERROR commands executed by the SQL Server Agent jobs while monitoring the InstanceName.
LastUpdate DATETIME Last time the row was updated.
UpdatedBy VARCHAR(128) Name of the user that last updated the row.

SQL Agent Job: Monitor - Remote SQL Connectivity

The Monitor - Remote SQL Connectivity job starts by querying dbo.CustomSqlMon table in the administrative database identified in the USE statement when the code in Listing 2 was executed. This query identifies the remote instances that should be monitored by the rows in the CustomSqlMon table with the [Active] column set to a value of 1. The job then attempts to connect to and execute a query against the remote instance. Therefore, the Windows Active Directory account used to start the SQL Agent service on the monitoring instance should at least have permissions to query the [master].sys.databases table on the remote instances it needs to monitor. If the connection or the query fails, the job will wait a default of 5 minutes (defined by the @waitSec variable in Listing 2) and will retry twice. The purpose of this retry functionality is to prevent false alarms due to a brief loss of communication between the monitoring instance and the monitored instance(s). If the connection still fails at this point, a RAISERROR command will be executed using the custom error message created in Listing 1 and subsequently, will fire the custom alert created in Listing 2.

SQL Agent Job: Monitor - Remote SQL Services

The Monitor - Remote SQL Services job also queries the dbo.CustomSqlMon table. It compiles a list of remote instances to be monitored by looking for rows in the CustomSqlMon table with the [Active] column set to a value of 1 and the [CheckSvcs] column set to a value of 1. The job then makes a WMI connection to the remote server and checks for any SQL services related to the monitored instance that are set to auto-start but are not currently running. If any services are found in this state, the job will attempt to start the service(s) on the remote server. In order to accomplish these tasks, the logon account for the SQL Agent service on the monitoring instance must have sufficient permissions to interrogate and start services on the remote host. The job will execute a RAISERROR command with custom error message 64004 if any error is encountered. Please note that this job has the potential to restart services that have been intentionally stopped for whatever reason. Manual steps must be taken to prevent this from happening, such as setting the monitored services to Manual start mode, temporarily setting the Active flag to 0 in the dbo.CustomSqlMon table on the monitoring instance, or disabling the job completely.

Conclusion

The code in this tip provides a way to automate the monitoring of remote SQL Server instances using tools and functionality native to SQL Server. As with most things in the DBA world, your mileage may vary. Be sure to test the jobs in your environment.

Next Steps
  • Populate the dbo.CustomSqlMon table with the list of remote instances that should be monitored by the monitoring instance.
  • Review the job schedule of each job and update according to your requirements.
  • Enable the job(s) when you are ready for them to run unattended. The jobs are created as disabled by default.
  • Consider additional customizations to enable different alert types, such as email or paging, for different remote instances being monitored. For example, an email could be sent if a development instance is inaccessible and a page could be sent for a production instance.
  • Check out more tips related to monitoring SQL Server.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Mike Eastland Mike Eastland has been a SQL Server DBA since 1999. His main areas of interest are monitoring and the automation of administrative tasks.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, May 20, 2015 - 3:55:41 PM - Edward Pochinski Back To Top (37238)

Nice article Mike, check this out this is how I use to do this.

Create table in database called perfmon, call vb script from a job step1, then call stored procedure from step2

Create Table:

CREATE TABLE [dbo].[Services_Table] (
    [Name] [varchar](55) NULL ,
    [ExitCode] [int] NULL ,
    [Started] [varchar](25) NULL ,
    [State] [varchar](25) NULL ,
    [Status] [varchar](25) NULL ,
    [SystemName] [varchar](55) NULL,
    [Time_Occurred] [smalldatetime]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Services_Table] ADD 
    CONSTRAINT [DF_Services_Table_Occurred] DEFAULT (LEFT(GETDATE(),20)) FOR [Time_Occurred]

vb script wrapper for WMI:

'Edward J Pochinski III 2006 I made many mods to this code after it came from MS
'Change the file extension to .vbs to call
'MS used a DSN and I did not even test that but used a DNS_LESS/OLEDB connection.
'This means this is a commented line of code FYI....
'Create ActiveX connection & objects

Set objConn = CreateObject("ADODB.Connection")
Set objRS = CreateObject("ADODB.Recordset")
objConn.Open  "PROVIDER=SQLOLEDB;DATA SOURCE=sqlshark;UID=sa;PWD=sqldba;DATABASE=perfMon "
 
objRS.CursorLocation = 3
objRS.Open "SELECT * FROM Services_Table" , objConn, 3, 3
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colRetrievedEvents = objWMIService.ExecQuery _
    ("Select * from Win32_Service where name = 'MSSQLSERVER'",,48)
For Each objEvent in colRetrievedEvents
    objRS.AddNew
    objRS("Name") = objEvent.Name
    objRS("ExitCode") = objEvent.ExitCode
    objRS("Started") = objEvent.Started
    objRS("State") = objEvent.State
    objRS("Status") = objEvent.Status
    objRS("SystemName") = objEvent.SystemName
    objRS.Update
Next
 
objRS.Close
objConn.Close

 

Next is a stored procedure:

USE MASTER
GO
 
Drop Proc sp_dba_ServicesMonitor_PASS
go
Create Procedure sp_dba_ServicesMonitor_PASS AS
 
--This procedure will check the services table and page a 
--dba if the SQL or Agent services are down.
--Implementation is as follows: The SQL Agent Job step should call this procedure
/*
*********************************************************************************
*                          *
* Use at your own Risk    Author: Edward J Pochinski III TEAMSQL            *                     
                         *
* Usage: compile in master or DBA DB                        *
*                                   *
*                                                                   *
* THE AUTHOR MAKES NO GUARANTEES OR WARRANTIES, EXPRESS, IMPLIED OR OF ANY      *
* OTHER KIND TO THIS CODE OR ANY USER MODIFICATIONS                             *
* DO NOT USE IN A PRODUCTION ENVIRONMENT UNTIL YOU HAVE TESTED the CODE     *
* IN A SECURED LAB ENVIRONMENT. USE AT YOUR OWN RISK.                           *
*********************************************************************************
*/
 
--Load the services data from the remote server
DECLARE @retval int
DECLARE @cmdstr nvarchar(4000)
--Check the path to the vbs file
SET @cmdstr = ' C:\Docs\seminar\July_LivePresenter\WMI_Scripting\Services_all_Dump_To1Table.vbs'  
 
EXEC @retval = master..xp_cmdshell @cmdstr
IF (@retval = 0) --0 is a successful return code meaning successful
     BEGIN
     print '**************************************************************'
     print '/***** The file was called successfully'
     print '**************************************************************'
     print ''
     END
else 
     Begin 
    Print '**********************************************************************************************'
    print ' A problem has occurred'
    Print '**********************************************************************************************'
/*     
    Select @srvname  = @@servername 
    select @fr = '*'
    select @t = '[email protected],[email protected]'
    select @sub = @srvname  +' Alert Error'
    select @bod =  'A problem occurred executing the vbs file'
exec sp_dbaSendMail1 @fr , @t , @sub, @bod
*/
     END
 
--Part 2 : Check the status of the services and note the qualified path to the table
Declare @sysname varchar(55), @Service_name varchar(55)
select @sysname = SystemName, @service_name = name from perfmon..Services_Table where state = 'stopped'
IF @@rowcount <> 0 
Begin
 
Print @sysname + ' ' + @service_name + ' service is not running call Rocky and BullWinkle....'
/*
Declare @msg varchar(255),@fr varchar(50),@t varchar (255),@sub varchar(100), @bod varchar(255)
--Build the string to send out as a page
select @msg = @sysname + ' ' + @service_name + ' service is not running'
select @fr = '[email protected]'
select @t = '[email protected],[email protected]'  --edit who this is going to as [email protected]
select @sub = 'Alert' 
select @bod = @msg
exec sp_dbaSendMail1 @fr , @t , @sub, @bod
*/
--Note the qualifiers
truncate table perfmon..Services_Table
 
END
    ELSE
Begin
--Note the qualifiers 
truncate table perfmon..Services_Table
Print ' We are at the return Section of code'
Return
END 
 


Tuesday, May 19, 2015 - 11:54:54 AM - Mike Back To Top (37228)

John - I'm glad the tip provided some value.  Thanks for the feedback.


Tuesday, May 19, 2015 - 10:13:06 AM - John Langston Back To Top (37226)

I am always interested in how other folks approach problems common to all DBAs and knowing a server is down or unresponsive before anyone else (even the server engineers) is certainly one of them.  It also helps me evaluate our approach with what others do and look for improvements.  Reading Mike's article reminds me of how xp_instance_regread is one of those extended procedures that I overlook in what it can provide for me. 

At my current employer we're fortunate enough to be able to have a server dedicated to checking remote server availability throughout the day and into the evening.  More than once we have been alerted  so we could intervene before an overnight process failed due to a server being unreachable. 

For a number of years  whenever I have configured a SQL Server I have installed a SQL Agent job that will execute when the Agent service starts and send an email to me and others.  More than once this has alerted us to a phantom reboot on an important server.

Thanks Mike, great job!















get free sql tips
agree to terms