SQL Server Agent Jobs Monitoring Script

By:   |   Comments (3)   |   Related: > SQL Server Agent


Problem

We had a SQL Server Agent Job that was scheduled to run an important process and it had not been running for a while. We figured out that somebody disabled the job (probably before SQL Server maintenance) and then forgot to enable the job. There was also another case in the past where one of the shared schedules was disabled and a job stopped running.

How can we prevent this from happening in the future? How can we check our SQL Server Agent Jobs? What other problems can SQL Server Agent Jobs have?

Solution

Problems with SQL Server Agent Jobs can lead to missing backups, missing notifications about a failed job and sometimes missing mission critical tasks that are run as scheduled jobs.

Some of the common SQL Server Agent Job problems are described in other MSSQLTips tips. In this tip we will provide a single script that will check the jobs for a number of issues including:

  • Disabled SQL Server Agent Jobs
  • Failed SQL Server Agent Jobs or Job Steps
  • SQL Server Agent Jobs without a Description
  • SQL Server Agent Jobs without Schedules
  • Disabled SQL Server Agent Job Schedules
  • SQL Server Agent Jobs without Categories
  • SQL Server Agent Jobs Owned by a DBA or Other Privileged User
  • SQL Server Agent Jobs with no Owner or Owner Login is Denied "CONNECT TO SQL"
  • No SQL Server Agent Notifications on Job Failure or Completion
  • SQL Server Agent Operators Issues
  • SQL Server Agent Missing Job History
  • Potential SQL Server Agent Job Step Issue or Step Won’t Execute

Disabled SQL Server Agent Jobs

SQL Server Agent Jobs could be disabled, for example, before a maintenance window and not enabled afterwards.  It would be best to not disable jobs at all in Production environments, but sometimes it is necessary.  If you have disabled jobs, make sure you know why they are disabled and make sure it is documented. If a job will be disabled for a maintenance outage or another reason - set up a reminder to enable the job.  As a final point, add a comment in the disabled jobs description to make sure you don't forget why it was disabled and to make sure that other DBAs know this too.

Read this tip about setting up automated alerts for disabled/enabled jobs.

Failed SQL Server Agent Jobs or Job Steps

Failed SQL Server Agent Jobs is the most common check and you are probably already checking this or getting notifications on failed jobs.

Read one of the previous tips that provides scripts to determine SQL Server Agent Job failures.

SQL Server Agent Jobs without a Description

It’s a good practice to make sure your SQL Server Agent Jobs have a description. This is another way of documenting jobs and it will help other DBAs understand what the job does.

SQL Server Agent Jobs without Schedules

Understanding and documenting the schedule for a job is a best practice and can be done by simply adding a comment. Sometimes jobs don't need to have schedules as they will be started manually or ad-hoc by an application. But it's possible that somebody might have deleted a schedule, for example a shared schedule that could impact multiple jobs.

Disabled SQL Server Agent Job Schedules

Disabled SQL Server Agent Job Schedules is a problem when a job is enabled, but the schedule assigned to it is disabled. For example, a shared schedule was disabled, but the person disabling it hasn't realized that it's in use by another job.

Read this tip about setting up and managing SQL Server Agent shared schedules.

SQL Server Agent Jobs without Categories

SQL Server Agent Jobs without categories is not really a problem, but to keep things in order you may need to check if there are jobs that have default the category "[Uncategorized (Local)]".

Read this tip about SQL Server Jobs management.

Here is a tip about custom job categories.

SQL Server Agent Jobs Owned by a DBA or Other Privileged User

Ideally jobs have to be owned by an account that has minimal permissions and was setup just to run the jobs. Read this tip about potential issues you may have when jobs are owned by sysadmins.

Also, if a job was owned by a DBA (or another actual user) who left a company the job might not run later. The job’s owner will be empty and you will get the following error: “Unable to determine if the owner () of job <job_name> has server access”.

Access to the SQL Server could be granted through Domain groups. For one of our checks we find members of the "DBA" Domain group. You can update the name of the group in this part of the script:

@DBA_Group = 'DOMAIN\DBA_Group', -- DBA group name in Active Directory

SQL Server Agent Jobs with no Owner or Owner Login is Denied "CONNECT TO SQL"

For SQL Server Agent Jobs with no owner or the ability to connect to SQL Server, the jobs won’t run and you will get the following errors:

  • When the owner is empty: “Unable to determine if the owner () of job <job_name> has server access”.
  • When the owner’s login has login denied (by “DENY CONNECT SQL TO …“): “The owner (_TempLogin) of job _TestJob#5 does not have server access”. Note that if login is disabled jobs still can run.

No SQL Server Agent Notifications on Job Failure or Completion

Make sure that there are no new jobs created without notifications. Every job needs to have an operator assigned to it (at least to send emails on failure).

Check this tip about setting notifications for all jobs and this tip about finding SQL Server Agent jobs without an operator.

SQL Server Agent Operators Issues

Make sure that jobs send notifications to the right operator. Learn about some operator's related issues described in this previous tip.

We are checking that an operator has a valid email address and has a valid name that we use across all SQL Servers. You can set the following variables to your values in this part of the script:

@DBA_Op_Email = '[email protected]', -- DBA group distribution email (for an operator)
@DBA_Operator = 'TestOp_valid'          -- DBA Operator name

SQL Server Agent Missing Job History

Missing (or limited) job history won’t let you troubleshoot a failed job.

Settings for the jobs history are set under the SQL Server Agent properties and these settings will depend on the number of jobs and on the frequency of the job executions. So it might be different on different SQL Servers. You can read more about setting up job history here.

For the script in this tip, we are checking history only for enabled jobs.

The number of history records per job could be configured at the beginning of the script with this variable:

@Min_hist_length = 10, -- minimum history per job

Potential SQL Server Agent Job Step Issue or Step Won’t Execute

In our script we check some occurrences of missing job steps. We check to see if a job starts with a step other than step 1. Also we check if several steps have “Quit the job reporting success” on success action. The last check is to see if there are no steps “reporting success“.

Here is one of the warnings you may get when you setup a job that has a job step issue:

Warning when job step cannot be reached with the current job step flow logic

Our script doesn’t cover all scenarios, but these are probably the most common issues.

Consolidated SQL Server Agent Job Troubleshooting Script

The script below will check all of the issues described above. It could be converted to the stored procedure or you can run it as a script on Central Management Server.

/*
----------------------------------------------------------------------------
-- Object Name: MSSQLTips_SGolovko_Tip44_CheckJobs_script.sql
-- Business Process: Database Administration
-- Purpose: To check SQL Server Agent Jobs
-- Database: msdb
-- Dependent Objects: system tables and procedures only
-- Created On: 2017-10-05 
-- Created: By Svetlana Golovko for MSSQLTips.com 
----------------------------------------------------------------------------
*/

USE msdb
GO
SET NOCOUNT ON;

DECLARE @Min_hist_length INT,       
        @DBA_Group      SYSNAME,
        @DBA_Op_Email   NVARCHAR(100),
        @DBA_Operator   SYSNAME
 
SELECT @Min_hist_length = 10,                     -- minimum history per job
       @DBA_Group       = 'DOMAIN\DBA_Group',     -- DBA group name in Active Directory
       @DBA_Op_Email    = '[email protected]', -- DBA group distribution email (for an operator)
       @DBA_Operator    = 'TestOp_valid'          -- DBA Operator name

-- get members of the DBA group
DECLARE @TestTable TABLE ([account name]      NVARCHAR(256), 
                          [type]              NVARCHAR(30), 
                          [privilege]         NVARCHAR(30), 
                          [mapped login name] NVARCHAR(256), 
                          [permission path]   NVARCHAR(256)
                  );

INSERT INTO @TestTable
EXEC  sys.xp_logininfo @DBA_Group , 'members';

-- get issues sorted by severity
WITH jobs_problems
AS (
SELECT  j.[name] AS job_name, 
      j.[enabled] job_enabled, 
      CASE WHEN s.[enabled] = 0 
            THEN s.[name] 
            ELSE NULL END AS schedule_name, 
      CASE WHEN s.[enabled] = 0 
            THEN s.[enabled] 
            ELSE NULL END AS schedule_enabled, 
      o.[name] AS operator_name,
      j.owner_sid,
      ISNULL(jh.hist_length,0) AS hist_length,
      CASE WHEN j.[enabled] = 0 
            THEN 'disabled job' 
            ELSE NULL END AS Problem1,
      CASE WHEN js.last_run_outcome = 0 OR (st.last_run_outcome = 0  AND jh.job_id IS NOT NULL)
            THEN 'failed job or job''s step' 
            ELSE NULL END AS Problem2,
      CASE WHEN j.[description] IS NULL 
            THEN 'no description' 
            ELSE NULL END AS Problem3,
      CASE WHEN sh.job_id IS NULL 
            THEN 'no schedule' 
            ELSE NULL END AS Problem4,
      CASE WHEN s.[enabled] = 0 
            THEN 'job with disabled schedule' 
            ELSE NULL END AS Problem5,
      CASE WHEN j.category_id = 0 
            THEN 'unassigned category = "[Uncategorized (Local)]"' 
            ELSE NULL END AS Problem6,
      CASE WHEN IS_SRVROLEMEMBER('sysadmin', SUSER_SNAME(j.owner_sid)) = 1 
               OR (SUSER_SNAME(j.owner_sid) IN (SELECT [mapped login name] FROM @TestTable) AND IS_SRVROLEMEMBER('sysadmin', @DBA_Group) = 1)
            THEN 'owned by sysadmin'  
            ELSE NULL END AS Problem7,
      CASE WHEN  SUSER_SNAME (j.owner_sid) IS NULL OR l.hasaccess = 0 OR l.denylogin = 1 
            THEN 'no owner or owner login is disabled'  
            ELSE NULL END AS Problem8,
      CASE WHEN j.notify_level_email < 1 
            THEN 'no notification' 
            ELSE NULL END AS Problem9,
      CASE WHEN o.[id] IS NULL 
            THEN 'no operator' 
            WHEN o.[name] != @DBA_Operator OR o.email_address != @DBA_Op_Email  OR (o.[name] IS NOT NULL AND  o.email_address IS NULL )
            THEN 'issue with operator'
            ELSE NULL END AS Problem10,
      CASE WHEN j.[enabled] = 1 AND ISNULL(jh.hist_length,0) < @Min_hist_length
            THEN 'not enough history' 
            ELSE NULL END AS Problem11,
      CASE WHEN sc.job_id IS NOT NULL  
            THEN 'potential job step issue' 
            ELSE NULL END AS Problem12
FROM dbo.sysjobs j  JOIN 
      dbo.sysjobsteps st ON j.job_id = st.job_id LEFT JOIN 
      dbo.sysjobschedules sh ON j.job_id = sh.job_id LEFT JOIN 
      dbo.sysschedules s ON sh.schedule_id = s.schedule_id AND s.[enabled] = 0 LEFT JOIN 
      dbo.sysoperators o ON j.notify_email_operator_id = o.[id] LEFT JOIN 
      master.sys.syslogins l ON j.owner_sid = l.[sid] LEFT JOIN 
      dbo.sysjobservers js ON j.job_id = js.job_id LEFT JOIN
      (SELECT COUNT(*) hist_length, job_id  FROM sysjobhistory 
         GROUP BY job_id) jh ON j.job_id = jh.job_id LEFT JOIN
      (SELECT j.job_id, COUNT(s.step_id) countsteps FROM dbo.sysjobs j LEFT JOIN 
               dbo.sysjobsteps s ON s.job_id=j.job_id 
            WHERE s.on_success_action = 1 
            GROUP BY j.job_id,  s.on_success_action
            HAVING  COUNT(s.step_id) > 1
            UNION
            SELECT j.job_id, NULL FROM dbo.sysjobs j 
            WHERE  j.start_step_id > 1
            UNION
            SELECT j.job_id, COUNT(s.step_id) countsteps FROM dbo.sysjobs j LEFT JOIN 
               dbo.sysjobsteps s ON s.job_id=j.job_id 
            AND s.on_success_action = 1 
            GROUP BY j.job_id,  s.on_success_action
            HAVING  COUNT(s.step_id) < 1) sc ON j.job_id = sc.job_id
WHERE 
      (sh.job_id IS NULL -- job with no schedule
      OR j.[description] IS NULL -- job without description
      OR s.[enabled] = 0 -- job with disabled schedule
      OR j.[enabled] = 0 -- disabled job
      OR js.last_run_outcome = 0 OR (st.last_run_outcome = 0  AND jh.job_id IS NOT NULL) -- failed job or job step
      OR j.category_id = 0 -- unassigned category = "[Uncategorized (Local)]"
      OR (IS_SRVROLEMEMBER('sysadmin', SUSER_SNAME(j.owner_sid)) = 1
      OR SUSER_SNAME (j.owner_sid) IS NULL
      OR (SUSER_SNAME(j.owner_sid) IN (SELECT [mapped login name] FROM @TestTable) 
            AND IS_SRVROLEMEMBER('sysadmin', @DBA_Group) = 1)
         ) -- job owned by sysadmin 
      OR o.[id] IS NULL   -- no operator
      OR (o.[name] != @DBA_Operator OR o.email_address != @DBA_Op_Email 
            OR (o.[name] IS NOT NULL AND  o.email_address IS NULL ) ) -- operator not valid
      OR SUSER_SNAME (j.owner_sid) IS NULL -- no job owner (deleted)
      OR j.notify_level_email < 1 -- no notification
      OR (j.[enabled] = 1 AND ISNULL(jh.hist_length,0)  < @Min_hist_length ) -- not enough history
      OR sc.job_id IS NOT NULL -- potential job step issue
))
SELECT  job_name, 
      job_enabled, 
      MIN(schedule_name) AS disabled_schedule_name,
      MIN(operator_name) AS operator_name,
      SUSER_SNAME(owner_sid)   AS owner_name, 
      hist_length,
      /*-- SQL Server 2012 and higher
      CONCAT(MIN(Problem1) + '; ', 
            MIN(Problem2) + '; ',  
            MIN(Problem3) + '; ' , 
            MIN(Problem4) + '; ' , 
            MIN(Problem5) + '; ', 
            MIN(Problem6) + '; ', 
            MIN(Problem7) + '; ', 
            MIN(Problem8) + '; ', 
            MIN(Problem9) + '; ', 
            MIN(Problem10) + '; ', 
            MIN(Problem11) + '; ', 
            MIN(Problem12) + '; ') AS issues_report,*/
       ISNULL((MIN(Problem1) + '; '), '') +
            ISNULL((MIN(Problem2) + '; '), '') + 
            ISNULL((MIN(Problem3) + '; '), '')  + 
            ISNULL((MIN(Problem4) + '; '), '')  +
            ISNULL((MIN(Problem5) + '; '), '') +
            ISNULL((MIN(Problem6) + '; '), '') +
            ISNULL((MIN(Problem7) + '; '), '') +
            ISNULL((MIN(Problem8) + '; '), '') +
            ISNULL((MIN(Problem9) + '; '), '') +
            ISNULL((MIN(Problem10) + '; '), '') + 
            ISNULL((MIN(Problem11) + '; '), '') +
            ISNULL((MIN(Problem12) + '; '), '') AS issues_report, -- SQL Server 2008 compatible
      (  CASE WHEN MIN(Problem1)  IS NULL THEN 0 ELSE 1 END + 
         CASE WHEN MIN(Problem2)  IS NULL THEN 0 ELSE 1 END  +  
         CASE WHEN MIN(Problem3)  IS NULL THEN 0 ELSE 1 END  +  
         CASE WHEN MIN(Problem4)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem5)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem6)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem7)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem8)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem9)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem10) IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem11) IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem12) IS NULL THEN 0 ELSE 1 END  ) AS severity
FROM jobs_problems
GROUP BY job_name, job_enabled, owner_sid, hist_length
ORDER BY 
(        CASE WHEN MIN(Problem1)  IS NULL THEN 0 ELSE 1 END + 
         CASE WHEN MIN(Problem2)  IS NULL THEN 0 ELSE 1 END  +  
         CASE WHEN MIN(Problem3)  IS NULL THEN 0 ELSE 1 END  +  
         CASE WHEN MIN(Problem4)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem5)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem6)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem7)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem8)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem9)  IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem10) IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem11) IS NULL THEN 0 ELSE 1 END  + 
         CASE WHEN MIN(Problem12) IS NULL THEN 0 ELSE 1 END  )
         DESC

Here are examples of the script output:

Jobs Check Results #1
Jobs Check Results #2

Note that in the second output we have NULLs as owner_name. These jobs won't be able to run.

We have added the last column ("severity") to display issue severity for each job based on the number of the issues. Then we sorted our results by this column.

You can set your own weighing for each issue in this part of the script:

     (  CASE WHEN MIN(Problem1)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem2)  IS NULL THEN 0 ELSE 1 END  +  
        CASE WHEN MIN(Problem3)  IS NULL THEN 0 ELSE 1 END  +  
        CASE WHEN MIN(Problem4)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem5)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem6)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem7)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem8)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem9)  IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem10) IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem11) IS NULL THEN 0 ELSE 1 END  + 
        CASE WHEN MIN(Problem12) IS NULL THEN 0 ELSE 1 END  ) as severity

For example if we think that a job with no owner is the most critical issue then we can set the weight to higher number than 1 ("3" in example below):

...WHEN MIN(Problem8) IS NULL THEN 0 ELSE 3 END  +...
Next Steps
  • Download the script and use this script as part of your regular DBA checks.
  • Review schedules that were created to run a job once (“Run Once”) and delete them if they are not required.
  • Review disabled schedules and delete them if they are not required.
  • Review disabled jobs, script them and delete if they are not required. Add a description to the disabled job if it will not be deleted.
  • Review jobs without schedules and add notes to the job's description.
  • Assign the correct operator for Jobs missing an operator.
  • Make sure that every job has a notification.
  • Check all SQL Server Agent tips here.
  • Read this tip about SQL Server agent queries.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Svetlana Golovko Svetlana Golovko is a DBA with 13 years of the IT experience (including SQL Server and Oracle) with main focus on performance.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, October 18, 2022 - 2:57:13 AM - Chandu Patel Back To Top (90612)
Msg 15600, Level 15, State 1, Procedure sys.xp_logininfo, Line 18 [Batch Start Line 3]
An invalid parameter or option was specified for procedure 'sys.xp_logininfo'.
Warning: Null value is eliminated by an aggregate or other SET operation.

the above mentioned output when i executed the script

Wednesday, June 12, 2019 - 11:33:08 AM - Svetlana Back To Top (81436)

Hi Wil

Most likely you are missing "use msdb" at the beginning of your script.

Thanks,

Svetlana


Tuesday, June 11, 2019 - 9:44:32 AM - wil salazar Back To Top (81414)

Very glad there are people like you very smart and a big heart to help others.  I run the script on this post but some lines are underlined in red and don't know why...like after the select.. j.name ERROR: The multi-part identifier 'j.name' could not be found.

Wonder if you can help

w















get free sql tips
agree to terms