Monitor for Hung SQL Server Agent Jobs

By:   |   Updated: 2023-11-30   |   Comments   |   Related: > SQL Server Agent


Problem

Sometimes, the SQL Agent may be running a job that stops working and sits in a hung state. You want to implement a process to monitor for these types of jobs, alert if it exceeds a predetermined threshold, and optionally kill the job.

Solution

We'll build a solution that uses a scheduled SQL Agent job that can check for other jobs that may be running for a long time. The msdb.dbo.sysjobactivity table can be used to watch running jobs and determine how long each has been running.

Find Running Jobs

Join the sysjobs and sysjobactivity tables in msdb to obtain the names of running jobs. A job is running if the start_execution_date has a value and the stop_execution_date is NULL.

--     mssqltips.com
SELECT [sj].[name]
FROM   [msdb].[dbo].[sysjobs] [sj]
  JOIN [msdb].[dbo].[sysjobactivity] [sja] ON [sj].[job_id] = [sja].[job_id]
WHERE [sja].[start_execution_date] IS NOT NULL
  AND [sja].[last_executed_step_date] IS NULL
  AND [sja].[stop_execution_date] IS NULL;
GO

There is a job called TestJob running on my system.

Running Job

Find Jobs Running Longer Than a Specified Time

Add a variable of type INT, set it to the number of minutes to check for, and add another filter to the WHERE clause to compare start_execution_date with the current date and time. Here, we'll check for jobs running longer than two hours by setting @RunLimit equal to 120.

----     mssqltips.com
-- begin config variables
DECLARE @RunLimit INT = 120 --run limit 
-- end config variables
    
SELECT [sj].[name]
FROM   [msdb].[dbo].[sysjobs] [sj]
  JOIN [msdb].[dbo].[sysjobactivity] [sja] ON [sj].[job_id] = [sja].[job_id]
WHERE [sja].[start_execution_date] IS NOT NULL
  AND [sja].[stop_execution_date] IS NULL
  AND [sja].[start_execution_date] < DATEADD(MINUTE, -@RUNLIMIT, GETDATE());
GO

We can see that TestJob has been running longer than two hours.

Job Running Over Two Hours

Send Email for Hung Job

To be notified if a job is running longer than specified we can use sp_send_mail. Add variables: email from, email to, mail profile, job name, and email subject. This will watch for jobs running past the specified time and notify you via email. Also, this will return the job that has been running the longest.

/* mssqltips.com */
-- begin config variables
DECLARE @RunLimit INT = 120 -- run limit in minutes 
DECLARE @Recipients VARCHAR(MAX) = '[email protected] ' -- to 
DECLARE @FromAddress VARCHAR(MAX) = '[email protected]' -- from
DECLARE @ProfileName VARCHAR(MAX) = 'default' -- database mail profile to use
-- end config variables
    
DECLARE @HungJob VARCHAR(MAX)
    
SET @HungJob =
(
   SELECT TOP 1 [sj].[name]
   FROM   [msdb].[dbo].[sysjobs] [sj]
     JOIN [msdb].[dbo].[sysjobactivity] [sja] ON [sj].[job_id] = [sja].[job_id]
   WHERE [sja].[start_execution_date] IS NOT NULL
     AND [sja].[last_executed_step_date] IS NULL
     AND [sja].[stop_execution_date] IS NULL
     AND [sja].[start_execution_date] < DATEADD(MINUTE, -@RunLimit, GETDATE())
   ORDER BY [sja].[start_execution_date]
)

IF @HungJob IS NOT NULL
BEGIN
   DECLARE @SUBJECTKILL VARCHAR(MAX) = 'SQL Agent Job ' + @HungJob + ' has been running longer than ' 
                                       + CAST(@RunLimit AS VARCHAR) + ' minutes and has been killed.'
   
   EXEC [msdb].[dbo].[sp_send_dbmail] @profile_name = @ProfileName,
                                      @recipients = @Recipients,
                                      @subject = @SubjectKill,
                                      @from_address = @FromAddress
END;
GO

We're notified the job has run past the specified threshold.

Hung Job Notification Email

Automatically Kill Hung Job and Send Email

To monitor for hung jobs, automatically kill them, and send an email notification, we add a variable to turn the kill job option on or off and an IF/ELSE to check the variable and run the appropriate SQL.

-- mssqltips.com
-- begin config variables
DECLARE @RUNLIMIT INT = 120 -- kill job if it's run more than this number of minutes 
DECLARE @KILLJOB TINYINT = 1 -- 1 = kill hung job automatically
DECLARE @RECIPIENTS VARCHAR(MAX) = '[email protected] ' -- to 
DECLARE @FROMADDRESS VARCHAR(MAX) = '[email protected]' -- from
DECLARE @PROFILENAME VARCHAR(MAX) = 'default' -- database mail profile to use
-- end config variables
    
DECLARE @HUNGJOB VARCHAR(MAX)
    
SET @HUNGJOB = (SELECT TOP 1 [sj].[name]
                FROM   [msdb].[dbo].[sysjobs] [sj]
                  JOIN [msdb].[dbo].[sysjobactivity] [sja] ON [sj].[job_id] = [sja].[job_id]
                WHERE [sja].[start_execution_date] IS NOT NULL
                  AND [sja].[last_executed_step_date] IS NULL
                  AND [sja].[stop_execution_date] IS NULL
                  AND [sja].[start_execution_date] < DATEADD(MINUTE, -@RunLimit, GETDATE())
                ORDER BY [sja].[start_execution_date]
)
IF @HUNGJOB IS NOT NULL AND @KILLJOB = 1
BEGIN
   EXEC [msdb].[dbo].[sp_stop_job] @job_name = @HUNGJOB
    
   DECLARE @SUBJECTKILL VARCHAR(MAX) = 'SQL Agent Job ' + @HUNGJOB + ' has been running longer than ' 
                                       + CAST(@RUNLIMIT AS VARCHAR) + ' minutes and has been killed.'

   EXEC [msdb].[dbo].[sp_send_dbmail] @profile_name = @PROFILENAME,
                                      @recipients = @RECIPIENTS,
                                      @subject = @SUBJECTKILL,
                                      @from_address = @FROMADDRESS
END
ELSE IF @HUNGJOB IS NOT NULL AND @KILLJOB != 1
BEGIN
   DECLARE @SUBJECTNOKILL VARCHAR(MAX) = 'SQL Agent Job ' + @HUNGJOB + ' has been running longer than ' 
                                            + CAST(@RUNLIMIT AS VARCHAR) + ' minutes.'

   EXEC [msdb].[dbo].[sp_send_dbmail] @profile_name = @PROFILENAME,
                                      @recipients = @RECIPIENTS,
                                      @subject = @SUBJECTNOKILL,
                                      @from_address = @FROMADDRESS
END;
GO

We're notified the job has run past the specified threshold and was killed.

Hung Job Killed Notification Email

Create and Schedule SQL Agent Job to Monitor for Hung Processes

Lastly, create a SQL Agent job to automatically monitor for and optionally kill hung processes by configuring these variables and the schedule in the following script to automate the process:

  • @RUNLIMIT
  • @KILLJOB
  • @RECIPIENTS
  • @FROMADDRESS
  • @PROFILENAME
USE [msdb]
GO
    
/****** Object:     Job [HungSqlAgentJobMonitor]             Script Date: 10/24/2023 8:24:22 AM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:     JobCategory [[Uncategorized (Local)]]             Script Date: 10/24/2023 8:24:22 AM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    
END
    
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =     msdb.dbo.sp_add_job @job_name=N'HungSqlAgentJobMonitor', 
      @enabled=1, 
      @notify_level_eventlog=0, 
      @notify_level_email=0, 
      @notify_level_netsend=0, 
      @notify_level_page=0, 
      @delete_level=0, 
      @description=N'No description available.', 
      @category_name=N'[Uncategorized (Local)]', 
      @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:     Step [CheckForHungJobs]             Script Date: 10/24/2023 8:24:22 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'CheckForHungJobs', 
      @step_id=1, 
      @cmdexec_success_code=0, 
      @on_success_action=1, 
      @on_success_step_id=0, 
      @on_fail_action=2, 
      @on_fail_step_id=0, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'/* mssqltips.com */
-- begin config variables
DECLARE @RunLimit INT = 240 -- run limit in minutes 
DECLARE @KillJob TINYINT = 1 -- 1 = kill hung job automatically
--DECLARE @Recipients VARCHAR(MAX) = ''[email protected] '' -- to 
DECLARE @Recipients VARCHAR(MAX) = ''[email protected]'' -- to 
DECLARE @FromAddress VARCHAR(MAX) = ''[email protected]'' -- from
DECLARE @ProfilEName VARCHAR(MAX) = ''default'' -- database mail profile to use
-- end config variables
    
DECLARE @HungJob VARCHAR(MAX)
    
SET @HungJob =
(
             SELECT TOP 1 [sj].[name]
             FROM [msdb].[dbo].[sysjobs] [sj]
               JOIN [msdb].[dbo].[sysjobactivity] [sja] ON [sj].[job_id] = [sja].[job_id]
             WHERE [sja].[start_execution_date] IS NOT NULL
               AND [sja].[last_executed_step_date] IS NULL
               AND [sja].[stop_execution_date] IS NULL
               AND [sja].[start_execution_date] < DATEADD(MINUTE, -@RunLimit, GETDATE())
             ORDER BY [sja].[start_execution_date]
)
IF @HungJob IS NOT NULL AND @KillJob = 1
BEGIN
             EXEC [msdb].[dbo].[sp_stop_job] @job_name = @HungJob
    
             DECLARE @SUBJECTKILL VARCHAR(MAX)
                             = ''SQL Agent Job '' + @HungJob + '' has been running longer than '' + CAST(@RunLimit AS VARCHAR)
                               + '' minutes and has been killed.''
             EXEC [msdb].[dbo].[sp_send_dbmail] @profile_name = @ProfileName,
                                                @recipients = @Recipients,
                                                @subject = @SubjectKill,
                                                @from_address = @FromAddress
END
ELSE IF @HungJob IS NOT NULL AND @KillJob != 1
BEGIN

              DECLARE @SubjectNoKill VARCHAR(MAX)
                                = ''SQL Agent Job '' + @HungJob + '' has been running longer than '' + CAST(@RunLimit AS VARCHAR) 
                                + '' minutes.''
              EXEC [msdb].[dbo].[sp_send_dbmail] @profile_name = @ProfileName,
                                                 @recipients = @Recipients,
                                                 @subject = @SubjectNoKill,
                                                 @from_address = @FromAddress
END;
GO
', 
      @database_name=N'msdb', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'HungSqlAgentJobMonitor', 
      @enabled=1, 
      @freq_type=4, 
      @freq_interval=1, 
      @freq_subday_type=8, 
      @freq_subday_interval=2, 
      @freq_relative_interval=0, 
      @freq_recurrence_factor=0, 
      @active_start_date=20231019, 
      @active_end_date=99991231, 
      @active_start_time=0, 
      @active_end_time=235959, 
      @schedule_uid=N'bf3a8a7a-fc57-4a43-970d-a7cb39a5a0d0'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
             IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
Next Steps

Here are a few of the many SQL Server Agent Tips you'll find on MSSQLTips.com:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Joe Gavin Joe Gavin is from the Greater Boston area. He started working with SQL Server and Sybase in 1998 in the financial services industry and has been a SQL Server Database Administrator for a dairy cooperative since 2011. He graduated from Northeastern University in Boston with a Bachelor of Science in Engineering Technology (BSET) degree in Computer Technology. Joe has spoken at the Boston and Providence SQL Saturday events.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-11-30

Comments For This Article