Add Last Step to SQL Agent Jobs for Failure Notification

By:   |   Updated: 2022-01-21   |   Comments   |   Related: > SQL Server Agent


Problem

I inherited a Microsoft SQL Server environment where all SQL Agent jobs are configured with the native job failure notification as seen in SQL Server Management Studio (SSMS).

Job_failure_notification Email operator when job fails

This configuration is not always sufficient for our sophisticated business requirements, for example, if a job step fails, we want to send a different email to a different team depending on the failed job step, or when a job fails, we want to log some customized message in a centralized database, or do some automatic post-failure actions, all these can be done if we have a last job step dedicated for job failure processing instead of only using the native job failure notification mechanism.

Is there a way as a Database Administrator (DBA) that I can easily add a last job step in all SQL Agent jobs (about 230+ jobs across 50+ SQL instances)?

Solution

Adding a job step at the bottom of a SQL Agent job is quick, but the challenge is to modify all previous steps.

We want to make sure the following rules are observed for all jobs:

If a job step configures "On failure action" to "Quit the job reporting failure", we need to change the configuration to "go to the [last] step"

On_failure_action Quit the job reporting failure

If a job step configures "On failure action" to "Go to the next step" or "Go to the step [N]…", we keep the configuration unchanged.

On failure action Go to the next step (or another step)

If a job step configures "On failure action" to "Quit the job reporting success", we keep the configuration unchanged for simplicity. (But we can configure it to go to the newly added last step and make further post-failure process if justified)

Tool and Environment

There may be different ways to solve this issue, but I find using the dbatools PowerShell module is probably the most convenient way.

First let’s set up a test environment, I will first create one test job on my local SQL Server 2016 instance.

USE [msdb] -- SQL Server Database
GO
if not exists (select * from dbo.sysoperators where name = N'DBATeam')
EXEC msdb.dbo.sp_add_operator @name=N'DBATeam', 
      @enabled=1, 
      @weekday_pager_start_time=90000, 
      @weekday_pager_end_time=180000, 
      @saturday_pager_start_time=90000, 
      @saturday_pager_end_time=180000, 
      @sunday_pager_start_time=90000, 
      @sunday_pager_end_time=180000, 
      @pager_days=0, 
      @email_address=N'[email protected]', 
      @category_name=N'[Uncategorized]'
GO
 
/****** Object:  Job [Test_Job]    Script Date: 12/5/2021 10:42:12 PM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 12/5/2021 10:42:12 PM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
 
END
 
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Test_Job', 
      @enabled=1, 
      @notify_level_eventlog=0, 
      @notify_level_email=2, 
      @notify_level_netsend=0, 
      @notify_level_page=0, 
      @delete_level=0, 
      @description=N'No description available.', 
      @category_name=N'[Uncategorized (Local)]', 
      @owner_login_name=N'sa', 
      @notify_email_operator_name=N'DBATeam', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [S1]    Script Date: 12/5/2021 10:42:12 PM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'S1', 
      @step_id=1, 
      @cmdexec_success_code=0, 
      @on_success_action=3, 
      @on_success_step_id=0, 
      @on_fail_action=4, 
      @on_fail_step_id=4, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'print ''hello world'';
', 
      @database_name=N'MSSQLTips', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [S2]    Script Date: 12/5/2021 10:42:12 PM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'S2', 
      @step_id=2, 
      @cmdexec_success_code=0, 
      @on_success_action=3, 
      @on_success_step_id=0, 
      @on_fail_action=2, 
      @on_fail_step_id=0, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'print ''hello world''
raiserror (''this is a test error'', 16, 1);', 
      @database_name=N'master', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [S3]    Script Date: 12/5/2021 10:42:12 PM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'S3', 
      @step_id=3, 
      @cmdexec_success_code=0, 
      @on_success_action=1, 
      @on_success_step_id=0, 
      @on_fail_action=3, 
      @on_fail_step_id=0, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'print ''hello''', 
      @database_name=N'master', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [S4]    Script Date: 12/5/2021 10:42:12 PM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'S4', 
      @step_id=4, 
      @cmdexec_success_code=0, 
      @on_success_action=3, 
      @on_success_step_id=0, 
      @on_fail_action=2, 
      @on_fail_step_id=0, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'print ''hello world''', 
      @database_name=N'master', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [S5]    Script Date: 12/5/2021 10:42:12 PM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'S5', 
      @step_id=5, 
      @cmdexec_success_code=0, 
      @on_success_action=1, 
      @on_success_step_id=0, 
      @on_fail_action=2, 
      @on_fail_step_id=0, 
      @retry_attempts=0, 
      @retry_interval=0, 
      @os_run_priority=0, @subsystem=N'TSQL', 
      @command=N'print ''hello world''', 
      @database_name=N'master', 
      @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

The final job step flow is like the following from SSMS:

Job Step Flow

As we can see, in this job:

  1. step 1 (i.e. S1) will go to step 4 (S4) if S1 fails
  2. step 2, 4 and 5 (S2, S4, S5) will quit the job reporting failure if each fails
  3. step 3 will go to next step if it fails

What I want is to add a last step so if S2, S4, S5 ever fail, the "On Failure" action will jump to this newly added step.

In this last step, we can include any workflow we want, for example based on different job steps, sending a notification to different teams, this is especially true in a ETL job where we need to retrieve data from different sources, and each source has its own business owner.

But in my test job, I just send out an email reporting the failed job step and the failure time. Database mail with the mail profile is already setup.

Solution Algorithm

To better understand the script, here is quick description of the solution algorithm.

  1. Add a new step with name [Failure Notification] at the end of the job, and mark down the job step number [N], set both "On success action" and "On failure action" to "Quit the job reporting failure", as this step is used for reporting job failure, so we need the job history to show failure if this step is ever executed.
  2.  Scan through all job steps from the first step (i.e. step 0) to step [N-2], and if a job step "On failure" action is "Quit the job reporting failure", change the action to "Go to step [N-1]" (the last step is N-1 because first step is 0), i.e. to our newly added last step. Otherwise, no change.

Here is the PowerShell script:

# function: update existing jobs by adding a final step for failure notification
# after the final step is added, we need to scan previous job steps
# if that step's onfailure is "exit reporting error", we change it to "on failure, go to last step"
 
import-module dbatools;
[string[]] $ServerList = 'localhost\sql2016'; # you can put a list of servers
 
#loop through each server
foreach ($svr in $ServerList)
{
    $jobs = get-dbaagentjob -SqlInstance $svr | where name -like "test*"; # replace "test*" with your desired table name patterns
 
    # loop through each job
    foreach ($j in $jobs)
    {
        [string]$qry = @'
       declare @recipients varchar(255) ='[email protected]' -- change it to your own email
       , @subject varchar(255)= '[Failed] Job [$(ESCAPE_NONE(JOBNAME))] failed on [$(ESCAPE_NONE(SRVR))]'
       , @body varchar(max), @crlf char(2)= char(0x0d) + char(0x0a);
       select top 1 @body = 'The failed step is step_id=' + cast(h.step_id as varchar(5))
       + '; step_name=' + h.step_name + @crlf + @crlf + 'Message:' + @crlf + h.message
       from dbo.sysjobhistory h
       inner join dbo.sysjobs j
       on h.job_id = j.job_id
       and j.name = '$(ESCAPE_NONE(JOBNAME))'
       where step_id <> 0
       and run_status=0
       order by instance_id desc
       print @body;

       exec sp_send_dbmail @recipients = @recipients
       , @subject = @subject
       , @body = @body;
'@;
 
        $final_step = @{
        sqlinstance = $svr; #
        job = "$($j.Name)";
        stepName = "Failure Notification";
        command = $qry;
        Database = "msdb"
        onFailAction = "QuitWithFailure";
        OnSuccessAction = "QuitWithFailure" };
        $newstep = New-DbaAgentJobStep @final_step; #adding the final job
 
        #scan all previous steps and update any job step with onfailure action as "QuitWithFailure" to "GoToStep" -> the last step.
 
        $j.JobSteps.Refresh();
        $cnt = $j.JobSteps.count; #$cnt already includes the newly added final step
 
        for ($i=0; $i -lt $cnt-1 ; $i++)
        {
          $js = $j.JobSteps[$i];
          if ($js.OnFailAction -eq 'QuitWithFailure')
          {
            $js.OnFailAction = "GoToStep";
            $js.OnFailStep = $cnt;
            $js.alter();
          } #if
        } #foreach $i
        $j.refresh();
    } #foreach $j
} #foreach $svr 

After I run the script, I can see the job steps changed to the following

Results after adding final stp

What we can see is that Step [S1] when on failure still goes to step [S4], i.e. no change, while all others (S2, S4 and S5) will go to the new final step [Failure Notification] on failure. That’s exactly what is required.

If I run the sample job, it will fail at step 2.

Job Execution History

I will get an expected email notification from the final step.

Email Notification

Summary

This tip shows how to add a final step in a job for error notification purposes, and the tricky part is to scan through all previous job steps and pick up the steps whose "on failure" actions are "quit the job reporting failure", and update them to "on failure, go to the last job step".

The script can run against multiple servers and update multiple jobs at one time if the final step is the same.

Next Steps

There may be some other potential common scenarios, for example, adding a first job step to do some business work, such as checking some conditions before starting the job, for this requirement, it is relatively easier as all we need to do is to add a new step and update it to be the starting job step.

There are many SQL Agent job related commands in the dbatools module and they are really handy tools (such as copying a job from one instance to another) and you can find out all of them using the following command:

Get-command -module dbatools -name *agent*

Currently there are 47 cmdlets covering various aspects of SQL Server job management: such as job category, job schedule, proxy, agent log, etc.

dbatools sql agent job cmdlet count

Please take a look at these other useful articles:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jeffrey Yao Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-01-21

Comments For This Article

















get free sql tips
agree to terms