Tracking and Counting SQL Server Agent Job Runs
I occasionally receive requests to count SQL Server Agent Job runs for the past week, month, quarter, or year. However, I notice that the msdb sysjobhistory table does not always maintain historical job run data for the requests that I receive. Job run counts are sometimes less than they should be, and the shortfall is typically most common when I have to count jobs over longer periods, jobs with many steps, or jobs that run very often. Please provide some code samples and discuss some administrative issues that will help me to track and count SQL Server Agent Job runs validly for a variety of different requirements.
The sysjobhistory table in the msdb database is the table that maintains historical data on SQL Server Agent job runs, such as when they run, how long they run, and whether a run is successful. This kind of information is maintained at both individual job step and overall job levels. The sysjobhistory table is a log table, and SQL Server Administrators can access the log table through manual or programmatic interfaces. This link introduces SQL Server Agent manual features, including how to view the history of job run statistics retained by SQL Server Agent.
To generate valid historical data about SQL Server Agent Job runs, you need a working knowledge of the kind of data in the sysjobhistory table as well as how SQL Server enables you to administer limits on the amount of data stored in the sysjobhistory table. There are default settings for limits on the amount of history maintained in the sysjobhistory table. Additionally, you can manually or programmatically configure the limits of how much data can be stored in the sysjobhistory table.
In this tip, you will get an introduction to the columns and rows of the sysjobhistory table. You will learn manual and programmatic means of setting limits on the amount of data in the sysjobhistory table. The tip also includes code samples to help you maintain historical data about SQL Server job runs outside of the built-in msdb SQL Server Agent tables for special job run tracking requirements.
Only a handful of jobs exist on the SQL Server Agent installation used to demonstrate the code samples in this article. Personal experience confirms that the tips also apply to SQL Server Agent installations with scores of jobs, and the practical upper limit is very likely much larger than that.
A Simple Listing of Column Values from the sysjobhistory table
The sysjobhistory table stores historical data about job runs at both the job step and overall job levels. Here's a query for extracting selected column values from the sysjobhistory table. The query joins the sysjobhistory table to the sysjobs table by job_id so that the job name from the sysjobs table can appear in the result set along with selected column values from the sysjobhistory table.
A set of rows exists in the sysjobhistory table for each run of a job. A job run with a single step will have 2 rows in its set -- one for its sole step and another for the overall job. In general, the row set size for a job run will be one more than the number of executed steps in the job. If a job is disabled and it is never run on demand, the job will not have any rows in the sysjobhistory table.
A summary of each of the selected sysjobhistory columns in the query below appears next.
- Instance_id is an identifier column value with an increment value of 1 for each row in the sysjobhistory table.
- Job_id is a column value for matching rows from the sysjobhistory table to corresponding jobs from the sysjobs table.
- Step_id and step_name column values can denote information for a specific step in a job or for the overall job.
- When the step_id equals 0 and the step_name value is (Job outcome) then the row is for the overall job.
- When the step_id is greater than 0 and the step_name value is the name of a job step, then the row is for a specific step within the job.
- Sql_message_id and sql_severity provide information about the message_id value and severity of any error that occurs during the running of a job step. These column values are zero when the row is for the overall job.
- Run_status can have values of 0, 1, 2, or 3.
- A run_status value of 1 indicates the job step or the overall job ran successfully.
- Values of 0, 2, or 3 indicate, respectively, statuses of failed, retried, or canceled.
- Run_date returns in yyyymmdd format the date when a job step or the overall job started; this column has an int, instead of a datetime, data type.
- Run_time shows in hhmmss format the time when a job step or overall job started; this column has an int, instead of a datetime, data type.
- Run_duration displays in hhmmss format the duration for running a job step or an overall job; like run_date and run_time, this column has an int data type. If a job executes four steps during a run and each step takes thirty seconds to complete, then the run_duration value will be 30 for each step row and 200 for the overall job row (200 reads as 2 minutes and 0 seconds, which is 120 seconds).
-- listing of all runs for job steps by name and instance_id in sysjobhistory
select
 name
,sysjobhistory.instance_id
,sysjobhistory.job_id
,sysjobhistory.step_id
,sysjobhistory.step_name
,sysjobhistory.sql_message_id
,sysjobhistory.sql_severity
,sysjobhistory.run_status
,sysjobhistory.run_date
,sysjobhistory.run_time
,sysjobhistory.run_duration
from msdb.dbo.sysjobs
left join msdb.dbo.sysjobhistory
on sysjobs.job_id = sysjobhistory.job_id
order by name, instance_id
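Because run_date and run_time are stored as int values, it can be handy to combine them into a single datetime value when reading a listing like this one. The following is a minimal sketch, not part of the original query. It assumes SQL Server 2012 or later (for DATETIMEFROMPARTS), the two sample rows in the VALUES list are made up for illustration, and the run_datetime alias is my own name.

```sql
-- Sketch: turn int run_date (yyyymmdd) and run_time (hhmmss) into a datetime.
-- The VALUES rows are illustrative stand-ins for sysjobhistory rows.
SELECT
    v.run_date,
    v.run_time,
    DATETIMEFROMPARTS(
        v.run_date / 10000,        -- year
        v.run_date / 100 % 100,    -- month
        v.run_date % 100,          -- day
        v.run_time / 10000,        -- hour
        v.run_time / 100 % 100,    -- minute
        v.run_time % 100,          -- second
        0                          -- milliseconds
    ) AS run_datetime
FROM (VALUES (20170715, 150000),   -- 3:00:00 PM
             (20170715, 5)         -- 12:00:05 AM, stored as the int 5
     ) AS v(run_date, run_time);
```

To apply the same expression to real history rows, replace the VALUES list with msdb.dbo.sysjobhistory. Integer division and the modulo operator do the splitting, so no string padding is needed.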
The screen shot below presents an excerpt from the result set for the last two runs of the job named Four step job with error branching on July 14, 2017 and the first two runs of the job on July 15, 2017. The shaded rows are for the last two job runs on July 14, and the un-shaded rows are for the first two runs on July 15, 2017.
The excerpt clearly shows that four steps per job run execute on July 14, but only three steps per job run execute on July 15. The missing step on July 15 is always the same one - step 3. The code and settings that explain why the July 15 job runs have a missing step are shown and described in a prior MSSQLTips.com tip. Two issues are responsible for the omission of the missing step on July 15, which is a Saturday.
- On Saturdays only, such as July 15, the job code raises an error in its second step with a message_id of 50000 and a severity value of 16.
- The On Failure setting for the second job step directs the job flow to step 4 when an error occurs in step 2. If there is no error in step 2, the On Success setting for the second job step routes the job flow to step 3.
The log listing below reflects the raised error in its run_status column values of 0 for the second steps for job runs on July 15. Otherwise, the run_status column value is 1 for all job run steps on July 14 and July 15.
The original tip describing the Four step job with error branching job shows a 10-second wait in each step of the job. Aside from the 10-second wait, the code executes almost instantly. Therefore, the duration of each step is just about 10 seconds. The run_duration column values for individual steps within the job clearly reflect the 10-second job step durations. Additionally, the overall job rows with a step_name value of (Job outcome) have a run_duration column value of 40 seconds on July 14 and 30 seconds on July 15. The reduced durations on July 15 reflect the omission of step 3 from those job runs.
Count and Total Duration of Successful and Non-Successful Job Runs for Enabled Scheduled Jobs
Aside from just listing historical data for job runs, you can use the sysjobhistory table to count the number of successful and non-successful job runs in the sysjobhistory table. A result set providing this kind of information can provide information about how a SQL Server instance is being used and issues that merit attention. The sysjobhistory table tracks all kinds of job runs, but the sample query in this section restricts its focus to enabled, scheduled jobs. An enabled job is a job that can run on a schedule, and a scheduled job is a job that has a schedule associated with it.
In developing a count of job runs, you can also split the data in any of several ways. One especially useful way is to report successful job runs versus job runs that were not successful. There are three types of job runs that are not successful. If a job run does not succeed, then it can have a status of failed, retried, or canceled. Recall that jobs which succeed have a run_status value of 1; the three not successful run statuses have values of 0, 2, and 3, respectively, for statuses of failed, retried, or canceled.
Another salient consideration for counting job runs is to recognize that you only require a subset of the rows in the sysjobhistory table. This is because the sysjobhistory table has a separate row for each step within a job as well as an additional row just for the overall job. Recall that sysjobhistory rows for the overall job have a step_id of 0 and a step_name value of (Job outcome). If you do not include a filter for the step_id or the step_name associated with overall job, then you may over count the number of job runs from the sysjobhistory table.
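To see the over-counting risk concretely, a quick comparison of history rows versus overall job rows can help. This is a minimal sketch rather than the tip's main counting script; the aliases j and h and the column names history_rows and job_runs are my own.

```sql
-- Sketch: compare the raw number of history rows per job with the number
-- of overall job rows (step_id = 0), which is the actual count of runs.
SELECT
    j.name,
    COUNT(*) AS history_rows,                               -- step rows plus job outcome rows
    SUM(CASE WHEN h.step_id = 0 THEN 1 ELSE 0 END) AS job_runs
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
GROUP BY j.job_id, j.name
ORDER BY j.name;
```

For a job that always executes a single step, history_rows should be about twice job_runs. A much larger ratio points to a multi-step job, which uses up a row-based history limit faster.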
The following script illustrates the application of the preceding guidelines for counting enabled, scheduled job runs according to whether they are successful or not. The script joins result sets from three sub-queries.
- The first sub-query is named enabled_scheduled_jobs. This sub-query selects job_id values and job name values for jobs from the sysjobs table that have an enabled value of 1 and whose job_id values also appear in the sysjobschedules table. Jobs in the sysjobschedules table have a schedule associated with them.
- The second sub-query has the name success_jobs. This sub-query counts rows from the sysjobhistory table that are for overall jobs filtered to just include rows for successful jobs (those having a run_status value equal to 1). In addition to counting the number of rows for successful overall job rows, the sub-query also sums the values of run_duration for filtered rows.
- The third sub-query has the name not_success_jobs. This sub-query is identical to the second sub-query except that it filters for rows with a run_status value of not equal to 1 (!= 1).
The three sub-queries are joined by job_id. The first sub-query returns just enabled, scheduled jobs. The other two sub-queries for successful and not successful job runs are left joined to the first subquery. The main query extracts information from all three sub-queries into a single result set that gathers information from the sysjobs, sysjobschedules, and sysjobhistory tables. This information includes job names, counts of the number of successful and unsuccessful job runs, as well as total duration for successful and unsuccessful job runs in seconds.
The computation of the total duration for job runs merits special commentary. This computation is based on the run_duration column value from the sysjobhistory table. Recall that the format for these column values is hhmmss in an int data type. Because the value is an int, leading hh and mm blocks do not appear when they are zero (for example, 2 minutes appears as 200). Summing run_duration values therefore requires the conversion of all three time blocks (hh, mm, ss) to a common metric. The expression in the script left-pads each value to six digits, converts the run_duration of each overall job row (step_id = 0) to seconds, and then sums the seconds across all the runs of a job.
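As a small worked illustration of the hhmmss-to-seconds conversion, here is an arithmetic alternative to the string-based expression. It is a sketch under my own assumptions: the three sample values are made up, and the duration_secs alias is not from the original script.

```sql
-- Sketch: convert int run_duration values in hhmmss form to seconds
-- with integer division and modulo instead of string padding.
SELECT
    v.run_duration,
      (v.run_duration / 10000) * 3600     -- hours block
    + (v.run_duration / 100 % 100) * 60   -- minutes block
    + (v.run_duration % 100)              -- seconds block
      AS duration_secs
FROM (VALUES (30), (200), (13005)) AS v(run_duration);
-- 30    ->   30 seconds
-- 200   ->  120 seconds (2 minutes)
-- 13005 -> 5405 seconds (1 hour, 30 minutes, 5 seconds)
```

Both forms give the same totals for durations under 100 hours. The arithmetic form also avoids casting the value to varchar(6), which cannot hold a value with more than six digits.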
-- count of successful and non-successful job runs for enabled, scheduled jobs
select
 enabled_scheduled_jobs.name
,isnull(success_jobs.number_of_success_runs,0) number_of_success_runs
,isnull(success_jobs.total_duration_for_success_runs_secs,0) total_duration_for_success_run_secs
,isnull(not_success_jobs.number_of_not_success_runs,0) number_of_not_success_runs
,isnull(not_success_jobs.total_duration_for_not_success_runs_secs,0) total_duration_for_not_success_run_secs
from
(
select distinct sysjobs.job_id,name
from msdb.dbo.sysjobs
inner join msdb.dbo.sysjobschedules
on sysjobs.job_id = sysjobschedules.job_id
where enabled = 1
) enabled_scheduled_jobs

LEFT JOIN
(
-- listing of all job runs and run_durations by job name for jobs with a success status
select
 sysjobhistory.job_id
,count(*) number_of_success_runs
,sum
(
 SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),5,2)
+(SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),3,2) * 60)
+(SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),1,2) * 60 * 60)
) total_duration_for_success_runs_secs
from msdb.dbo.sysjobhistory
where step_id = 0 and run_status = 1
group by sysjobhistory.job_id
) success_jobs
ON enabled_scheduled_jobs.job_id = success_jobs.job_id

LEFT JOIN
(
-- listing of all job runs and run_durations by job name for jobs with a non-success status
select
 sysjobhistory.job_id
,count(*) number_of_not_success_runs
,sum
(
 SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),5,2)
+(SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),3,2) * 60)
+(SUBSTRING(RIGHT(REPLICATE('0', 6) + CAST(run_duration as varchar(6)), 6),1,2) * 60 * 60)
) total_duration_for_not_success_runs_secs
from msdb.dbo.sysjobhistory
where step_id = 0 and run_status != 1
group by sysjobhistory.job_id
) not_success_jobs
ON enabled_scheduled_jobs.job_id = not_success_jobs.job_id
The next screen shot shows a result set from the preceding script for the enabled, scheduled jobs on the SQL Server Agent installation. Notice there are just 5 columns, and each column has just 5 rows, which includes one row for each enabled, scheduled job.
- The first column is for the name of a job; it is from the sysjobs table.
- The second and third columns are for the count and the total duration in seconds for the set of successful job runs.
- The fourth and fifth columns are for the count and the total duration in seconds for the set of unsuccessful job runs.
Happily, the overwhelming majority of all the job runs for enabled, scheduled jobs are successful (over 99.6%). However, the result set does highlight 3 job runs out of 29 job runs for the Compute Inserts daily at midnight job that are not successful. Because all jobs were unit tested and verified as successful when run by themselves, the pattern of failures points to potential conflicts of the Compute Inserts daily at midnight job with one or more other jobs.
Continued testing with the code above and examination of the underlying job settings pointed to the root cause of the job run failures for the Compute Inserts daily at midnight job. It turns out that this job and another job named Create a two-step reporting job can insert rows into the same table at about the same time (midnight). Additional testing verified that either of the two jobs (Compute Inserts daily at midnight or Create a two-step reporting job) could fail. The two jobs were created for two prior but different tip articles on how to create SQL Server Agent jobs. Because both jobs perform the same function, one job can be deleted (or disabled if you wish to retain it for future reference). The more robust job has the name Create a two-step reporting job. Therefore, the conflict was eliminated by disabling the job named Compute Inserts daily at midnight.
It should also be noted that at least two jobs (Four step job with error branching and Insert a value in a table with a schedule) have more than 50 successful job runs each. The default limit on the number of rows for a job in the sysjobhistory table is 100. Therefore, for a single-step job that requires two rows in the sysjobhistory table for each run, the maximum number of tracked job runs is 50. In order to track more than 50 job runs for a job that can run many times within a day and/or on most days of each week, you will typically need to modify the default limit value for the number of rows per job. The next several sections dive more deeply into the limits on the number of job runs tracked by the sysjobhistory table and how to modify those limits as well as even bypass them for providing job run reports, such as variants of the one appearing in the preceding screen shot.
Guidelines for Controlling Job Run Limits for the Sysjobhistory Table
As mentioned, the results in the preceding screen shot were derived with non-default limits for the number of runs tracked in the sysjobhistory table. You can view the current job run limits for the sysjobhistory table by right-clicking SQL Server Agent in Object Explorer, selecting Properties, and displaying the History page. The following screen shot shows the settings that were in place for the result set in the preceding screen shot.
Notice the limits for the history log table (sysjobhistory) from the screen below can be set based on a number of rows or on a historical retention period, such as a number of weeks. Recall that a job with just one step will have two rows per job run, and the number of rows per job run grows as the number of steps in a job grows. As mentioned previously, the default limit for job run rows is 100 per job and 1000 rows over all jobs. These default limits allow the job history table to be opened quickly, but the limits may not be appropriate for tracking jobs that run often and/or have many steps or which need to be tracked over several weeks or even months. When your primary objective is counting job runs instead of just quickly opening the history log window in the SQL Server Agent user interface, you may find it desirable to set limits based on the number of steps and schedules for the jobs for which you plan to count job runs. The limits in the screen below are 1000 rows per job and 20000 rows overall.
This link to the Microsoft.com site gives step-by-step instructions on how to manually change the log history retained in terms of number of rows within the sysjobhistory table or length in time from today.
- If you want to re-size the history log manually based on the number of rows in the History page of the SQL Server Agent Properties dialog
- make sure the Limit size of job history log box is checked
- type the new values into text boxes for the Maximum job history log size (in rows) and Maximum job history rows per job
- click OK
- If you want to re-size the history log based on a retained history window duration
- start by checking the Remove agent history box in the preceding screen
- select the appropriate period (days, weeks, or months) and enter the number of periods for which you want to retain history
- click OK
This link from MSSQLTips.com includes instructions and sample code for programmatically modifying the length of the history log via either sysjobhistory table rows or days of history. Additionally, when using a programmatic approach to manage the size of the history log, you can specify different log lengths for different jobs. With the manual approach, the length of the history log must be the same for all jobs.
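As a minimal sketch of the row-based programmatic approach, the call below uses the same sp_set_sqlagent_properties procedure and parameter names that appear in the script later in this tip. The values 20000 and 1000 simply mirror the limits described for the History page earlier in this section; they are illustrative, not recommendations.

```sql
-- Sketch: set row-based history limits programmatically.
-- 20000 rows overall and 1000 rows per job match the settings described
-- earlier in this section; adjust both values for your own workload.
EXEC msdb.dbo.sp_set_sqlagent_properties
     @jobhistory_max_rows = 20000,
     @jobhistory_max_rows_per_job = 1000;
```

After running a call like this, it is worth reopening the History page of the SQL Server Agent Properties dialog to confirm the values you expect are in place.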
Demonstration of Controls for Job Run Limits for the Sysjobhistory Table
The preceding section introduces you to manual and programmatic approaches for controlling how much history is retained in the sysjobhistory table. No matter which approach you use for history retention, there will come a time when retaining more job history is no longer feasible - especially if your SQL Server Agent has a lot of jobs and/or some jobs have many steps that occur often, such as multi-step jobs that run at short intervals throughout the day. The history retention issue arises because the job history table also is the source for showing job run details in the SQL Server Agent user interface, and big tables can be relatively slow to open and tedious to navigate manually.
You may be able to mitigate partially the negative consequences of retaining more history by retaining more history for jobs that run less frequently and less history for jobs that run more frequently. The following script demonstrates the use of some code for resetting the amount of history retained for enabled, scheduled jobs in the sysjobhistory table. Recall that the Compute Inserts daily at midnight job was disabled based on its conflict with the job named Create a two-step reporting job. As a consequence, there are just four jobs for which job runs are tracked in this section instead of the five jobs in the Count and Total Duration of Successful and Non-Successful Job Runs for Enabled, Scheduled Jobs section.
In the following script, the history window is set to 2 days for 2 jobs and 14 days for 2 other jobs. There was no attempt to set the history window size based on the frequency of job runs. In any event, you will need to modify the code for the names of jobs your SQL Server Agent manages. When adapting the code at that time, you should factor in a consideration of how frequently a job runs when setting the history window size over which to retain history.
The first batch invokes the sp_set_sqlagent_properties stored procedure while passing values of -1 for the @jobhistory_max_rows and @jobhistory_max_rows_per_job parameters. This code causes SQL Server Agent to ignore any settings for the Maximum job history log size (in rows) and Maximum job history rows per job settings in the History page of the SQL Server Agent Properties dialog. Any Maximum job history log size (in rows) and Maximum job history rows per job settings in the History page remain in the dialog, but they are ignored when retaining job run history in the sysjobhistory table.
The second batch starts by setting up to assign a history window of 14 days for two jobs. You can do this through assigning an expression for 14 days before today to the @OldestDate local variable. The transformation of the history window size is achieved by invoking the sp_purge_jobhistory stored procedure twice - for the jobs named Insert a value in a table with a schedule and syspolicy_purge_history. On each invocation of the stored procedure, the @OldestDate local variable value is assigned to the @oldest_date stored procedure parameter.
After modifying the 14-day history window expression to a 2-day history window expression, the @OldestDate local variable value is re-specified for a 2-day history window. Once the local variable value is assigned to the @oldest_date stored procedure parameter, the remainder of the code can be re-used just by changing the @JobName local variable for the remaining two jobs named: Four step job with error branching and Create a two-step reporting job.
-- Run this before the script to count successful and non-successful job
-- runs for enabled, scheduled jobs

-- Reset retention history length before re-running code to count job runs
EXEC msdb.dbo.sp_set_sqlagent_properties @jobhistory_max_rows=-1, @jobhistory_max_rows_per_job=-1
GO

-- Compute variable retention history lengths for different jobs
-- 14 days for Insert a value in a table with a schedule and syspolicy_purge_history
-- 2 days for Four step job with error branching, Create a two-step reporting job, and
-- Compute Inserts daily at midnight
DECLARE @OldestDate datetime
DECLARE @JobName varchar(256)

-- Keep Last 14 days for
-- Insert a value in a table with a schedule and syspolicy_purge_history
SET @OldestDate = GETDATE()-14

SET @JobName = 'Insert a value in a table with a schedule'
EXEC msdb.dbo.sp_purge_jobhistory @job_name=@JobName, @oldest_date=@OldestDate

SET @JobName = 'syspolicy_purge_history'
EXEC msdb.dbo.sp_purge_jobhistory @job_name=@JobName, @oldest_date=@OldestDate

-- Keep Last 2 days
SET @OldestDate = GETDATE()-2

SET @JobName = 'Four step job with error branching'
EXEC msdb.dbo.sp_purge_jobhistory @job_name=@JobName, @oldest_date=@OldestDate

SET @JobName = 'Create a two-step reporting job'
EXEC msdb.dbo.sp_purge_jobhistory @job_name=@JobName, @oldest_date=@OldestDate
The following screen shot shows the result set from the script presented in the Count and Total Duration of Successful and Non-successful Job Runs for Enabled, Scheduled Jobs section after running the preceding code in this section to modify job history retention and after disabling the Compute Inserts daily at midnight job.
The two jobs with 14-day history retention windows were impacted differently regarding how many job runs are retained.
- The Insert a value in a table with a schedule job was limited to 500 historical job runs in the previous section, but in this section it shows more than 500 job runs. This single-step job runs frequently on a daily basis so that it quickly reached and exceeded the 1000 row limit per job in the prior section, but this section has a limit not based on sysjobhistory rows but on days from when the new retained history window was set. The retained history window for this job allows for the retention of more job runs.
- On the other hand, the syspolicy_purge_history job shows fewer historical job runs in this section than in the prior section. This is because this job runs just once daily, and there are just 4 history log rows per job run. Therefore, the historical job runs never reached the 1000-row limit in the prior section. However, the 14-day history window for the result set in this section removed job runs from before a specified date (14 days before the new retained history window was designated). In fact, there are fewer than 14 historical runs because the SQL Server Agent was stopped on one of the days in the 14-day interval.
The two jobs with 2-day history retention windows illustrate other features of time-based settings for a retention interval. The 2-day interval specifies a beginning date as of the time the retention interval is set, but the beginning date does not automatically reset with each day. Therefore, because the retention interval was set after midnight but before 3 PM on July 20, 2017, historical data is available as early as July 20, 2017 depending on the intersection of the schedule for a job and when the retention interval is set. In interpreting the result set outcomes, you should keep in mind that SQL Server Agent references midnight as the start of a day and not the end of day.
- Because the schedule for the job named Four step job with error branching starts at 3PM and runs through the end of the day (just before the start of the next day at midnight), it has a full day of job runs for July 20 and 21 along with a part-day of job runs for July 22 through when the query was run for the result set in the screen shot below.
- On the other hand, there are just two days of job runs, on July 21 and July 22, for the Create a two-step reporting job. This job runs once per day at midnight. Because the retention interval was not set until after midnight on July 20, there is no job run on July 20 in the sysjobhistory table. Recall also that the unsuccessful job run for this job occurred on the day before the Compute Inserts daily at midnight job was disabled.
Creating an External Job History Table
One workaround for deciding how much history to store in the sysjobhistory table is to periodically export job run history to a table in a database other than msdb. You can build an external job history table to store as much history as you like in a table within another database. By using a table in a non-msdb database, you are able to store more job run history without any impact on the size of the sysjobhistory table in the msdb database.
To give you some ideas about how to approach this kind of task, this tip first shows how to make a table in a non-msdb database for storing job run history. Then, it shows how to populate the external table created to store the history.
The following script creates a table named external_job_history in the dbo schema of the for_sql_server_agent_jobs database. You are likely to create this table just once, but you may want to back it up on a regular basis after you start populating it.
-- create an external table to msdb for holding the
-- history of sysjobhistory table
begin try
drop table for_sql_server_agent_jobs.[dbo].[external_job_history]
end try
begin catch
end catch

CREATE TABLE for_sql_server_agent_jobs.[dbo].[external_job_history](
 [name] [sysname] NOT NULL,
 [step_name] [sysname] NULL,
 [sql_message_id] [int] NULL,
 [sql_severity] [int] NULL,
 [run_status] [int] NULL,
 [run_date] [int] NULL,
 [run_time] [int] NULL,
 [run_duration] [int] NULL
) ON [PRIMARY]
You can use the next simple script as the basis for a process of inserting rows into the external_job_history table. Recall that our tip is concerned exclusively with enabled, scheduled jobs. You may care to tweak the code if your interests are broader or narrower.
-- populate the external history table
INSERT INTO for_sql_server_agent_jobs.dbo.external_job_history
select
 name
,sysjobhistory.step_name
,sysjobhistory.sql_message_id
,sysjobhistory.sql_severity
,sysjobhistory.run_status
,sysjobhistory.run_date
,sysjobhistory.run_time
,sysjobhistory.run_duration
from msdb.dbo.sysjobs
left join msdb.dbo.sysjobhistory
on sysjobs.job_id = sysjobhistory.job_id
where sysjobs.enabled = 1
and sysjobhistory.run_date is not null
order by name, instance_id, run_date, run_time
If you do not build some controls to guard against the insertion of duplicate history rows, you may from time to time insert duplicate job run history rows into the external_job_history table. These can be easily excluded from job run reports by using a query such as the following one for all historical job run analyses.
-- list distinct rows from external_job_history table
select distinct *
from for_sql_server_agent_jobs.dbo.external_job_history
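One possible control against duplicate rows is to insert only history rows that are not already in the external table. The following is a sketch of that idea using EXCEPT, not the method used elsewhere in this tip. It assumes the external_job_history table created above and assumes the two databases share a collation; the aliases j and h are my own.

```sql
-- Sketch: copy only history rows not already present in the external table.
-- EXCEPT treats NULLs as equal, which suits the nullable columns here.
INSERT INTO for_sql_server_agent_jobs.dbo.external_job_history
    (name, step_name, sql_message_id, sql_severity,
     run_status, run_date, run_time, run_duration)
SELECT
    j.name, h.step_name, h.sql_message_id, h.sql_severity,
    h.run_status, h.run_date, h.run_time, h.run_duration
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
WHERE j.enabled = 1
EXCEPT
SELECT
    name, step_name, sql_message_id, sql_severity,
    run_status, run_date, run_time, run_duration
FROM for_sql_server_agent_jobs.dbo.external_job_history;
```

Like the select distinct query, this approach treats two rows as the same when all eight column values match, so it cannot tell apart two runs that are identical in every stored column. Adding instance_id to the external table would be one way to tighten the check, at the cost of changing the table definition shown above.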
Next Steps
- The easiest next step may be to use the script in the Count and Total Duration of Successful and Non-Successful Job Runs for Enabled, Scheduled Jobs section to count the job runs for the enabled, scheduled jobs on your SQL Server Agent. You might just discover some unsuccessful runs that were previously undetected. At the very least, you will discover how many runs per job the history log is storing.
- Next, you can examine the History page of the SQL Server Agent Properties dialog. If it is still set at the default values of 100 rows per job and 1000 rows overall, you may benefit from scanning the retained run_date values for jobs using the script in the A Simple Listing of Column Values from the sysjobhistory table section. Do the run-date values span a range that is long enough for your job run reporting requirement? If not, change the amount of retained job run history using any of the approaches described in this tip.
- Another easy and potentially high-impact next step would be to create a custom table with job run history that is not subject to the limits of the sysjobhistory table in the msdb database. See the Creating an External Job History Table section for some code to help you start implementing this step.
- Finally, I urge you to re-read this tip and study some of the referenced links to expand your understanding of how SQL Server Agent tracks job run history and how to manipulate SQL Server Agent to track job runs in any ways that your needs require.
Article Last Updated: 2017-08-11