Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Find SSIS Package Stuck in Running Status


By:   |   Last Updated: 2019-05-30   |   Comments   |   Related Tips: More > Integration Services Administration

Problem

SQL Server Integration Services (SSIS) executions sometimes go off the rails and runs indefinitely instead of failing gracefully. It is extremely important to stop such executions to allow new instances to kick off at the next scheduled interval.  In this tip, we will set up a basic monitoring and alerting mechanism to stop orphaned SSIS executions.

Solution

As described in the problem section, I have noticed multiple times that SSIS executions go orphaned for one reason or the other and silently eat server resources. Not only that, these executions won’t stop until a server restart. During a server restart, the start-up stored procedure ([dbo].[sp_ssis_startup]) will take care of these ghost executions, but we probably do not want to wait until a server restart and let it steal server resources.

Unfortunately, I do not have a demo script to create orphaned executions, but most of the time I have noticed this situation due to severed connections either due to a remote server restart or some form of network interruption while a data transfer is in progress.

To overcome this situation of ghost executions, I wrote a monitoring solution. The monitoring solution consists of a configuration table and a stored procedure. We have a dedicated user database to hold monitoring and maintenance utilities developed by the DBA group, but you can create these objects in any user database.

We will go through the logic step by step. The complete stored procedure is at the end of this article.

Step 1: Configuration for Monitoring and Alerting

This is a table-driven monitoring solution. We will first create a table to hold configuration data for our monitoring solution. The configuration data will include SSIS package name along with its environment name (optional), threshold duration in seconds and alerting email address.

The [threshold_time_sec] duration represents the maximum amount of time (in seconds) each package should take to finish. Any package execution running over that duration will be picked up by the monitoring procedure and will be stopped.  The [alert_email] column holds the values of email address for the group of people/person, to be notified if there is an issue.

---- create table to hold configuration data
CREATE TABLE [dbo].[ssis_monitor_configure] (
  [package_name] [nvarchar](100) NOT NULL,
  [environment_name] [nvarchar](100) NULL,
  [threshold_time_sec] [int] NULL,
  [monitored] [bit] NULL,
  [alert_email] [varchar](100) NULL
) ON [primary];
GO
CREATE CLUSTERED INDEX ix_ssis_monitor_configure_package_name_env_name_monitored
ON dbo.ssis_monitor_configure (package_name, environment_name, monitored);
GO

Step 2: Populating Configuration Data

Once the table is created, we will insert the configuration data for each package. You can use the below query to get started and populate it with SSIS package data from your environment. MAKE SURE to tune/change the configuration data as per your need. Alternatively, you can start with just a few packages in the configuration table and grow from there.

---- populate configuration data

INSERT INTO [dbo].[ssis_monitor_configure] ([package_name]
, [environment_name]
, [threshold_time_sec]
, [monitored]
, [alert_email])
  SELECT DISTINCT
    p.name 'pacakge_name',
    NULL 'environment_name', --populate environment name
    3600 'threshold_sec', --change it as per your need? 
    0, --monitored 
    '[email protected]' 'alert' --alert email
  FROM ssisdb.internal.packages p (NOLOCK);

Step 3: Find Long Running Executions

We will query the SSISDB catalog system view to find long running executions on the server and load them into a temporary table.

-- load long running package information in a temp table
SELECT
  ex.execution_id,
  ex.package_name,
  ex.environment_name,
  ex.caller_name,
  DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec',
  smc.threshold_time_sec,
  smc.alert_email INTO #execution_info
FROM ssisdb.catalog.executions(nolock) ex
JOIN dbo.ssis_monitor_configure(nolock) smc
  ON ex.package_name = smc.package_name
WHERE ex.end_time IS NULL  -- still running 
AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed )
AND (ex.environment_name = smc.environment_name
OR smc.environment_name IS NULL)
AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec);

Step 4: Alert On-Call Person if Necessary

If there are multiple long running executions on the server (more than defined threshold) – we will alert the on call device instead of taking any automated action as there may be a bigger issue in the environment and we probably want some human intervention.

-- alert DBA On call if long running instances are higher than threshold 
IF (@@ROWCOUNT > @oncallthreshold)
BEGIN
  SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.'
  EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall,
                               @subject = @emailsubject,
                               @body_format = 'HTML';
  RETURN;
END   

Step 5: Stop Executions Automatically

In case of only a few long running executions, we will go through each execution and stop them. We will use a cursor as we must stop these executions one by one. We will use system stored procedure (ssisdb.catalog.stop_operation) to stop these executions and notify the appropriate group/person entered in the configuration table.

DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR
SELECT
  execution_id,
  package_name,
  alert_email
FROM #execution_info;
OPEN ssis_monitor_cursor;

FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

WHILE @@fetch_status = 0
BEGIN

  --stop the long running execution
  EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id;

  --build email body
  SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped';
  SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13);
  SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13);
  SET @tableHTML = @tableHTML + CAST((SELECT
    execution_id AS td,
    package_name AS td,
    COALESCE(environment_name, 'not defined') AS td,
    caller_name AS td,
    run_time_sec AS td,
    threshold_time_sec AS td
  FROM #execution_info
  WHERE execution_id = @execution_id
  FOR xml RAW ('tr'), ELEMENTS)
  AS nvarchar(max)) + CHAR(13);
  SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13);

  --- notify dba group 
  EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail,
                               @subject = @emailsubject,
                               @body = @tableHTML,
                               @body_format = 'HTML';

  FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

END;

CLOSE ssis_monitor_cursor;
DEALLOCATE ssis_monitor_cursor;

Step 6: User Inputs

Make sure to change the following variable values or column values in the stored procedure/configuration table for this solution to work.

  • @dbaOncall - email address for the on-call device
  • @oncallthreshold - threshold for number of long running executions to alert on call device
  • [threshold_time_sec] - column in the configuration table dbo.ssis_monitor_configure
  • [alert_email] - column in the configuration table dbo.ssis_monitor_configure

Finally, hook up the stored procedure to a SQL Server Agent Job and run it on scheduled interval to monitor for ghost executions of SSIS packages.

Complete SSIS Monitoring Stored Procedure

Here is the complete script.

CREATE PROCEDURE [dbo].[ssis_monitor]
----------------------------------------------------------------------------------------------------------------------------
-- LOCATION : SSIS servers 
-- AUTHOR : Aakash Patel
-- DATE : 05/09/2018
-- INPUTS : 
-- OUTPUTS : 
-- DEPENDENCIES : none
-- DEPENDENTS : none
-- DESCRIPTION : This proc checks for long running ssis executions and stops them based on the defined threshold from dbo.ssis_monitor_configure table
-- which app calls this proc  : agent job
-- FREQUENCY : every xx seconds
-- MODIFICATION HISTORY : 
-- 05/09/2018  Aakash Patel : Initial Build 
----------------------------------------------------------------------------------------------------------------------------
AS
BEGIN
  BEGIN TRY
    DECLARE @execution_id bigint
    DECLARE @package_name nvarchar(100)
    DECLARE @tableHTML varchar(max)
    DECLARE @emailsubject AS varchar(150)
    DECLARE @alertemail varchar(50)
    DECLARE @dbaOncall varchar(50) = '[email protected]'  --email address for on call device
    DECLARE @oncallthreshold tinyint = 5  --threshold for long running instances to alert on call device

    -- load long running package information in a temp table

    SELECT
      ex.execution_id,
      ex.package_name,
      ex.environment_name,
      ex.caller_name,
      DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) AS 'run_time_sec',
      smc.threshold_time_sec,
      smc.alert_email INTO #execution_info
    FROM ssisdb.catalog.executions(nolock) ex
    JOIN dbo.ssis_monitor_configure(nolock) smc
      ON ex.package_name = smc.package_name
    WHERE ex.end_time IS NULL  -- still running 
    AND ex.status = 2 -- ( 1-created , 2-running ,3-canceled,4-failed ,5-pending,6-ended unexpectedly,7-succeeded ,8-stopping, 9-completed )
    AND (ex.environment_name = smc.environment_name
    OR smc.environment_name IS NULL)
    AND (DATEDIFF(SECOND, CAST(ex.start_time AS datetime2), SYSDATETIME()) > smc.threshold_time_sec);

    -- alert DBA On call if long running instances are higher than threshold 
    IF (@@ROWCOUNT > @oncallthreshold)
    BEGIN
      SET @emailsubject = 'SSIS Alert: ' + @@servername + ': More than ' + CAST(@oncallthreshold AS varchar(10)) + ' long running SSIS executions detected.'
      EXEC msdb.dbo.sp_send_dbmail @recipients = @dbaOncall,
                                   @subject = @emailsubject,
                                   @body_format = 'HTML';
      RETURN;
    END
    -- loop through long running executions and stop them
    ELSE
    BEGIN
      DECLARE ssis_monitor_cursor CURSOR FAST_FORWARD FOR
      SELECT
        execution_id,
        package_name,
        alert_email
      FROM #execution_info;
      OPEN ssis_monitor_cursor;

      FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

      WHILE @@fetch_status = 0
      BEGIN

        --stop the long running execution
        EXEC ssisdb.catalog.stop_operation @operation_id = @execution_id;

        --build email body
        SET @emailsubject = 'SSIS Alert: ' + @@servername + ': ' + @package_name + ' is past defined threshold and has been stopped';
        SET @tableHTML = N'<table border="2" cellpadding="2" cellspacing="0">' + CHAR(13);
        SET @tableHTML = @tableHTML + N'<tr><th>execution_id</th><th>package_name</th><th>environment_name</th><th>calling_user</th><th>exec_time_seconds</th><th>threshold_sec</th></tr>' + CHAR(13);
        SET @tableHTML = @tableHTML + CAST((SELECT
          execution_id AS td,
          package_name AS td,
          COALESCE(environment_name, 'not defined') AS td,
          caller_name AS td,
          run_time_sec AS td,
          threshold_time_sec AS td
        FROM #execution_info
        WHERE execution_id = @execution_id
        FOR xml RAW ('tr'), ELEMENTS)
        AS nvarchar(max)) + CHAR(13);
        SET @tableHTML = @tableHTML + N'</table>' + '<br>' + CHAR(13);

        --- notify dba group 
        EXEC msdb.dbo.sp_send_dbmail @recipients = @alertemail,
                                     @subject = @emailsubject,
                                     @body = @tableHTML,
                                     @body_format = 'HTML';

        FETCH NEXT FROM ssis_monitor_cursor INTO @execution_id, @package_name, @alertemail;

      END;

      CLOSE ssis_monitor_cursor;
      DEALLOCATE ssis_monitor_cursor;
    END;

  END TRY

  ----- error handling 
  BEGIN CATCH
  --insert your standard error handling code
  END CATCH
END;
Next Steps


Last Updated: 2019-05-30


get scripts

next tip button



About the author
MSSQLTips author Aakash Patel Aakash Patel is a Senior SQL Server DBA for a software firm in Connecticut with 10+ years of experience.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools