Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 
Failover Clustering in the Cloud - Understanding Your Options - Free Webinar
 

Parallel package execution in Azure-SSIS Runtime


By:   |   Last Updated: 2018-11-29   |   Comments   |   Related Tips: More > Azure

Problem

I have created an SSIS project in Visual Studio. In this project, I have a master package starting multiple child packages. These child packages run in parallel. When executing my project on a local server, everything runs fine and performance is optimal. However, after migrating the project to the Azure cloud using the Azure-SSIS runtime, performance has degraded. What can be the cause of this issue?

Solution

As explained in the tips Configure an Azure SQL Server Integration Services Integration Runtime and Executing Integration Services Packages in the Azure-SSIS Integration Runtime you can easily lift and shift your existing SSIS projects to the Azure-SSIS runtime with minimal or no changes. However, to make sure performance stays optimal, some changes do have to be made to make sure packages keep running in parallel. In this tip, we’ll share a solution on how to ensure your packages use the maximum of the available resources on the Azure-SSIS cluster. If you haven’t already read the two mentioned tips, please do so as this tip will built upon the knowledge shared in those tips.

Executing Packages in Parallel in the Azure-SSIS Runtime

Test Set-up

First, we need a package performing a task that takes a little while so we can monitor the process. We’re going to use a query to generate 5 million rows and insert them into a table in an Azure SQL database. We’re using the SSIS data flow, to make sure data is transferred over the network to the cluster in the Azure-SSIS runtime where it will consume CPU and memory. The query uses a tally table to generate the rows and is based on the article Creative Solutions by Using a Number Table by Greg Larsen.

WITH
    L0   AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows
    L1   AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows
    L2   AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows
    L3   AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
    L4   AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows
    L5   AS(SELECT 1 AS C FROM L4 AS A CROSS JOIN L4 AS B), -- 4,294,967,296 rows
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L5)
SELECT TOP 5000000 N AS N1, N AS N2 FROM Nums;

This query generates two integer columns, with values ranging from 1 to 5 million. That’s 8 bytes per row, or 40 Megabytes in total.

We’re going to run three identical packages in parallel. The data flow looks like this:

data flow

The control flow simply truncates the destination table and loads the generated data into the database.

control flow

One master package kicks off the three packages in parallel using Execute Package Tasks.

master package with execute package tasks

Each package writes to its own destination table, so there is no concurrency and locking on a single table.

destination tables

After we’ve deployed the SSIS project, we can start the master package in the SSIS catalog.

start master package

In Azure Data Factory, we can monitor the IR environment to see how many resources are consumed and how many packages are running.

monitor the IR runtime

At the resource monitor page, we can see only one job is running, while there are actually 4 packages running (the master package and the three child packages).

resource manager no parallel

We can verify if all packages are running at the same time by doing a count on the destination tables with the NOLOCK hint.

sql query with nolock

The problem is all the child packages are being executed inside the execution of the master package. In other words, they are being treated as part of the master package, so their executions are not spread out over the cluster.

Solution to Solve Problem

Luckily, this can be fixed. Instead of using Execute Package Tasks, we using the SSIS catalog stored procedures to start the child packages. The following stored procedure can be used to simplify the T-SQL needed to create and start an execution:

CREATE PROC [dbo].[RunPackage]
    (@PackageName   VARCHAR(50)
    ,@FolderName    VARCHAR(50)
    ,@ProjectName   VARCHAR(50)
    ,@Synchronized  BIT = 1 -- run synchronously by default
    )
AS
BEGIN
    DECLARE @execution_id BIGINT;

    EXEC [SSISDB].[catalog].[create_execution]
         @package_name      = @PackageName
        ,@execution_id      = @execution_id OUTPUT
        ,@folder_name       = @FolderName
        ,@project_name      = @ProjectName
        ,@use32bitruntime   = False
        ,@reference_id      = NULL;
 
    --SELECT @execution_id;
    EXEC [SSISDB].[catalog].[set_execution_parameter_value]
         @execution_id
        ,@object_type       = 50
        ,@parameter_name    = N'SYNCHRONIZED'
        ,@parameter_value   = @Synchronized;
 
    EXEC [SSISDB].[catalog].[start_execution] @execution_id;
END

This stored procedure creates an SSIS execution, sets the Synchronized parameter and starts the execution. By default, the Synchronized parameter is set to true, which means the stored procedure will wait till the SSIS package has finished executing. A sample T-SQL script to call this stored procedure would be:

EXEC dbo.RunPackage 'CloudTest1.dtsx', 'MSSQLTips', 'MSSQLTIPS', 1;

Since Azure SQL databases don’t allow cross-database queries, this stored procedure must be created inside the SSISDB database.

Now we need to replace the Execute Package Tasks with Execute SQL Tasks. Each of those tasks will call the stored procedure to start a child package.

new master package

Let’s deploy this master package to the catalog and execute it to monitor its behavior. When running this master package, the resource monitor now shows that every node of the IR environment is executing jobs:

parallellism achieved

Conclusion

By abandoning Execute Package Tasks and rather using the SSIS catalog stored procedures, we can optimally spread out the execution of packages over the Azure-SSIS integration runtime. This tip provides you with an easy to use stored procedure that creates and starts an SSIS package execution. Keep in mind this stored proc has to be deployed in the SSISDB database.

Next Steps


Last Updated: 2018-11-29


next webcast button


next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools