mssqltips logo

Azure Data Factory ForEach Activity Example

By:   |   Updated: 2019-10-16   |   Comments   |   Related: More > Azure

Problem

Data integration flows often involve execution of the same tasks on many similar objects. A typical example could be - copying multiple files from one folder into another or copying multiple tables from one database into another. Azure Data Factory's (ADF) ForEach and Until activities are designed to handle iterative processing logic. We are going to discuss the ForEach activity in this article.

Solution

Azure Data Factory ForEach Activity

The ForEach activity defines a repeating control flow in your pipeline. This activity could be used to iterate over a collection of items and execute specified activities in a loop. This functionality is similar to SSIS's Foreach Loop Container.

ForEach activity's item collection can include outputs of other activities, pipeline parameters or variables of array type. This activity is a compound activity- in other words, it can include more than one activity.

Creating ForEach Activity in Azure Data Factory

In the previous two posts (here and here), we have started developing pipeline ControlFlow2_PL, which reads the list of tables from SrcDb database, filters out tables with the names starting with character 'P' and assigns results to pipeline variable FilteredTableNames. Here is the list of tables, which we get in this variable:

  • SalesLT.Product
  • SalesLT.ProductCategory
  • SalesLT.ProductDescription
  • SalesLT.ProductModel
  • SalesLT.ProductModelProductDescription

In this exercise, we will add ForEach activity to this pipeline, which will copy tables, listed in this variable into DstDb database.

Before we proceed further, Let's prepare target tables. First, Let's remove foreign key relationships between these tables in the destination database using below script, to prevent ForEach activity from failing:

ALTER TABLE [SalesLT].[Product] DROP CONSTRAINT [FK_Product_ProductCategory_ProductCategoryID]
GO
ALTER TABLE [SalesLT].[Product] DROP CONSTRAINT [FK_Product_ProductModel_ProductModelID]
GO
ALTER TABLE [SalesLT].[ProductCategory] DROP CONSTRAINT [FK_ProductCategory_ProductCategory_ParentProductCategoryID_ProductCategoryID]
GO
ALTER TABLE [SalesLT].[ProductModelProductDescription] DROP CONSTRAINT [FK_ProductModelProductDescription_ProductDescription_ProductDescriptionID]
GO
ALTER TABLE [SalesLT].[ProductModelProductDescription] DROP CONSTRAINT [FK_ProductModelProductDescription_ProductModel_ProductModelID]
GO
ALTER TABLE [SalesLT].[SalesOrderDetail] DROP CONSTRAINT [FK_SalesOrderDetail_Product_ProductID]
GO

Next, let's create stored procedure to purge target tables, using below script. We'll need to call this procedure before each copy, to avoid PK errors:

CREATE PROCEDURE Usp_PurgeTargetTables
AS
BEGIN
delete from [SalesLT].[Product]
delete from [SalesLT].[ProductModelProductDescription]
delete from [SalesLT].[ProductDescription]
delete from [SalesLT].[ProductModel]
delete from [SalesLT].[ProductCategory]
END

Let's follow the below steps to add a ForEach activity to the ControlFlow2_PL pipeline:

Select pipeline ControlFlow2_PL, expand Iterations & Conditionals group on the Activities panel, drag-drop ForEach activity into the central panel and assign a name (I've named it as ForEach_AC):

azure data factory foreach activity

Switch to the Settings tab and enter an expression @variables('FilteredTableNames') into Items text box:

azure data factory foreach activity

Switch to Activities tab and click Add activity button:

azure data factory foreach activity

Drag-drop copy activity to central panel (I've named it as CopyFiltered_AC), switch to Source tab and click '+New' button to start creating source dataset:

azure data factory foreach activity

Next, create Azure SQL Db dataset, pointing to SrcDb database (I've named it as ASQLSrc_DS) and add dataset parameter TableName of string type:

azure data factory foreach activity

Switch to Connection tab and enter an expression @dataset().TableName in the Table text box, which will ensure that table names for this dataset will be assigned dynamically, using dataset parameter:

azure data factory foreach activity

Now that source dataset has been created, let's return to parent pipeline's design surface and enter an expression @item().name in the TableName text box. This expression will ensure that items from the ForEach activity's input list are mapped to its copy activity's source dataset:

azure data factory foreach activity

Next, let's create parameterized Sink dataset for CopyFiltered_AC activity, using a similar method. Here is how your screen should look like:

azure data factory foreach activity

Now that we've completed configuration of CopyFiltered_AC activity, let's switch to the parent pipeline's design surface, using navigation link at the top of the screen:

azure data factory foreach activity

Next, let's add Stored Procedure activity (I've named it as SP_Purge_AC), pointing to the Usp_PurgeTargetTables procedure we created earlier and link it to Set_Variable_AC activity on Success criteria:

azure data factory foreach activity

As the last configuration step, let's link activities SP_Purge_AC and ForEach_AC on Success criteria. This will ensure that target tables will be purged prior to the beginning of copy activities:

azure data factory foreach activity

Finally, let's start the pipeline in Debug mode and examine execution logs in the Output window to ensure that five copy activities (one per each item from FilteredTableNames variable list) have finished successfully:

azure data factory foreach activity

We can also examine the input of the ForEach activity, using the Input button and confirm that it received five items:

azure data factory foreach activity

Since pipeline works as expected, we can publish all the changes now.

I have attached JSON scripts for this pipeline here.

Optional attributes of ForEach activity in Azure Data Factory

ForEach activity has few optional attributes, which allow controlling parallelism degree of its child activities. Here are those attributes:

  • Sequential - This setting instructs ForEach activity to run its child activities in sequential order, one at a time
  • Batch Count - This setting allows specifying parallelism degree of ForEach activity's child activities

Here is the screenshot with these attributes:

azure data factory foreach activity
Next Steps


Last Updated: 2019-10-16


get scripts

next tip button



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002. Hes currently working as Senior BI Consultant at BDO Canada.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.






download

























get free sql tips

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



Learn more about SQL Server tools