Azure Data Factory ForEach Activity Example

By:   |   Comments (5)   |   Related: > Azure Data Factory


Problem

Data integration flows often involve execution of the same tasks on many similar objects. A typical example could be - copying multiple files from one folder into another or copying multiple tables from one database into another. Azure Data Factory's (ADF) ForEach and Until activities are designed to handle iterative processing logic. We are going to discuss the ForEach activity in this article.

Solution

Azure Data Factory ForEach Activity

The ForEach activity defines a repeating control flow in your pipeline. This activity could be used to iterate over a collection of items and execute specified activities in a loop. This functionality is similar to SSIS's Foreach Loop Container.

ForEach activity's item collection can include outputs of other activities, pipeline parameters or variables of array type. This activity is a compound activity- in other words, it can include more than one activity.

Creating ForEach Activity in Azure Data Factory

In the previous two posts (here and here), we have started developing pipeline ControlFlow2_PL, which reads the list of tables from SrcDb database, filters out tables with the names starting with character 'P' and assigns results to pipeline variable FilteredTableNames. Here is the list of tables, which we get in this variable:

  • SalesLT.Product
  • SalesLT.ProductCategory
  • SalesLT.ProductDescription
  • SalesLT.ProductModel
  • SalesLT.ProductModelProductDescription

In this exercise, we will add ForEach activity to this pipeline, which will copy tables, listed in this variable into DstDb database.

Before we proceed further, Let's prepare target tables. First, Let's remove foreign key relationships between these tables in the destination database using below script, to prevent ForEach activity from failing:

ALTER TABLE [SalesLT].[Product] DROP CONSTRAINT [FK_Product_ProductCategory_ProductCategoryID]
GO
ALTER TABLE [SalesLT].[Product] DROP CONSTRAINT [FK_Product_ProductModel_ProductModelID]
GO
ALTER TABLE [SalesLT].[ProductCategory] DROP CONSTRAINT [FK_ProductCategory_ProductCategory_ParentProductCategoryID_ProductCategoryID]
GO
ALTER TABLE [SalesLT].[ProductModelProductDescription] DROP CONSTRAINT [FK_ProductModelProductDescription_ProductDescription_ProductDescriptionID]
GO
ALTER TABLE [SalesLT].[ProductModelProductDescription] DROP CONSTRAINT [FK_ProductModelProductDescription_ProductModel_ProductModelID]
GO
ALTER TABLE [SalesLT].[SalesOrderDetail] DROP CONSTRAINT [FK_SalesOrderDetail_Product_ProductID]
GO

Next, let's create stored procedure to purge target tables, using below script. We'll need to call this procedure before each copy, to avoid PK errors:

CREATE PROCEDURE Usp_PurgeTargetTables
AS
BEGIN
delete from [SalesLT].[Product]
delete from [SalesLT].[ProductModelProductDescription]
delete from [SalesLT].[ProductDescription]
delete from [SalesLT].[ProductModel]
delete from [SalesLT].[ProductCategory]
END

Let's follow the below steps to add a ForEach activity to the ControlFlow2_PL pipeline:

Select pipeline ControlFlow2_PL, expand Iterations & Conditionals group on the Activities panel, drag-drop ForEach activity into the central panel and assign a name (I've named it as ForEach_AC):

azure data factory foreach activity

Switch to the Settings tab and enter an expression @variables('FilteredTableNames') into Items text box:

azure data factory foreach activity

Switch to Activities tab and click Add activity button:

azure data factory foreach activity

Drag-drop copy activity to central panel (I've named it as CopyFiltered_AC), switch to Source tab and click '+New' button to start creating source dataset:

azure data factory foreach activity

Next, create Azure SQL Db dataset, pointing to SrcDb database (I've named it as ASQLSrc_DS) and add dataset parameter TableName of string type:

azure data factory foreach activity

Switch to Connection tab and enter an expression @dataset().TableName in the Table text box, which will ensure that table names for this dataset will be assigned dynamically, using dataset parameter:

azure data factory foreach activity

Now that source dataset has been created, let's return to parent pipeline's design surface and enter an expression @item().name in the TableName text box. This expression will ensure that items from the ForEach activity's input list are mapped to its copy activity's source dataset:

azure data factory foreach activity

Next, let's create parameterized Sink dataset for CopyFiltered_AC activity, using a similar method. Here is how your screen should look like:

azure data factory foreach activity

Now that we've completed configuration of CopyFiltered_AC activity, let's switch to the parent pipeline's design surface, using navigation link at the top of the screen:

azure data factory foreach activity

Next, let's add Stored Procedure activity (I've named it as SP_Purge_AC), pointing to the Usp_PurgeTargetTables procedure we created earlier and link it to Set_Variable_AC activity on Success criteria:

azure data factory foreach activity

As the last configuration step, let's link activities SP_Purge_AC and ForEach_AC on Success criteria. This will ensure that target tables will be purged prior to the beginning of copy activities:

azure data factory foreach activity

Finally, let's start the pipeline in Debug mode and examine execution logs in the Output window to ensure that five copy activities (one per each item from FilteredTableNames variable list) have finished successfully:

azure data factory foreach activity

We can also examine the input of the ForEach activity, using the Input button and confirm that it received five items:

azure data factory foreach activity

Since pipeline works as expected, we can publish all the changes now.

I have attached JSON scripts for this pipeline here.

Optional attributes of ForEach activity in Azure Data Factory

ForEach activity has few optional attributes, which allow controlling parallelism degree of its child activities. Here are those attributes:

  • Sequential - This setting instructs ForEach activity to run its child activities in sequential order, one at a time
  • Batch Count - This setting allows specifying parallelism degree of ForEach activity's child activities

Here is the screenshot with these attributes:

azure data factory foreach activity
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002 and has earned two MCSE certifications. He’s currently working as a Solutions Architect at Slalom Canada.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Monday, September 7, 2020 - 7:50:53 PM - Fikrat Back To Top (86431)
Thanks for the feedback, Nicholas and I'm glad you found this tip useful!

Sunday, September 6, 2020 - 7:21:27 PM - Nicholas Back To Top (86423)
Hi Fikrat Azizov,

Thanks you for providing this awesome tutorial.
I am a new ADF and get lots of TIPS from here. It's really PRACTICAL approaches and full of useful tips.
For this lecture, I just added 'SCHEMA' variable since the ADF UI slightly changed and it works very well.

Thanks again and hope to see your more updates here.

Rgds,
Nicholas

Saturday, April 25, 2020 - 8:22:32 AM - Nalin Back To Top (85474)

Thank you very much for the post.


Wednesday, March 4, 2020 - 5:48:59 PM - Fikrat Back To Top (84931)

Hi there,

Mapping is not really required when you copy between tables of the same structure and I didn't use it either in this tutorial.

However, there might be different reason for your error- if your target table has an identity column, you might receive an error like that.

If that's the case, you might want to enable identity insert at the beginning  of flow and disable at the end (you can read more about it here:https://docs.microsoft.com/en-us/sql/t-sql/statements/set-identity-insert-transact-sql?view=sql-server-ver15)


Wednesday, March 4, 2020 - 9:32:28 AM - Ati Back To Top (84919)

Hi,

Nice and clear tutorial. I' ve a similar problem: try to load 75 tables from the source to my staging. the tables have the same structure and names. But while I configure my ' for each'  I can not get my source en sink data set validated and I get: Table is required for Copy activity. It looks like he can't get mapping information..

any idea? thanks in advance!















get free sql tips
agree to terms