Azure Data Factory Filter Activity and Debugging Capabilities
In a previous post (see Stairway to Azure Data Factory: Lookup activity) we used a Lookup activity to read data from the database system. In some cases, data generated by Lookup activities may require some filtering before passing to other activities. ADF's Filter activity allows filtering its input data and we are going to discuss this activity in this post. We will also look into pipeline debugging functionality in ADF.
Debugging Functionality in Azure Data Factory
ADF's debugging functionality allows testing pipelines without publishing changes. This functionality also allows setting breakpoints on activities, which would ensure partial pipeline execution.
To demonstrate debugging functionality, I will use pipeline ControlFlow2_PL we built in Stairway to Azure Data Factory: Lookup activity. This pipeline reads data from the database and assigns it to the pipeline variable.
Let's select the pipeline ControlFlow2_PL and click the Debug button at the top of the central panel:
Once the pipeline starts execution, a new Output window appears at the bottom of the screen, with the execution related details, like activity name, type, start time and duration, as well as input/output values:
We can examine the input and output values of each activity from this window, as follows:
Now, if we want to execute this pipeline partially (say only until activity Lookup_AC), we can set a breakpoint on this activity, using a small red circle on its corner:
If we start this pipeline in debug mode again, we will see that it only executed the first activity:
Azure Data Factory Creating Filter Activity
The Filter activity allows filtering its input data, so that subsequent activities can use filtered data.
To show the Filter activity at work, I am going to use the pipeline ControlFlow2_PL. Here are the required steps:
Select the pipeline ControlFlow2_PL and add array-type variable FilteredTableNames to its variable list:
Next, let's drag-drop Filter activity from Iteration & Conditionals group and link it to the Lookup_AC activity on Success criteria (I've named this activity as Filter_AC):
Next, select Filter_AC activity and enter an expression @activity('Lookup_AC').output.value to its Items text box, which reflects output from the Lookup_AC' activity. Next, enter @startswith(string(item().name),'SalesLT.P') to the Condition text box, which sets the filtering condition for table names starting with 'P' character. Here's how your screen should look:
Add another Set Variable activity with the variable name FilteredTableNames and an expression @activity('Filter_AC').output.value as its value. This will ensure that filtered output from Filter_AC activity is assigned to the variable FilteredTableNames:
Finally, let's start this pipeline in Debug mode and examine the results in the Output window:
Now, if we examine output values for activity Set_Variable2_AC, we will see that it contains only table names starting with the character 'P', which is what we expected:
I have attached JSON scripts for both pipelines here.
- Read: Azure Data Factory Pipeline Variables
- Read: Stairway to Azure Data Factory: Lookup activity
- Read: Filter activity
About the author
View all my tips
Article Last Updated: 2019-09-27