Azure Data Factory Lookup Activity Example
One of the frequently used SQL Server Integration Services (SSIS) controls is the lookup transform, which allows performing lookup matches against existing database records. In this post, we will be exploring Azure Data Factory's Lookup activity, which has similar functionality.
Azure Data Factory Lookup Activity
The Lookup activity can read data stored in a database or file system and pass it to subsequent copy or transformation activities. Unlike SSIS's Lookup transformation, which allows performing a lookup search at the row level, data obtained from ADF's Lookup activity can only be used on an object level. In other words, you can use ADF's Lookup activity's data to determine object names (table, file names, etc.) within the same pipeline dynamically.
Lookup activity can read from a variety of database and file-based sources, you can find the list of all possible data sources here.
Lookup activity can work in two modes:
- Singleton mode - Produces first row of the related dataset
- Array mode - Produces the entire dataset
We will look into both modes of Lookup activity in this post.
Azure Data Factory Lookup Activity Singleton Mode
My first example will be creating Lookup activity to read the first row of SQL query from SrcDb database and using it in subsequent Stored Procedure activity, which we will be storing in a log table inside the DstDb database.
For the purpose of this exercise, I have created a pipeline ControlFlow1_PL and view in SrcDb database to extract all table names, using the below query:
CREATE VIEW [dbo].[VW_TableList] AS SELECT TABLE_SCHEMA+'.'+TABLE_NAME AS Name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE' GO
I have also created a log table and stored procedure to write into it. I am going to use this procedure for the purpose of Stored Procedure activity. Here are the required scripts to be executed inside DstDb database:
CREATE TABLE [dbo].[TableLogs]( [TableName] [varchar](max) NULL ) GO CREATE PROCEDURE [dbo].[usp_LogTableNames] @TableName varchar(max) AS BEGIN INSERT INTO [TableLogs] Values(@TableName) END GO
Let's follow the below steps to add Lookup and Stored Procedure activities to ControlFlow1_PLpipeline:
Select pipeline ControlFlow1_PL, expand General group on Activities panel, drag-drop the Lookup activity into the central panel and assign the name (I've named it as Lookup_AC):
Switch to the Settings tab, click '+New' button to create a dataset, linked to the VW_TableList view in the SrcDb database:
I've named the new dataset TableList_DS, see the below properties:
The below screenshot shows the properties of the Lookup activity, with the new dataset configured. Please note that 'First row only' checkbox is checked, which will ensure that this activity produces only the first row from its data source:
Next, let's add Stored Procedure activity, pointing to the usp_LogTableNames procedure we created earlier and link it to Lookup_Ac activity on Success criteria:
Next, switch to Stored Procedure tab, enter [dbo].[usp_LogTableNames] as the procedure's name, fetch the procedure's parameter, using the Import parameter button and enter the dynamic expression @activity('Lookup_AC').output.firstRow.name as its value. This expression reflects the data output from the Lookup activity:
Finally, let's publish the changes, trigger it manually, switch to the Monitor page and open the Activity Runs window to examine the detailed execution logs:
Using the Output button, we can examine the output of the lookup activity and see the value it produced:
Now that we know how Lookup activity works in singleton mode, let's explore the array mode.
Azure Data Factory Lookup Activity Array Mode
To explore Lookup activity's array mode, I am going to create copy of the pipeline, created earlier and customize it, as follows:
Clone the pipeline ControlFlow1_PL and name it as ControlFlow2_PL.
Select Lookup_AC activity in the ControlFlow2_PLpipeline, switch to the Settings tab and clear the First row only checkbox:
Because we're expecting multiple rows from Lookup activity, we can no longer use LogTableName_AC activity with a string parameter, so let's remove it and drag-drop a Set Variable activity, located under the General category (I've named it as Set_Variable_AC):
Add array type variable TableNames to the ControlFlow2_PL pipeline:
Link two activities on Success criteria, select Set_Variable_AC activity and choose TableNames from the Names drop-down list as a variable name and enter expression @activity('Lookup_AC').output.value as a value. If you compare this expression to the previous one (@activity('Lookup_AC').output.firstRow.name) you can notice that, we've replaced the firstRow property with the value property because the Lookup activity in array mode doesn't support firstRowproperty. Here's how your screen should look:
Let's publish the changes again, trigger the pipeline, switch to the Monitor page and examine input for SetVariable_AC activity. As you can see from the below screenshot, multiple table names have been passed to this activity from the Lookup activity:
The Lookup activity can make your data flow configuration more flexible, as it allows reading object names from the database or file system dynamically. Similar to the If Condition activity we have discussed in a previous post. Data produced by the Lookup activity can only be used at the object level within a pipeline, which makes them different from SSIS's similar components.
You can download the JSON scripts for both pipelines here.
- Read: Azure Data Factory Pipeline Variables
- Read: Azure Data Factory If Condition Activity
- Read: Lookup activity in Azure Data Factory
- Read: Lookup transformation
About the author
View all my tips