Azure Data Factory Lookup Activity Example


By:   |   Updated: 2019-09-25   |   Comments   |   Related: More > Azure

Problem

One of the frequently used SQL Server Integration Services (SSIS) controls is the lookup transform, which allows performing lookup matches against existing database records. In this post, we will be exploring Azure Data Factory's Lookup activity, which has similar functionality.

Solution

Azure Data Factory Lookup Activity

The Lookup activity can read data stored in a database or file system and pass it to subsequent copy or transformation activities. Unlike SSIS's Lookup transformation, which allows performing a lookup search at the row level, data obtained from ADF's Lookup activity can only be used on an object level. In other words, you can use ADF's Lookup activity's data to determine object names (table, file names, etc.) within the same pipeline dynamically.

Lookup activity can read from a variety of database and file-based sources, you can find the list of all possible data sources here.

Lookup activity can work in two modes:

  • Singleton mode - Produces first row of the related dataset
  • Array mode - Produces the entire dataset

We will look into both modes of Lookup activity in this post.

Azure Data Factory Lookup Activity Singleton Mode

My first example will be creating Lookup activity to read the first row of SQL query from SrcDb database and using it in subsequent Stored Procedure activity, which we will be storing in a log table inside the DstDb database.

For the purpose of this exercise, I have created a pipeline ControlFlow1_PL and view in SrcDb database to extract all table names, using the below query:

CREATE VIEW [dbo].[VW_TableList]
AS
SELECT TABLE_SCHEMA+'.'+TABLE_NAME AS Name FROM INFORMATION_SCHEMA.TABLES 
  WHERE TABLE_TYPE='BASE TABLE'
GO

I have also created a log table and stored procedure to write into it. I am going to use this procedure for the purpose of Stored Procedure activity. Here are the required scripts to be executed inside DstDb database:

CREATE TABLE [dbo].[TableLogs](
   [TableName] [varchar](max) NULL
) 
GO
 
CREATE PROCEDURE [dbo].[usp_LogTableNames]
@TableName varchar(max)
AS
BEGIN
  INSERT INTO [TableLogs] Values(@TableName)
END
GO

Let's follow the below steps to add Lookup and Stored Procedure activities to ControlFlow1_PLpipeline:

Select pipeline ControlFlow1_PL, expand General group on Activities panel, drag-drop the Lookup activity into the central panel and assign the name (I've named it as Lookup_AC):

azure data factory lookup activity

Switch to the Settings tab, click '+New' button to create a dataset, linked to the VW_TableList view in the SrcDb database:

azure data factory lookup activity

I've named the new dataset TableList_DS, see the below properties:

azure data factory dataset

The below screenshot shows the properties of the Lookup activity, with the new dataset configured. Please note that 'First row only' checkbox is checked, which will ensure that this activity produces only the first row from its data source:

azure data factory lookup activity

Next, let's add Stored Procedure activity, pointing to the usp_LogTableNames procedure we created earlier and link it to Lookup_Ac activity on Success criteria:

azure data factory lookup activity

Next, switch to Stored Procedure tab, enter [dbo].[usp_LogTableNames] as the procedure's name, fetch the procedure's parameter, using the Import parameter button and enter the dynamic expression @activity('Lookup_AC').output.firstRow.name as its value. This expression reflects the data output from the Lookup activity:

azure data factory lookup activity

Finally, let's publish the changes, trigger it manually, switch to the Monitor page and open the Activity Runs window to examine the detailed execution logs:

azure data factory activity runs

Using the Output button, we can examine the output of the lookup activity and see the value it produced:

azure data factory lookup activity output

Now that we know how Lookup activity works in singleton mode, let's explore the array mode.

Azure Data Factory Lookup Activity Array Mode

To explore Lookup activity's array mode, I am going to create copy of the pipeline, created earlier and customize it, as follows:

Clone the pipeline ControlFlow1_PL and name it as ControlFlow2_PL.

azure data factory lookup activity array mode

Select Lookup_AC activity in the ControlFlow2_PLpipeline, switch to the Settings tab and clear the First row only checkbox:

azure data factory lookup activity array mode

Because we're expecting multiple rows from Lookup activity, we can no longer use LogTableName_AC activity with a string parameter, so let's remove it and drag-drop a Set Variable activity, located under the General category (I've named it as Set_Variable_AC):

azure data factory lookup activity array mode

Add array type variable TableNames to the ControlFlow2_PL pipeline:

azure data factory lookup activity array mode

Link two activities on Success criteria, select Set_Variable_AC activity and choose TableNames from the Names drop-down list as a variable name and enter expression @activity('Lookup_AC').output.value as a value. If you compare this expression to the previous one (@activity('Lookup_AC').output.firstRow.name) you can notice that, we've replaced the firstRow property with the value property because the Lookup activity in array mode doesn't support firstRowproperty. Here's how your screen should look:

azure data factory lookup activity array mode

Let's publish the changes again, trigger the pipeline, switch to the Monitor page and examine input for SetVariable_AC activity. As you can see from the below screenshot, multiple table names have been passed to this activity from the Lookup activity:

azure data factory lookup activity array mode

Conclusion

The Lookup activity can make your data flow configuration more flexible, as it allows reading object names from the database or file system dynamically. Similar to the If Condition activity we have discussed in a previous post. Data produced by the Lookup activity can only be used at the object level within a pipeline, which makes them different from SSIS's similar components.

You can download the JSON scripts for both pipelines here.

Next Steps


Last Updated: 2019-09-25


get scripts

next tip button



About the author
MSSQLTips author Fikrat Azizov Fikrat Azizov has been working with SQL Server since 2002. Hes currently working as Senior BI Consultant at BDO Canada.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.






download

























get free sql tips

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



Learn more about SQL Server tools