Integrate Azure Function into Azure Data Factory Pipeline



Problem

I’m orchestrating a data pipeline using Azure Data Factory. One of the activities the pipeline needs to execute is loading data into the Snowflake cloud data warehouse. Since Azure Data Factory currently doesn’t support a native connection to Snowflake, I’m thinking about using an Azure Function to accomplish this task. How can I integrate this into my pipeline?

Solution

Snowflake is a database vendor that offers a cloud-native data warehouse solution. This data warehouse can be hosted on all major cloud platforms (Azure, AWS and Google Cloud). For an introduction to Snowflake and its offerings, see the Snowflake website.

In the two-part tip Using an Azure Function to execute SQL on a Snowflake Database (part 1 and part 2), an Azure Function was created that takes a SQL statement as a parameter and executes it against a Snowflake database. As output, the function returns the query duration and the number of affected rows. If you haven’t already, please read that tip about creating the function, as we will use it as an example here. However, the explanation in this tip is useful for integrating any kind of Azure Function into Azure Data Factory (ADF), even if you’re not using Snowflake.
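As a quick reminder of that function’s interface: it expects a JSON body with the statement to execute in a Query property. The request below matches the format used later in this tip:

{"Query": "SELECT CURRENT_VERSION();"}

The response carries the duration and the number of affected rows; the property names shown here are only an illustrative assumption, so check the function’s code for the exact shape:

{"rowsAffected": 1, "duration": "00:00:00.543"}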

Integrating an Azure Function into ADF

Test Set-up

We’re going to execute two SQL statements in our ADF pipeline:

  • A TRUNCATE TABLE statement to empty a staging table
  • A COPY INTO statement to load the data from an XML file stored on Azure Blob Storage into the table. COPY INTO is a powerful statement that can load multiple files into a table in parallel with great performance. It supports multiple compression schemes and different file types. For more information, please check the documentation.

The data that is going to be loaded is an XML file of the StackOverflow sample database, stored in an Azure Blob container:

sample file

The PostLinks XML file is about 80MB in size, compressed in the gzip format (keep in mind the original StackOverflow dump files come in the 7z format, which is currently not supported by the COPY INTO statement), and contains 5,292,624 lines. It will be loaded into a table named STAGE.Demo_PostLinks_XML, which can be created with the following SQL:

CREATE OR REPLACE TABLE STAGE.Demo_PostLinks_XML(src_xml VARIANT);

The XML is stored in a single column of the VARIANT data type. The COPY INTO statement takes the following format:

COPY INTO STAGE.Demo_PostLinks_XML
FROM @AZURESTAGINGDEV/PostLinks.xml.gz
    FILE_FORMAT=(TYPE=XML STRIP_OUTER_ELEMENT = TRUE)
    ON_ERROR='CONTINUE';

The STRIP_OUTER_ELEMENT parameter tells the COPY INTO statement to drop the root node of the XML file and load each individual child node as a separate row in the destination table. If you don’t enable this parameter, Snowflake will try to load the entire XML file into a single row. However, there’s a 16MB size limit per row, so this will fail. With the parameter set to TRUE, over 5 million rows are loaded instead of one. This comes with computational overhead, since the XML has to be parsed.
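The COPY INTO statement above reads from a named external stage called AZURESTAGINGDEV, which has to exist before the load runs. If you still need to create one, a minimal sketch looks like this; the storage account URL and the SAS token are placeholders to replace with your own values:

-- Minimal sketch of an external stage on an Azure Blob container.
-- The URL and the SAS token are placeholders; use your own values.
CREATE OR REPLACE STAGE AZURESTAGINGDEV
    URL = 'azure://yourstorageaccount.blob.core.windows.net/yourcontainer'
    CREDENTIALS = (AZURE_SAS_TOKEN = '?sv=...');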

Adding the Azure Function to ADF

The first step is to add a new Linked Service to your ADF environment:

add new linked service

In the Compute tab, choose Azure Function.

choose azure function

Choose your Azure Function from your subscription:

choose correct Azure Function

The Function Key can be found in the Azure Portal. In the Function App, search for the function itself and go to its Manage page, where you can copy the Function Key to the clipboard or add new keys.

copy function key

Once the Function Key is pasted into the configuration, the Linked Service can be created.
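Behind the scenes, ADF stores this linked service as JSON. It looks roughly like the sketch below; the name and URL are placeholders, and in a production environment you’d typically keep the function key in Azure Key Vault instead of pasting it in directly:

{
    "name": "AzureFunctionLinkedService",
    "properties": {
        "type": "AzureFunction",
        "typeProperties": {
            "functionAppUrl": "https://yourfunctionapp.azurewebsites.net",
            "functionKey": {
                "type": "SecureString",
                "value": "<your function key>"
            }
        }
    }
}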

In your pipeline, drag the Azure Function activity to the canvas and give it a descriptive name.

azure function activity

In the Settings, choose the Azure Function Linked Service we just created. Pick the Azure Function you want to use and select the POST method from the dropdown.

configure azure function activity

For the body, we pass the SQL statement along to the Query parameter in JSON format:

{"Query":"TRUNCATE TABLE STACKOVERFLOW.STAGE.Demo_PostLinks_XML;"}

Repeat the same steps to add a new Azure Function activity to the canvas and configure it to execute the COPY INTO statement:

{"Query":"COPY INTO STACKOVERFLOW.STAGE.Demo_PostLinks_XML
FROM @AZURESTAGINGDEV/PostLinks.xml.gz
    FILE_FORMAT=(TYPE=XML STRIP_OUTER_ELEMENT = TRUE)
    ON_ERROR='CONTINUE';"}
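Strictly speaking, a JSON string can’t contain literal line breaks, so if the body editor or your deployment tooling complains about the body above, put the statement on a single line (or escape the breaks with \n):

{"Query":"COPY INTO STACKOVERFLOW.STAGE.Demo_PostLinks_XML FROM @AZURESTAGINGDEV/PostLinks.xml.gz FILE_FORMAT=(TYPE=XML STRIP_OUTER_ELEMENT = TRUE) ON_ERROR='CONTINUE';"}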

When the pipeline is executed, we can see the number of affected rows and the duration of each query in the output window:

execute functions in ADF

In Snowflake, we can verify the data has actually been loaded:

row count check
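A simple count is enough for this check; with STRIP_OUTER_ELEMENT enabled, the result should be one row per child node of the source file, so somewhere around the 5 million mark:

-- Verify how many rows were loaded into the staging table.
SELECT COUNT(*) AS loaded_rows
FROM STACKOVERFLOW.STAGE.Demo_PostLinks_XML;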

Each line of the XML file has been loaded to one row in the destination table:

verify xml parsing
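To peek inside the VARIANT column, Snowflake can address XML attributes with the "@name" notation. The attribute names below follow the StackOverflow PostLinks schema and are assumptions about the file; adjust them to match your own data:

-- Pull a few attributes out of the XML rows ("@attr" addresses an XML attribute).
SELECT src_xml:"@Id"::NUMBER            AS Id,
       src_xml:"@PostId"::NUMBER        AS PostId,
       src_xml:"@RelatedPostId"::NUMBER AS RelatedPostId,
       src_xml:"@LinkTypeId"::NUMBER    AS LinkTypeId
FROM STACKOVERFLOW.STAGE.Demo_PostLinks_XML
LIMIT 10;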

Conclusion

In this tip, we saw how to integrate an Azure Function into an Azure Data Factory pipeline using the native Linked Service and activity. Using an Azure Function, we’re able to execute SQL statements on a Snowflake database and return the output to ADF.

Next Steps

  • Read part 1 and part 2 of the tip Using an Azure Function to execute SQL on a Snowflake Database to build the function used in this tip.
  • Check the Snowflake documentation on the COPY INTO statement for the supported file types and compression formats.



About the author
Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer, contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits and dataMinds Connect, and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft Data Platform MVP award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Wednesday, July 14, 2021 - 2:14:13 AM - Koen Verbeeck Back To Top (88999)
Hi Siva,

I would use an expression on the body of the Azure Function activity in ADF to create a valid JSON that contains your array.
In this article you can find an example of the json function, which might be of assistance:

https://www.sqlservercentral.com/blogs/passing-json-arrays-between-pipelines-in-azure-data-factory

Regards,
Koen

Tuesday, July 13, 2021 - 1:53:14 AM - Siva Back To Top (88995)
Hi Koen,

Thanks for the reply! For now I am ok with connection string.

Could you please share some ideas on how we can pass pipeline parameters of type array to the Azure Function as input for a stored procedure. For example: the pipeline parameter is an array consisting of source and target table names; I want to read the target table name from that parameter and pass it to the Snowflake procedure inside the Azure Function.

sample pipeline parameter:

[{"source":{"tableName":"`ABC`"},"destination":{"tableName":"tgt_ABC"},
{"source":{"tableName":"`xyz`"},"destination":{"tableName":"tgt_xyz"}]

Regards,
Siva

Friday, July 9, 2021 - 9:04:18 AM - Koen Verbeeck Back To Top (88978)
Hi Siva,

I'm not sure you can parameterize the connection string itself in the config file. It might be possible, I just haven't tried it myself. But the connection string is just a string. You can load it from the config and then do a search/replace to replace the schema with the one you need for the SQL statement.

Regards,
Koen

Friday, July 9, 2021 - 7:25:28 AM - Siva Back To Top (88976)
Hi Koen,

Thanks for your post, it was really helpful.
Can we parameterize the Snowflake connection details in the connection string in the config file? I need to call a procedure on Snowflake for different schemas.

Regards,
Siva

Monday, December 2, 2019 - 11:01:56 AM - Murray Back To Top (83254)

We use the Web Activity at the moment, just wondered if I was missing a trick. 


Monday, December 2, 2019 - 10:12:10 AM - Koen Verbeeck Back To Top (83253)

Hi Murray,

From what I can tell, you can either use the function key (as displayed in the article) or Azure Key Vault.
I've seen some people on StackOverflow trying to use the ADF Web Activity to call the Azure Function so that MSI can be used.

Koen


Monday, December 2, 2019 - 2:00:28 AM - Murray Back To Top (83246)

Do you know if there is an option for MSI authentication as yet?














