Process Blob Files Automatically using an Azure Function with Blob Trigger


By: Koen Verbeeck   |   Updated: 2020-03-25   |   Related: More > Azure

Problem

We have blob files arriving in a blob container continuously throughout the day. A new file should be read and the contents should be written to a table in an Azure SQL Database, as soon as the file is dropped in the blob container. Is there a way to automate this?

Solution

In the Azure ecosystem there are a number of ways to process files from Azure Blob Storage:

  • Azure Logic Apps. With these you can easily automate workflows without writing any code. You can find an example in the tip Transfer Files from SharePoint To Blob Storage with Azure Logic Apps. They are better suited, though, to processing the contents of a file as a whole, and not to writing data to an Azure SQL Database, because they process the file line by line, which is not optimal for performance.
  • Azure Data Factory is most likely the easiest tool to copy data into an Azure SQL Database. However, the Copy activity doesn't allow for any transformations on the data. A mapping data flow is a good alternative, but since this runs on top of an Azure Databricks cluster, it might be overkill for a small file. ADF has support for blob triggers.
  • Azure Functions are little pieces of event-driven code which run on serverless compute. There are many programming languages available and there's also a template for using a blob trigger. This means the Azure Function will automatically run every time a new file is created in a blob container.

In this tip, we'll give you an example of a simple Azure Function which reads a Json file from a blob container and writes its contents to a table in an Azure SQL Database. Even though some code is involved, Azure Functions are a flexible and powerful tool and a useful part of your "cloud ETL toolkit". To create an Azure Function, we need the following prerequisites:

  • Visual Studio with the Azure Development workload enabled. You can also create an Azure Function directly in the portal, but Visual Studio is preferred when you want easy source control integration.
  • An Azure storage account
  • An Azure SQL Database

The Sample Data

We're going to load a small Json file holding the top 250 movies. A sample of the data:

{
   "TopMovies": [
      {
         "Index": 1,
         "Title": "The Shawshank Redemption"
      },
      {
         "Index": 2,
         "Title": "The Godfather"
      },
      {
         "Index": 3,
         "Title": "The Godfather: Part II"
      },
      {
         "Index": 4,
         "Title": "The Dark Knight"
      },
      {
         "Index": 5,
         "Title": "12 Angry Men"
      },
      {
         "Index": 6,
         "Title": "Schindler's List"
      },
…
      }
   ]
}

You can download the sample file here. The destination table is created with the following SQL statement:

CREATE TABLE [dbo].[topmovies](
   [Index] [BIGINT] NULL,
   [MovieTitle] [NVARCHAR](500) NULL
);

Creating the Azure Function

Start Visual Studio and create a new project. Choose Azure Functions from the list of available project templates:

create new project in VS

In the next screen, give your project a name and location:

configure project name and location

In the following screen, you can configure the Azure Function itself. You can choose the version and the type of trigger.

choose version and trigger

Even though v3 is the newest version, we're choosing v2 because, at the time of writing, there were no templates available for v3. Choose the Blob trigger template. On the right-hand side, you need to choose a storage account. You can either emulate one, create a new account, or choose an existing one:

choose storage account

Finally, you need to specify a name for the connection string to your blob storage and the blob container where the files will be dropped:

specify connection name and blob container

Once you click on create, the Azure Function will be created using the template for the blob trigger. The following code will be added to a file called Function1.cs.

using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;
 
namespace myTriggerFunction
{
    public static class Function1
    {
        [FunctionName("Function1")]
        public static void Run([BlobTrigger("functionstest/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name, ILogger log)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size:{myBlob.Length} Bytes");
        }
    }
}

You might want to rename the actual function to something clearer than Function1. If you rename the file, you get the following warning:

rename file warning

Click Yes. In this example, the function is renamed to copyBlobtoSQL. In the boilerplate code, two important variables are created: name and myBlob.

name and myblob variables def

The name variable will hold the name of the file found in the blob container, while myBlob will hold the actual contents of the file.

Setting up the Configuration

When the Azure Function runs, it needs to know how to connect to the blob container and the Azure SQL Database. Visual Studio has created a file called local.settings.json. In this file, we can already find the connection information for the storage account:

local settings file

The connection name is AzureWebJobsStorage, which is what we specified in the set-up of the Azure Function. Now we also need to add the database connection information:

azure sql connection added

An example of a connection string:

"SQLConnectionString":
"Server=tcp:mydb.database.windows.net,1433;Initial 
Catalog=myDB;Persist Security Info=False;User ID=myUser;Password=myPW;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection 
Timeout=30;" 

Keep in mind this config file will not be deployed to Azure, so you'll need to add the same settings to the Function App's application settings in the Azure portal as well. It's also a good idea not to include this file in source control.
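For reference, after adding the database connection the complete local.settings.json file could look roughly like the sketch below. The exact contents depend on your storage account and runtime, and this tip only shows the file in a screenshot, so treat the layout (with SQLConnectionString added under Values) as an assumption rather than the exact file.

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<your storage account connection string>",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "SQLConnectionString": "Server=tcp:mydb.database.windows.net,1433;Initial Catalog=myDB;User ID=myUser;Password=myPW;Encrypt=True;Connection Timeout=30;"
  }
}

When running locally, the Functions host exposes the entries under Values as environment variables, which is why the AddEnvironmentVariables() call shown further down can also resolve SQLConnectionString.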

To make use of this config file, we need to add the following directive:

using Microsoft.Extensions.Configuration;

We also need to add an ExecutionContext to the function:

public static void Run([BlobTrigger("functionstest/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name, ILogger log, ExecutionContext context)

In the code itself, we specify where we can find the configuration info and we store the connection information into a variable:

var config = new ConfigurationBuilder()
    .SetBasePath(context.FunctionAppDirectory)
    .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
    .AddEnvironmentVariables()
    .Build();
 
string myConnectionString = config["SQLConnectionString"];

Reading the Json data

Before we read in the data, we check first if we're dealing with a Json file, by simply checking the extension of the file:

if (name.EndsWith(".json"))
{

}
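As an alternative (not used in this tip), a blob trigger can also filter on the extension in the binding path itself, so the function only fires for .json blobs. A minimal sketch of what the signature could look like:

// Sketch: filter on the extension in the trigger path instead of checking it in code.
// With this pattern, {name} binds to the blob name without the .json extension.
public static void Run(
    [BlobTrigger("functionstest/{name}.json", Connection = "AzureWebJobsStorage")] Stream myBlob,
    string name, ILogger log, ExecutionContext context)
{
    log.LogInformation($"Processing JSON blob: {name}.json");
}

The rest of this tip keeps the EndsWith check, which works just as well.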

Then we use a StreamReader object to read in the blob data and store the result in a string.

StreamReader reader = new StreamReader(myBlob);
string jsonContent = reader.ReadToEnd();
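A small optional refinement, not part of the original code: since StreamReader is disposable, you could wrap it in a using block so it is cleaned up as soon as the blob has been read.

// Optional sketch: dispose the reader automatically after reading the blob.
string jsonContent;
using (var reader = new StreamReader(myBlob))
{
    jsonContent = reader.ReadToEnd();
}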

An easy way to write data to SQL Server is the SqlBulkCopy class. It has a WriteToServer method, but that takes a DataTable as input, not a string. Luckily, we can convert the Json data into a table. The method used is found in this StackOverflow question.

public static DataTable Tabulate(string json)
{
   var jsonLinq = JObject.Parse(json);
 
   // Find the first array using Linq
   var srcArray = jsonLinq.Descendants().Where(d => d is JArray).First();
   var trgArray = new JArray();
   foreach (JObject row in srcArray.Children<JObject>())
   {
      var cleanRow = new JObject();
      foreach (JProperty column in row.Properties())
      {
         // Only include JValue types
         if (column.Value is JValue)
         {
            cleanRow.Add(column.Name, column.Value);
         }
      }
      trgArray.Add(cleanRow);
   }
   return JsonConvert.DeserializeObject<DataTable>(trgArray.ToString());
}

To use all this code, the following using directives are needed:

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Data;
using System.Linq;

Now we can convert the Json to a DataTable and write it to our destination table:

var dt = Tabulate(jsonContent);
SqlBulkCopy bulkcopy = new SqlBulkCopy(myConnectionString);
bulkcopy.DestinationTableName = "topmovies";
 
try
{
   bulkcopy.WriteToServer(dt);
}
catch (Exception e)
{
   System.Diagnostics.Debug.WriteLine(e.Message.ToString());
}
System.Diagnostics.Debug.WriteLine("Wrote some JSON blob to Azure SQL!");

Even though we're using System.Data, you might get the following error:

import error

In this case, we'll have to add the System.Data.SqlClient package manually to the project. Right-click on Packages in the solution and choose Manage NuGet Packages…

install nuget package

In the NuGet package manager, go to Browse and search for the term sql.

search for the sqlclient package

Install the package:

install the latest version of the package

Accept the license:

accept license

And then the package is added to your project:

package added to project.

The entire code now looks like this:

using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Configuration;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Data;
using System.Linq;
using System.Data.SqlClient;
 
namespace myTriggerFunction
{
    public static class copyBlobtoSQL
    {
        [FunctionName("copyBlobtoSQL")]
        public static void Run([BlobTrigger("functionstest/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob
            , string name, ILogger log, ExecutionContext context)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size:{myBlob.Length} Bytes");
 
            var config = new ConfigurationBuilder()
             .SetBasePath(context.FunctionAppDirectory)
             .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
             .AddEnvironmentVariables()
             .Build();
 
            string myConnectionString = config["SQLConnectionString"];
 
            if (name.EndsWith(".json"))
            {
                StreamReader reader = new StreamReader(myBlob);
                string jsonContent = reader.ReadToEnd();
                var dt = Tabulate(jsonContent);
 
                SqlBulkCopy bulkcopy = new SqlBulkCopy(myConnectionString);
                bulkcopy.DestinationTableName = "topmovies";
 
                try
                {
                    bulkcopy.WriteToServer(dt);
                }
                catch (Exception e)
                {
                    System.Diagnostics.Debug.WriteLine(e.Message.ToString());
                }
                System.Diagnostics.Debug.WriteLine("Wrote some JSON blob to Azure SQL!");
            }
        }
 
        public static DataTable Tabulate(string json)
        {
            var jsonLinq = JObject.Parse(json);
 
            // Find the first array using Linq
            var srcArray = jsonLinq.Descendants().Where(d => d is JArray).First();
            var trgArray = new JArray();
            foreach (JObject row in srcArray.Children<JObject>())
            {
                var cleanRow = new JObject();
                foreach (JProperty column in row.Properties())
                {
                    // Only include JValue types
                    if (column.Value is JValue)
                    {
                        cleanRow.Add(column.Name, column.Value);
                    }
                }
                trgArray.Add(cleanRow);
            }
            return JsonConvert.DeserializeObject<DataTable>(trgArray.ToString());
        }
    }
}

Testing the Azure Function

Now it's time to see if the code actually works. In Visual Studio, hit F5 or click on the "play" icon to start debugging.

start debugging

This will open a new command line window. There, a host is created that runs the Azure Function on your local machine.

localhost server for azure function

Once start-up has finished, you can see the application is waiting for a trigger to arrive:

waiting for file to arrive

When you drop a file in the blob container (you can use Azure Storage Explorer for this task), you can see it will be picked up by our function:

function processing file

And the data has been loaded to the Azure SQL Database:

data has entered the building
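If you'd rather verify the load from the function itself instead of a query tool, a quick row count could be logged as well. This is a sketch and not part of the original function; it reuses the myConnectionString variable and the log parameter from the function body.

// Optional sanity check (sketch): log how many rows ended up in the destination table.
using (var conn = new SqlConnection(myConnectionString))
using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.topmovies;", conn))
{
    conn.Open();
    log.LogInformation($"Rows in dbo.topmovies: {cmd.ExecuteScalar()}");
}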

If you receive the following error during debugging, it's likely there's a compatibility issue between the version of the SqlClient package and the .NET environment the Azure Function is using:

System.PlatformNotSupportedException: 'System.Data.SqlClient is not supported on this platform.'

The issue is described in this GitHub post.

version issue

The issue can be fixed by downgrading the package to an earlier version. In this tip, version 4.5.1 is used.

If you want more information on publishing the Function to Azure and configuring the connections, you can refer to the tip Create an Azure Function to execute SQL on a Snowflake Database - Part 2, where a similar set-up is used.
