Execute Databricks Jobs via REST API in Postman


By:   |   Updated: 2020-11-19   |   Comments   |   Related: More > Azure


Problem

In the modern technology fabric, there are many different technologies at play, and thus, there are usually requirements for interoperability between these technologies. At the root of all technical platforms is data, and as data grows, distributed data processing technologies are becoming a standard for transforming and serving data across layers.

So, what happens when you have an API based orchestration system for processing data, or need to access the power of distributed computing via API?

Databricks Jobs are Databricks notebooks that can be passed parameters, and either run on a schedule or via a trigger, such as a REST API, immediately. Databricks Jobs can be created, managed, and maintained VIA REST APIs, allowing for interoperability with many technologies. The following article will demonstrate how to turn a Databricks notebook into a Databricks Job, and then execute that job through an API call.

Solution

Executing Databricks Jobs Via REST API in Postman

If you would like to follow along and complete the demo, please make sure to:

  1. Create a free Postman account
    1. Additionally, make sure to download and install the Desktop Postman agent, as this will be required to make the API call.
  2. Have an active Azure subscription that has been switched to pay-as-you-go. This will allow us to create clusters in Databricks.
    1. Get a free trial for Azure
    2. To switch to pay-as-you-go, follow these instructions

Create a Databricks Workspace

Please follow the following anchor link to read on Getting Started with Azure Databricks.

Import a Databricks Notebook

Next, we need to import the notebook that we will execute via API. I have created a basic Python notebook that builds a Spark Dataframe and writes the Dataframe out as a Delta table in the Databricks File System (DBFS).

Download the attachment 'demo-etl-notebook.dbc' on this article – this is the notebook we will be importing.

Open up Azure Databricks. Click Workspace > Users > the carrot next to Shared. Then click 'Import'.

Shows the Databricks Workspace menu to import a new notebook.  Click Workspace, Shared, the carrot next to shared, and then import.

Browse to the file you just downloaded and click import. We are now ready to turn this notebook into a Databricks Job.

Create a Databricks Job

Databricks Jobs are Databricks notebooks that have been wrapped in a container such that they can be run concurrently, with different sets of parameters, and not interfere with each other. Jobs can either be run on a schedule, or they can be kicked off immediately through the UI, the Databricks CLI, or the Jobs REST API. The Jobs REST API can be used to for more than just running jobs – you can use it to create new jobs, delete existing ones, get info on past runs, and much more.

To build our Job, navigate to the Jobs tab of the navigation bar in Databricks.

Shows the Databricks nav bar with the Jobs icon highlighted.

This brings us to the Jobs UI. Click on 'Create Job'.

The create job button on the Jobs UI.

The first thing we need to do is name our Job. As with everything, it is good to adopt a standard naming convention for your Databricks Jobs. An example convention: dbricks_<type>_<function>.

In the create a new Job UI, the name field highlighted.

The Job Id is set automatically. In this case, the JobID is 7.

In the create a new Job UI, the Job_ID field highlighted.

Next, select the notebook that was previously imported. Click on 'Select Notebook' and navigate to the notebook 'demo-etl-notebook'.

In the create a new Job UI, the select notebook field highlighted.

Next, edit the parameters so that dynamic values can be passed into the notebook. If you open the notebook previously imported, you will see at the top we have configured a 'colName' parameter. This parameter determines the column name of our Delta table. On the Jobs screen, click 'Edit' next to 'Parameters', Type in 'colName' as the key in the key value pair, and click 'Confirm'.

In the create a new Job UI, how to set a new parameter.

In the Cluster section, the configuration of the cluster can be set. This is known as a 'Job' cluster, as it is only spun up for the duration it takes to run this job, and then is automatically shut back down. Job clusters are excellent for cost savings. Click 'Edit' to change our job cluster configuration.

In the create a new Job UI, shows how to edit the cluster type..

Since we are working on a very small demo job, we are going to configure a small cluster. Fill out the fields as you see here:

Shows the Job cluster configuration we will be using - main change is to ensure there is just one cluster.

Click 'Confirm'.

The Jobs UI automatically saves, so the job is now ready to be called.

Build the Postman API Call

The next step is to create the API call in Postman. Log in to Postman via a web browser with the account created earlier.

In the top left-hand corner, click 'New', and subsequently select 'Request'.

In Postman, the New button highlighted, along with the Request button highlighted to create a new request.

Name your request 'Test Databricks run-now Post'. If you don't have a collection, you will need to click the 'Create Folder' plus sign to create one. A collection is a container within Postman to save the requests you build. Create a new collection called 'Databricks'.

Click 'Save to Databricks'.

Shows the naming of the request, and the save to Databricks button.

Switch the request type to 'Post'. Different functions of the Jobs API require different request types, so make sure to read the documentation when testing calls.

Shows the request type in Postman - we are selecting

The request URL is a combination of the Databricks location your workspace is hosted in paired with the specific API call you want to make. To get the first part of the URL, navigate to the Azure Portal, and find your Databricks Workspace resource. On the Overview tab, find the URL field in the top right corner, and copy it:

Shows the Databricks resource page and the URL path highlighted

Then, add the API call to the end of the URL. In this case, we are calling the 'run-now' API. The URL should now look like the following:

Shows the completed request URL in postman.

To authenticate to Databricks, we need to generate a user token in the Databricks workspace. Navigate back to Databricks and click the workspace icon in the top right corner. Then select 'User Settings'.

Shows the user settings button in Databricks where we will be able to generate a new token.

This will bring you to the 'Access Tokens' tab. Click 'Generate New Token', name the token 'Postman', and change the 'Lifetime' to 1 day. This means the token will only be active for a single day. Click 'Generate'. Copy the access token and save it somewhere safe. Once it is saved, click done.

Navigate back to Postman and click on the 'Authorization' tab. The Auth type for this request is a 'Bearer token', so select that from the TYPE dropdown. Then, paste the token into the 'Token' field. The authorization page should look like the following:

Shows the filled out authorization section in the Postman request we are building.

Finally, we need to fill in the body to make the actual request. The documentation has samples of these calls. Here is the link for the run-now API call.

Copy the example JSON request from the above link and navigate to the 'Body' tab in Postman.

Before you can paste in the request, you need to select the type as 'raw' and drop down the raw type carrot to the right and select JSON.

Shows the fields to check off on the body section of the Postman request.

Now, paste in the sample request you copied from the documentation, but modify it so that the JobID matches the JobID previously created (you can find that on the Job page in Databricks). This will likely be '1' if it is the first job created. Additionally, change the parameter key to be 'colName', and the value to be 'age'. The result should look similar to the following:

test databricks run now post

We are now ready to execute the Job via API!

Execute the Job via API and View Results

To execute the job, simply click 'send' in Postman. You may get the following error, and if you do, click 'Use Postman Desktop Agent' to resolve it. This step requires that you installed the Postman agent on your desktop.

postman response

Click 'Send' again, and if successful, the response should look like the following:

Shows the response you should get when you have successfully configured the Postman job.

If you get an error, double check each section to make sure there are no typos and that the authorization token was entered properly.

The job has now been initiated. Navigate to Databricks, where we can view the results of the Job.

Open the Jobs UI, and under Active Runs there should be a job running. At this stage, the job cluster is spinning up, which usually takes around 5 minutes. Once it completes, you will see it moved to the 'Completed in past 60 days' section.

Wait for the job to move to the completed section, and then click 'Run 1'.

Shows the Jobs run UI in Databricks.

This opens an ephemeral version of the notebook, meaning that you can see all the outputs of the notebook with the parameters you passed in VIA the job. This is a valuable tool for debugging and to see how changes are affecting the notebook code.

An image of the ephemeral Databricks notebook that was run as a result of the job we created.

We can see above that the column name of our Dataframe was name 'age', which we passed in through the colName parameter.

There it is you have successfully kicked off a Databricks Job using the Jobs API.

Next Steps


Last Updated: 2020-11-19


get scripts

next tip button



About the author
MSSQLTips author Ryan Kennedy Ryan Kennedy is a Solutions Architect for Databricks, specializing in helping clients build modern data platforms in the cloud that drive business results.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Adding Users to Azure SQL Databases

Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime - Part 1

Transfer Files from SharePoint To Blob Storage with Azure Logic Apps

Continuous database deployments with Azure DevOps

Reading and Writing data in Azure Data Lake Storage Gen 2 with Azure Databricks














get free sql tips
agree to terms