Getting Started with Snowflake Tasks

By:   |   Updated: 2020-12-09   |   Comments   |   Related: More > Data Warehousing


Problem

You have just implemented a mass exodus from your on-premises data warehouse to Snowflake. Now that the schema and data portions of your migration are completed, you have one final migration type and this is the workflows that operated in your previous environment (think SQL Server Agent jobs in SQL Server terms). These workflows are largely batch processing loads from different data sources on a particular schedule. The problem we will be looking at today is how to implement these workflows in the new Snowflake environment.

Solution

For the solution to this problem, we will be covering Snowflake tasks. Snowflake tasks are schedulable scripts that are run inside of your Snowflake environment. This tip will cover:

  1. Introduction to Snowflake
  2. High level concepts of a task
  3. Building a simple task
  4. Turning a task on and off
  5. Adding descending tasks onto an existing task

Requirements:

  • A free Snowflake instance.

Introduction to Snowflake

Snowflake is the first cloud first data warehouse solution, the service is built on top of the customer's choice of cloud provider (AWS, Azure or GCP) infrastructure.

Snowflake SQL is ANSI compliant and supports common analytic and windowing functions. You will definitely notice some unique syntax in Snowflake, but you will also notice some similarities coming from a heavy SQL Server background the transition has been reasonable for me.

In terms of what the IDE is for Snowflake it is entirely web based. To interact with your instance, you will navigate to your unique URL for example XXXXXXXX.us-east-1.snowflakecomputing.com. Once logged in you will be in the main web GUI, this is the main IDE and can start interacting with your data assets. Also, for reference each query tab in the Snowflake console is referred to as a "worksheet". These "worksheets" are automatically saved and can be reviewed at any time, which is equivalent to a tab history feature.

What is a Snowflake task?

The Snowflake documentation defines tasks as having the ability to execute single SQL statements including calling a stored procedure. Tasks in Snowflake may only be run by a schedule. DDL statements relevant to Snowflake tasks are as follows:

Name Description
CREATE TASK Create a new task, also takes parameter for scheduling
ALTER TASK Alter an existing task
DROP TASK Delete an existing task
DESCRIBE TASK Returns task meta data. Must be owner of the task to view this data
SHOW TASKS Provides a list of tasks available to the current role

Scheduling a Snowflake task

The Snowflake task engine has a CRON and NONCRON variant scheduling mechanisms. The CRON variant should look familiar syntactically if you are an avid Linux user. CRON or CRONTAB is the Linux version of Windows task schedule. It is extremely simplified in regards to how it runs a job. It supports a few parameters and points to a SQL statement as part of the task definition. The parameters control the frequency of the job being run, days of week and time.

Here is a table with a few configurations and descriptions of CRON job timings to give you an idea of how it works.

cron job timings

Building a simple Snowflake task

In this example, the task we will be inserting a random letter as well as a timestamp into our TASKDEBUG table. The timestamp column will allow us to see the schedule of the job executing.

Start by creating a database and setting the database context in your Snowflake worksheet.

CREATE DATABASE MSSQLTIPS;
USE MSSQLTIPS;

Here is an example script creating a task using NON-CRON notation:

CREATE OR REPLACE TASK TASK_DEBUG
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = '1 MINUTE'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
AS
insert into taskdebug
with arr as (select array_construct('A','B','C','D','E','F') arr)
select arr[ABS(MOD(RANDOM(), array_size(arr)))],CURRENT_TIMESTAMP() from arr; 

Here is the same script using CRON notation.

CREATE OR REPLACE TASK TASK_DEBUG
  WAREHOUSE = COMPUTE_WH
   SCHEDULE = 'USING CRON 1 * * * * America/New_York'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
AS
insert into taskdebug
with arr as (select array_construct('A','B','C','D','E','F') arr)
select arr[ABS(MOD(RANDOM(), array_size(arr)))],CURRENT_TIMESTAMP() from arr;

Turning the task on and off

An important note is that even though the task is created in Snowflake, which we can verify by running the show tasks command.

--VIEW TASK WITH SHOW TASKS COMMAND
SHOW TASKS;
show tasks output

We can see the state is "suspended" which means the task will not be triggering. To turn our task on, we issue an alter task command.

--AFTER TASK IS CREATED, ITS INITAL STATE IS SUSPENDED, ENABLE TASK
ALTER TASK TASK_DEBUG RESUME;

Now that the task is on, we just have to wait 1 minute for the task to trigger and query the TASKDEBUG table for proof that our task is running as expected.

--WAIT 1 MINUTE
SELECT * FROM TASKDEBUG;

Adding descending tasks onto an existing task

Lastly, let's suppose that we would like another task to trigger after our initial task TASK_DEBUG runs. Snowflake offers descending task functionality; we will define a similar insert statement as previously shown. In this task we will hard code the first value as 'x' so that we can be sure the 2nd task is triggering.

Before we can stack any tasks on our initial task, we must suspend the initial task.

--TURN TASK OFF
ALTER TASK TASK_DEBUG SUSPEND;

Now that the task is in a suspended state, we can create the second task to run after the first.

--CREATE A 2ND TASKS WITH AFTER CLAUSE
CREATE OR REPLACE TASK TASK_DEBUG_2
  WAREHOUSE = COMPUTE_WH
  AFTER TASK_DEBUG --SPECIFY WHICH TASK IT OPERATES AFTER
AS
insert into taskdebug
select 'X',CURRENT_TIMESTAMP() from arr;

We can verify the task dependence by using the following query:

--CHECK DEPENTANT TASKS
select * from table(information_schema.task_dependents(task_name => 'MSSQLTIPS.PUBLIC.TASK_DEBUG', recursive => false));
check dependent tasks

Here we can verify two things, first that TASK_DEBUG_2 has a Predecessor task of TASK_DEBUG. And also, that TASK_DEBUG_2 does not have a schedule of its own.

Now turn the original task back on and also enabled the second task. Its worth noting here that TASK_DEBUG_2 must be enabled first and then the root task.

--TURN ORIGINAL TASK BACK ON
ALTER TASK TASK_DEBUG_2 RESUME;
ALTER TASK TASK_DEBUG RESUME;

Wait 1 minute for the task to run and check the TASKDEBUG table.

--WAIT 1 MINUTE, VERYIFY THAT 'X' AND OTHER LETTERS ARE IN THE TASKDEBUG TABLES
SELECT * FROM TASKDEBUG;

Here we can see an 'X' value and a random letter which verifies that both the tasks were operated.

taskdebug output

To save on compute charges suspend both of the tasks.

--TURN BOTH TASKS OFF
ALTER TASK TASK_DEBUG SUSPEND;
ALTER TASK TASK_DEBUG_2 SUSPEND;

In summary tasks are very handy in Snowflake, they can be combined with streams, snowpipe and other techniques to make them extremely powerful.

Next Steps



Get Started Now - Click here to get your free 14 day trial of SolarWinds Database Insights




get scripts

next tip button



About the author
MSSQLTips author Ian Fogelman Ian Fogelman is professional DBA with a hunger and interest for data science and application development. Ian enjoys leveraging cloud-based technologies and data to solve and automate tasks. Ian holds a MS in computer science with a concentration in software development. Ian is also a member of his local SQL Server user group and a member of his local data science user group in Columbus Georgia.

View all my tips


Article Last Updated: 2020-12-09

Comments For This Article





download














get free sql tips
agree to terms