Snowflake SnowPro Advanced Architect Certification Preparation

Problem

In the previous article, we discussed some key Snowflake concepts that may appear on the exam. In this article, we will continue with additional features you may see in exam questions. Please note that this covers only a small portion of the full range of exam topics.

Solution

Without further ado, let’s discuss some topics that, based on my experience, can be touched on in exam questions, and briefly highlight their key characteristics. Once again, this article is aimed at serving as an exam cram, and it is assumed that the reader is already aware of the concepts discussed below.

Key Takeaways

  • This article continues the discussion of Snowflake concepts relevant to the Snowflake SnowPro Advanced Architect Certification exam.
  • Key features like Streams, Search Optimization, Snowpipe, External Tables, and Parameters are examined, highlighting their essential characteristics.
  • Understanding Stream types and their monitoring functions is crucial for exam success.
  • Snowpipe allows immediate file loading, while external tables enable querying staged data without storing it in Snowflake.
  • This guide serves as an exam cram, but further study is necessary for comprehensive exam preparation.

Streams

A stream is a Snowflake object that captures DML (Data Manipulation Language) changes (inserts, updates, and deletes) on a table and stores change records with related metadata, enabling change data capture (CDC).

I have noticed that some questions test knowledge about stream types, stream columns, and how to monitor streams. Below is a short reminder of these topics.

Stream types:

  • Standard – Tracks all DML changes (inserts, updates, deletes, and truncates) to the source object (standard tables, dynamic tables, Snowflake-managed Apache Iceberg tables, directory tables, or views).
  • Append-only – Tracks only inserts to the source object (standard tables, dynamic tables, Snowflake-managed Apache Iceberg tables, or views).
  • Insert-only – Tracks only inserts to externally managed Apache Iceberg tables or external tables.

Stream Columns:

  • METADATA$ACTION – Shows DML operation type (INSERT or DELETE).
  • METADATA$ISUPDATE – Shows whether the DML operation (INSERT or DELETE) was part of an UPDATE statement.
  • METADATA$ROWID – Unique, immutable change identifier.

Monitoring:

  • SYSTEM$STREAM_HAS_DATA – A system function that shows whether a given stream has CDC records.
  • MAX_DATA_EXTENSION_TIME_IN_DAYS – A parameter that specifies the maximum number of days Snowflake can extend a table’s data retention period to keep streams on that table from becoming stale. By default, if a source table’s DATA_RETENTION_TIME_IN_DAYS is less than 14 days and the stream hasn’t been consumed, Snowflake temporarily extends the retention period to match the stream’s offset—up to a maximum of 14 days.
  • DATA_RETENTION_TIME_IN_DAYS – A parameter that specifies the number of days Snowflake retains historical data for Time Travel actions.

Search optimization

Snowflake’s Search Optimization Service can improve performance for specific types of queries. It allows skipping some micro-partitions during a table scan by creating a persistent data structure called the search access path, which stores information about column values and micro-partition mapping. This allows us to easily locate values for the defined columns in search optimization.

Required Snowflake edition:

  • Enterprise Edition and higher

Most effective for the queries with the following characteristics:

  • Query involves column(s) that are not primary key columns
  • Query’s runtime is at least a few seconds or more
  • At least one column accessed by the query filter has 100,000 or more distinct values

Costs:

  • The Search Optimization Service consumes storage as it stores the search access path
  • The Search Optimization Service consumes compute (serverless) to create and maintain the search access path metadata.

Some query types that can benefit from Search Optimization:

  • Point lookups
  • Highly selective filters
  • Joins
  • Semi-structured queries (VARIANT)

Snowpipe

Snowpipe allows loading files immediately after they appear in the stage. It is achieved via pipes, which are Snowflake first-class objects that contain a COPY statement defining the source files’ location and the target table.

Methods of detecting staged files:

  • Cloud messaging – using event notifications for cloud storage to notify Snowpipe and initiate the process
  • Snowpipe REST endpoints – the client application calls a public REST endpoint for a pipe to detect new files and invoke the process

“Least privileges” for a role to use Snowpipe:

  • OWNERSHIP – on the named pipe
  • USAGE and READ -on the named stage
  • USAGE – on the Target database and schema. Also, on the Named storage integration and Named file format, if used
  • INSERT and SELECT – on the target table

Other facts about Snowpipe:

  • Compute – Snowflake automatically provides compute resources for Snowpipe, and a warehouse does not need to be specified. The cost is defined according to the compute resource consumed.
  • Load order – There is no guarantee that files will be loaded according to their staging order. Having said that, Snowpipe usually loads older files first.
  • Load history – Snowpipe stores the load history for 14 days in the pipe’s metadata. Snowpipe uses this metadata to avoid reloading already loaded files and prevent duplication.
  • Transactions – Depending on the size and number of rows in the files, load can happen in one transaction or can be split into multiple transactions.

External Tables

Questions related to external tables can appear during the exam, and understanding their main characteristics is very important. Snowflake’s External Table feature allows querying data in an external stage as if it were stored in a Snowflake table. It does not store the actual data but only metadata.

External Table Columns:

  • VALUE – A VARIANT type column storing the actual data of a single row in the external table
  • METADATA$FILENAME – Stores the names of staged data files in the external table with their paths in the stage
  • METADATA$FILE_ROW_NUMBER – Stores the row number of each record in the staged data file

Billing:

  • Snowflake includes extra charges for automatically refreshing external tables’ metadata via event notifications
  • Snowflake includes some maintenance charges for manually refreshing external tables’ metadata

Permissions required for creating and managing external tables:

  • USAGE on DATABASE
  • USAGE, CREATE STAGE (if it is needed to create a new stage), CREATE EXTERNAL TABLE on SCHEMA
  • USAGE on STAGE if an existing stage is used

Some useful facts about external tables:

  • External tables can be partitioned (strongly recommended). Partitions can be added automatically as well as manually, which is defined in the external table definition.
  • Materialized views can be created on external tables, and often, materialized views over external tables can increase query performance.

Parameters

Questions about Snowflake parameters can also appear during the exam. Here are some quick facts about parameter management in Snowflake:

Parameter types:

  • Account – applies to the account level. Can be set only at the account level.
  • Session – applies to users and their sessions. Can be set at the account, user and session levels.
  • Object – applies to objects. Can be set at the account and object levels.

Parameter values:

  • All parameters have default values.
  • Default values can be overridden at the account level (ACCOUNTADMIN role is required).
  • Default values of session and object parameters can also be overridden at each supported level.

Viewing and modifying parameters:

  • SHOW PARAMETERS command – shows the list of the parameters, their current and default values
  • ALTER ACCOUNT SET <param> = <value> – setting parameters
  • ALTER ACCOUNT UNSET <param> – resetting a parameter value to its default

Conclusion

In conclusion, we discussed several important Snowflake features and their main characteristics that may appear on the exam. From my experience, knowing these details can help significantly during the exam. However, the materials discussed here, as well as in the previous article, are helpful but not enough on their own for exam preparation, and should be seen as extra supporting material—an exam cram before the exam.

Next Steps

For additional information, please follow the links below:

Additional Reading:

Leave a Reply

Your email address will not be published. Required fields are marked *