Snowflake Architecture and Pricing Example
By: Koen Verbeeck
Snowflake has a unique architecture, built for the cloud. One of the main advantages is the true separation of storage and compute. The storage layer keeps the data on immutable cloud storage, such as Amazon S3 or Azure Blob Storage. The compute layer consists of warehouses. You can look at a warehouse as a virtual machine which does the computations on the data. The architecture can be summarized as follows:
- Cloud services: the is the brain of Snowflake and it manages crucial resources for the accounts, such as the metadata, authentication and access control, transactions and so on.
- Warehouses: the muscle of the system; elastic virtual machines for compute.
- Data Storage: hybrid columnar storage using micro-partitions.
When a query is executed on Snowflake, the data is read into cache and it is processed. This processing is done by a warehouse. Warehouses come in various sizes, which are depicted as T-shirt sizes. The smallest unit of processing is an extra-small warehouse. When you need more compute, you can upsize it to a small warehouse. This has double the processing power than an XS warehouse. Then you can enlarge it to medium, large and so on until 4XL! That’s a lot of compute.
Besides scaling vertically, you can also scale horizontally: you can create a cluster of warehouses of the same sizes. For example, you can create a cluster of 4 XS warehouses. You can also enable auto-scaling: if there’s no compute needed, the warehouses in the cluster shut down, saving you costs. You can combine horizontal and vertical scaling to suit your demands.
In Snowflake, cost is calculated by two factors:
- Storage - You pay a certain amount per terabyte per month.
- Compute - For each second a warehouse is running, you use an amount of credits. The price of a credit depends on the edition of Snowflake you use. An XS warehouse consumes one credit for one full hour of compute. A small warehouse 2 credits, a medium warehouse 4 credits and so on.
When you execute a query, the warehouse starts using compute for at least one minute. After the first minute, you pay for each extra second the warehouse is running. To save costs, you can configure a warehouse to auto-suspend after a certain amount of time. If no queries are executed after said time period, the warehouse shuts down. When a new query is launched, the warehouse auto-resumes.
Because Snowflake always has a pool of idle virtual machines, resuming a warehouse or changing its scale is super-fast. It’s hardly noticeable and transparent for running queries.
The following table gives an overview of credit consumption for horizontal and vertical scaling:
The largest unit of compute is a 10-node cluster of 4XL warehouses. This consumes 1280 credits per hour. For Enterprise edition, a credit is about $4. This means over $5,000 an hour! It’s clear you can save the most costs by minimizing compute. Storage typically has a much lower impact than compute.
To create a new warehouse in Snowflake, you can execute the following SQL script:
CREATE OR REPLACE WAREHOUSE "my_Warehouse" WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;
This will create a warehouse of size XS, which will automatically shut down after 60 seconds of inactivity. In the browser, you can view the suspended warehouse in the context of the worksheet:
- For more information about warehouses and credit consumption, check out this page.