Choosing Snowflake vs SQL Server for a Data Warehouse
We are embarking on a new project where we need to build out a data platform. We have a long and successful history with the SQL Server database server on our on-premises server, but management is investigating if the cloud is a viable option as well. We’ve heard good things about the Snowflake database offering. What should we look out for when making a decision?
For readers not familiar with Snowflake: it’s a data warehouse vendor and its database solution is a cloud data warehouse offering which is available on Azure as well as on AWS and the Google Cloud platform. You can learn more about Snowflake in this tutorial, or you can read the tip Why Would You Choose For Snowflake to discover some of the advantages of the Snowflake platform.
In this tip, we will present a couple of reasons and use cases on why you should choose for one database platform or the other. When we talk about SQL Server, we are referring to the installation of SQL Server on on-premises servers. Azure SQL DB, Azure Managed Instance, Azure Synapse Analytics and other cloud offerings are not included. Most arguments presented in this tip for SQL Server are most likely true though for SQL Server installed on a virtual machine hosted in Azure.
Cloud vs On-Premises
The most obvious decision criterion is the following: do you want your database in the cloud or not? If you’re adamant on staying on-premises, be it due to regulations or other reasons, Snowflake is out of the question as they only have a cloud offering.
Similarly, if you specifically need the cloud, you can’t use SQL Server on-premises. However, you can use other database offerings in Azure, or even install SQL Server on a virtual machine hosted in Azure (this eliminates the need for a physical server, but you still need to manage the operating system and the SQL Server engine itself). As mentioned before, those are out of scope for this tip.
Let’s be clear: Snowflake is a data warehouse. Sure, you can integrate streaming data, work with unstructured data and build it into an encompassing data lakehouse. But it’s still a data warehouse at its core. It’s meant to process data for analytical use cases. If your use case is not building a data warehouse, but rather an OLTP database (or some use cases of NoSQL databases, such as a document database), Snowflake is definitely the wrong choice. Some anecdotal evidence: I needed to load some metadata into a Snowflake database. This was stored into some Excel sheets (the biggest was probably 200 rows). I loaded the data with an ETL tool into Snowflake. Due to the way the ETL and Snowflake worked together, the rows were most likely inserted one by one. It took over one minute. In SQL Server, such an operation would take less than a second. The way Snowflake is built (with features like time travel) means it’s very well suited for processing large amounts of data at once. But not for doing little tricky inserts (RBAR or row-by-agonizing-row as it’s sometimes called). Even updates and deletes might suffer a penalty on Snowflake. Sometimes it’s just faster to insert data using a SELECT with a WHERE clause into a new table, then deleting data from the original table.
INSERT INTO myNewTable(colA, colB, …) SELECT * FROM myOldTable WHERE colA <> 'someCondition';
This query might be faster than running the following DELETE statement:
DELETE FROM myOldTable WHERE colA = 'someCondition';
SQL Server on the other hand can be used for almost any use case. You can use it as a highly available OLTP database supporting your applications, with point-in-time restores and many other features, or can you can use it as a data warehouse as well. Certain features like the simple recovery model and columnstore indexes make SQL Server well-suited for large analytical loads.
For Snowflake, the cost is a recurring one like most cloud offerings. You pay a monthly fee, depending on two factors:
- the amount of storage your data uses (this is most likely not the biggest contributor to the monthly fee)
- the number of minutes the Snowflake warehouses (the compute engines) have been running, in combination with the size of those warehouses.
In other words, the more you use Snowflake, the more you’ll need to pay. In contrast, SQL Server is a fixed cost upfront. You pay your license fee and that’s it. However, there are other costs that also need to be taken into account: you’ll need a server (which needs cooling, electricity etc.), you might need a SAN for storage, you need to have decent network infrastructure implemented and personnel to manage all of this. If something breaks, it needs to be replaced. So you have a big upfront cost (purchasing the server and the licenses), but also an ongoing maintenance cost.
It’s impossible to predict something in the lines of "SQL Server will be x times cheaper" or "Snowflake will be more cost effective in the long run". It just depends on too many factors and on your specific scenario. When trying to make a decision, keep the cost structure in mind and try to make a prediction of the monthly cost. After some time, reevaluate.
Out-of-the box, Snowflake has stellar performance which will be hard to match with an on-premises SQL Server database engine. SQL Server isn’t slow, but to get such good query performance over very large data sets, you need some serious hardware for SQL Server (which is expensive) and a database development team who knows what they’re doing (which is also expensive). The advantage of Snowflake here is it takes no time to get up and running and you can focus solely on modelling your data warehouse and writing queries. Another advantage is that it absolutely takes no effort to scale Snowflake. Queries are slow? Either change the computing warehouse to a bigger size, or put more warehouses into a cluster. You can scale up and down any time you want, adapting your computing strength to the current demand. With an on-premises SQL Server, this is much harder to do.
You have already bought the server on which SQL Server is running. You can perhaps add more RAM, but this still takes time (and money). If SQL Server is running inside a virtual machine and the host machine still has RAM and CPU to spare, you can swap in more compute power but even this is not effortless. A lot of shops have an overpowered server (with an expensive SQL Server license since it is tied to the number of CPU cores), just to be able to handle "that one heavy load every month" or "any big spike in user activity", but most of the time a big chunk of the resources are sitting idle.
On the other hand, the "zero management" philosophy of Snowflake also means "zero options". If a query is slow for some reason, performance tuning options are limited. You can try clustering on a column or scale up the compute. If that doesn’t work, you have to rewrite your query and maybe split it up to dump intermediate results in tables. With SQL Server, you have many options available. The most important one surely is indexing.
With SQL Server, you have complete control over everything that happens with the database engine. You are in control over the database backup schedule, over the high-availability and the disaster recovery, over the encryption used, over the amount of logging and so on. If you need this control, SQL Server is probably your best choice. With Snowflake – and many other "as-a-service" offerings – you give up part of the control but you get ease of management in return. It’s up to you to decide how much control you want or need.
Choosing between Snowflake or an on-premises SQL Server essentially boils down to the choice if you want to move to the cloud or not. With the cloud comes ease-of-management, great performance and elastic scalability. On-premises, you have absolute control and a more predictable cost.
- Check out other Snowflake resources:
- See this SQL Server Tutorial
About the author
View all my tips
Article Last Updated: 2022-03-09