Choosing Snowflake vs SQL Server for a Data Warehouse

By:   |   Updated: 2022-03-09   |   Comments (6)   |   Related: More > Other Database Platforms


Free SQL Server Performance and Monitoring Report


Dear Database Professional,

Download your free copy of the MSSQLTips.com SQL Server Performance and Monitoring Report. This survey was conducted in 2022 and polled 588 database professionals about various aspects of tuning and optimizing SQL Server.

Click here to download the free report

Problem

We are embarking on a new project where we need to build out a data platform. We have a long and successful history with the SQL Server database server on our on-premises server, but management is investigating if the cloud is a viable option as well. We’ve heard good things about the Snowflake database offering. What should we look out for when making a decision?

Solution

For readers not familiar with Snowflake: it’s a data warehouse vendor and its database solution is a cloud data warehouse offering which is available on Azure as well as on AWS and the Google Cloud platform. You can learn more about Snowflake in this tutorial, or you can read the tip Why Would You Choose For Snowflake to discover some of the advantages of the Snowflake platform.

In this tip, we will present a couple of reasons and use cases on why you should choose for one database platform or the other. When we talk about SQL Server, we are referring to the installation of SQL Server on on-premises servers. Azure SQL DB, Azure Managed Instance, Azure Synapse Analytics and other cloud offerings are not included. Most arguments presented in this tip for SQL Server are most likely true though for SQL Server installed on a virtual machine hosted in Azure.

Cloud vs On-Premises

The most obvious decision criterion is the following: do you want your database in the cloud or not? If you’re adamant on staying on-premises, be it due to regulations or other reasons, Snowflake is out of the question as they only have a cloud offering.

Similarly, if you specifically need the cloud, you can’t use SQL Server on-premises. However, you can use other database offerings in Azure, or even install SQL Server on a virtual machine hosted in Azure (this eliminates the need for a physical server, but you still need to manage the operating system and the SQL Server engine itself). As mentioned before, those are out of scope for this tip.

Data Warehouse

Let’s be clear: Snowflake is a data warehouse. Sure, you can integrate streaming data, work with unstructured data and build it into an encompassing data lakehouse. But it’s still a data warehouse at its core. It’s meant to process data for analytical use cases. If your use case is not building a data warehouse, but rather an OLTP database (or some use cases of NoSQL databases, such as a document database), Snowflake is definitely the wrong choice. Some anecdotal evidence: I needed to load some metadata into a Snowflake database. This was stored into some Excel sheets (the biggest was probably 200 rows). I loaded the data with an ETL tool into Snowflake. Due to the way the ETL and Snowflake worked together, the rows were most likely inserted one by one. It took over one minute. In SQL Server, such an operation would take less than a second. The way Snowflake is built (with features like time travel) means it’s very well suited for processing large amounts of data at once. But not for doing little tricky inserts (RBAR or row-by-agonizing-row as it’s sometimes called). Even updates and deletes might suffer a penalty on Snowflake. Sometimes it’s just faster to insert data using a SELECT with a WHERE clause into a new table, then deleting data from the original table.

For example:

INSERT INTO myNewTable(colA, colB, …)
SELECT *
FROM myOldTable
WHERE colA <> 'someCondition';

This query might be faster than running the following DELETE statement:

DELETE FROM myOldTable
WHERE colA = 'someCondition';

SQL Server on the other hand can be used for almost any use case. You can use it as a highly available OLTP database supporting your applications, with point-in-time restores and many other features, or can you can use it as a data warehouse as well. Certain features like the simple recovery model and columnstore indexes make SQL Server well-suited for large analytical loads.

Cost

For Snowflake, the cost is a recurring one like most cloud offerings. You pay a monthly fee, depending on two factors:

  • the amount of storage your data uses (this is most likely not the biggest contributor to the monthly fee)
  • the number of minutes the Snowflake warehouses (the compute engines) have been running, in combination with the size of those warehouses.

In other words, the more you use Snowflake, the more you’ll need to pay. In contrast, SQL Server is a fixed cost upfront. You pay your license fee and that’s it. However, there are other costs that also need to be taken into account: you’ll need a server (which needs cooling, electricity etc.), you might need a SAN for storage, you need to have decent network infrastructure implemented and personnel to manage all of this. If something breaks, it needs to be replaced. So you have a big upfront cost (purchasing the server and the licenses), but also an ongoing maintenance cost.

It’s impossible to predict something in the lines of "SQL Server will be x times cheaper" or "Snowflake will be more cost effective in the long run". It just depends on too many factors and on your specific scenario. When trying to make a decision, keep the cost structure in mind and try to make a prediction of the monthly cost. After some time, reevaluate.

Performance

Out-of-the box, Snowflake has stellar performance which will be hard to match with an on-premises SQL Server database engine. SQL Server isn’t slow, but to get such good query performance over very large data sets, you need some serious hardware for SQL Server (which is expensive) and a database development team who knows what they’re doing (which is also expensive). The advantage of Snowflake here is it takes no time to get up and running and you can focus solely on modelling your data warehouse and writing queries. Another advantage is that it absolutely takes no effort to scale Snowflake. Queries are slow? Either change the computing warehouse to a bigger size, or put more warehouses into a cluster. You can scale up and down any time you want, adapting your computing strength to the current demand. With an on-premises SQL Server, this is much harder to do.

snowflake scaling vs traditional on-premises server

You have already bought the server on which SQL Server is running. You can perhaps add more RAM, but this still takes time (and money). If SQL Server is running inside a virtual machine and the host machine still has RAM and CPU to spare, you can swap in more compute power but even this is not effortless. A lot of shops have an overpowered server (with an expensive SQL Server license since it is tied to the number of CPU cores), just to be able to handle "that one heavy load every month" or "any big spike in user activity", but most of the time a big chunk of the resources are sitting idle.

On the other hand, the "zero management" philosophy of Snowflake also means "zero options". If a query is slow for some reason, performance tuning options are limited. You can try clustering on a column or scale up the compute. If that doesn’t work, you have to rewrite your query and maybe split it up to dump intermediate results in tables. With SQL Server, you have many options available. The most important one surely is indexing.

Control

With SQL Server, you have complete control over everything that happens with the database engine. You are in control over the database backup schedule, over the high-availability and the disaster recovery, over the encryption used, over the amount of logging and so on. If you need this control, SQL Server is probably your best choice. With Snowflake – and many other "as-a-service" offerings – you give up part of the control but you get ease of management in return. It’s up to you to decide how much control you want or need.

Conclusion

Choosing between Snowflake or an on-premises SQL Server essentially boils down to the choice if you want to move to the cloud or not. With the cloud comes ease-of-management, great performance and elastic scalability. On-premises, you have absolute control and a more predictable cost.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips


Article Last Updated: 2022-03-09

Comments For This Article




Tuesday, March 29, 2022 - 11:16:07 AM - Koen Verbeeck Back To Top (89950)
Hi Donna,

this article is specifically aimed at the "should I stay on-prem with SQL Server" or "should I go to the cloud with Snowflake" crowd. There's another article coming with a comparison between Snowflake and Azure SQL DB. Ideally there's a third article "Snowflake vs Synapse", but I don't feel I have enough experience with Synapse to confidently write such an article, so I leave that one to other people :)

Monday, March 28, 2022 - 11:47:15 AM - Donna Kelly Back To Top (89938)
Koen,
you write "Choosing between Snowflake or an on-premises SQL Server" in the summary.
But surely this is not an appropriate comparison. Wouldn't it be better to compare Snowflake (the product, not the datawarehouse design schema) to Azure Synapse Analytics?
I mean, the decision points between
(a) cloud versus on-prem are completely different from
(b) ok, I've decided on a cloud solution for our enterprise data warehouse, now what technology should I employ?
The first is a business strategy decision . . . the second is a tactical technical decision.
Cheers, Donna

Monday, March 28, 2022 - 11:02:40 AM - Brad Back To Top (89937)
One long-standing issue with Snowflake as a Warehouse product. The veracity of it's data is dubious at best without enforceable constraints ... primary key, foreign key or otherwise.

Thursday, March 10, 2022 - 4:09:20 PM - Dave Wentzel Back To Top (89869)
_who writes the SQL_?

Both. Business analysts with a modicum of sql knowledge are usually ok with writing basic ETL using CTAS patterns. This is usually good enough for non-IT folks to semantically enrich existing data with outside data. At some point someone will want these productionized...and it should be as easy as IT takes those CTAS patterns, laughs hysterically, and rebuilds them in a sustainable, performant manner. IT/traditional ETL developers can continue to use whatever tool they want but can also use the CTAS patterns.

Thursday, March 10, 2022 - 10:12:29 AM - Koen Verbeeck Back To Top (89868)
Hi Dave,
thanks for your comment! Yes, the article was mainly written from a technical perspective. More precisely, from the viewpoint "we're starting a new DWH project, are we going to keep using SQL Server, or rather use Snowflake in the cloud".
Out of curiosity, in your situation, who writes the SQL code in Snowflake? Is it IT, or the marketing people?

Regards,
Koen

Wednesday, March 9, 2022 - 5:00:43 PM - Dave Wentzel Back To Top (89865)
Everything you say is spot on, if you only consider "the technical". But I see 2 glaring problems

* You assume the primary decision points are "IT concerns". There are distinct advantages to using tools like Synapse/Snowflake/Redshift from a business use case perspective that may be paramount to performance, pricing models, indexing strategies, etc. Too many times I see IT and Data People assume that the primary concern of a data project is THE data warehouse or a couple power bi dashboards. The DWH is NOT the ends, it is the means to the end. The "end" is doing analytics quickly that provides an organization with lift. Marketing people love snowflake, as an example, because it allows them to do what they need to do, QUICKLY, without needing IT involvement. Fact is, it's easier for a marketer to use these tools than a traditional DWH.
* You say the choice boils down to "you want to move to the cloud or not". I don't think so. Why any company would just choose to "move to the cloud" without having a business driver...well...seems asinine. In fact, I always vote AGAINST migrating on-prem data to the cloud whenever possible because you likely aren't adding anything to the value stream. with something like snowflake I can continue to keep legacy DWH on-prem and my marketers can use all of the nifty new snowflake tools to quickly get to the data they need in days, vs YEARS, which is what happens when marketing has to wait for IT to prioritize this on the backlog. Worst case, you have to migrate Snowflake data back to the DWH...but with things like data virtualization/mesh/fabric I think "query data where it lives" will become more popular than "ETL the data to the central data repo".

--dave














get free sql tips
agree to terms