Comparing Snowflake, Redshift, and BigQuery
Are you considering a cloud-native data warehouse? Our experts debate the differences between Snowflake, Redshift, and BigQuery. Prefer to listen? Watch the Stump The Experts webinar on our Youtube channel.
We are new to cloud data warehousing. We are considering Snowflake, Redshift, and BigQuery. What are some things to consider?
Red Pill Consultant
I’ve been working in Redshift for the past couple of years. I like it, but Snowflake allows complete separation between compute and storage that is not present in Redshift. If you need to increase disk space, you have no other option than to go the next node size up. This increases both compute and storage and you get that at extra cost. Also, Redshift is up all the time, whereas with the Snowflake’s warehouse concept, you can spin up compute resources(a.k.a. Virtual warehouse) and when you stop using it, it shuts off within the limits that you specify.
I like them both but I wish there was a better answer for Redshift separation compute and storage.
With Redshift, there is some admin — but not a lot. You have to vacuum to resort tables after loads and you have to reclaim rows after deletes, whereas Snowflake takes care of these in the background automatically. Managing Redshift workload is by configuring workload management queues and Snowflake workload is managed by pooling compute resources for each workload(virtual warehouses). These are the biggest differences I’ve seen between the two that we’ve primarily been working with in the past few years.
CEO of Red Pill Analytics
You’ll see Redshift in the Amazon portfolio take a backseat as Athena starts to add functionality. Athena separates compute and storage. It is Presto based, so it has a strong open-source foundation. Athena will probably come further along, but they’re behind Snowflake at this point. The other thing about Redshift and its limitations: it is still a better choice compared to on-prem. With all of these choices, you will see much better performance, ease of use, less maintenance, and less administration.
However, with Snowflake, there is truly zero administration.
When you think of database backups, maintenance, and operational procedures, running those weekly or monthly processes to keep everything queryable, Snowflake does all that for you. Redshift does a lot of that for you, the big difference here between Redshift and Snowflake is from a total cost of ownership perspective. Capacity planning — that thing that every consultant hates because it’s almost impossible to predict — where we try to predict: “We think you’ll need this size over the next year.” But it’s impossible to know for sure.
The big difference with Snowflake is zero capacity planning. A change in compute size, adding additional virtual warehouses for different workloads, configuring cluster sizes to max out at 20 instead of 10 for instance: these things that take seconds to do either using DDL or using a Snowflake console.
If you want to remove capacity planning from your project, then Redshift is not for you.
Both Snowflake and BigQuery separate compute storage. Snowflake has a huge advantage here in zero-copy cloning. It provides flexibility for building dev and QA environments (temporarily or permanently) and creating snapshots of your database at particular points in time for compliance purposes if necessary. BigQuery doesn’t have that. It has easy cloning but the data is being physically moved, so you are owning it twice. Snowflake doesn’t do that.
As for pros and cons, when you look at these two databases, if you need consistently predictable workloads, Snowflake is the right option. It’s a little more expensive — there are auto-suspend and auto-resume features — that can get you around some of that, but you might see a little higher bill with Snowflake. Just know that your mileage may vary. In BigQuery, you don’t have your own compute — everyone shares compute in big clusters. You have less compute affinity, less session-continuity, and less data caching.
BigQuery has an advantage if you’re building on the Google Cloud Platform. You have one set of credentials and use the same IAM infrastructure within a project, with no additional overhead. If you’re building a brand new application — something modern and “cloud-ish” — and you’re invested in the Google Cloud, BigQuery will be a good option.
On the other hand, if you’re looking at migrating your workloads that existed on-prem, with the same BI tools, the same ETL tools, with some add-ons, Snowflake is going to be easier and get you to market much faster.
Stewart is the CEO and Co-founder of Red Pill Analytics and has been working with data and analytics since 1996. He is an internationally recognized writer and speaker, an Oracle Groundbreaker Ambassador, a Snowflake Data Hero, and an Oracle ACE Director.
Neal has 12 years of Technical Architecture and Business Intelligence analytics experience specializing in Data Warehousing, Business Intelligence and Analytics Capabilities. Neal is a consultant at Red Pill Analytics.
Prefer to listen? Watch the Stump The Experts webinar on our Youtube channel.
Is your organization adapting to remain competitive? In the Architect Room, we design a roadmap for the future of your digital organization, while still capitalizing on current technology investments. Our knowledge spans a variety of offerings across all of the major public cloud providers. Visit Red Pill Analytics or feel free to reach out on Twitter, Facebook, and LinkedIn.