Snowpark’s Main Features and its Support for Python
What is it?
Snowpark is an API for querying and processing data in a data pipeline. It represents data in Snowflake as DataFrames, a familiar data structure also used by other APIs such as pandas and Spark.
The main advantage of this API is that it runs inside the Snowflake environment, meaning that your data does not need to be moved to the system where your application runs.
It lets you build SQL statements through method calls instead of writing the SQL yourself. For example, rather than writing “SELECT column_a”, you call the select method and pass it the columns that you want to select.
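As a minimal sketch of that idea, the function below builds the DataFrame equivalent of a simple SELECT. It assumes `session` is an already-created `snowflake.snowpark.Session`; `my_table` and `column_a` are placeholder names, not objects from this article.

```python
def select_column_a(session):
    """Build the DataFrame equivalent of: SELECT column_a FROM my_table."""
    # `session` is assumed to be an existing snowflake.snowpark.Session.
    # "my_table" and "column_a" are placeholder names for illustration.
    return session.table("my_table").select("column_a")
```

No SQL text is written by hand here; Snowpark generates the statement from the chained method calls.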
The operations executed through Snowpark are lazy, which means that the data is only queried when you call a method that triggers evaluation, such as show(). All the methods that trigger evaluation are listed in the documentation.
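The sketch below illustrates where evaluation happens, under the same assumptions as before: `session` is an existing `snowflake.snowpark.Session`, and the table name, column name, and filter condition are placeholders.

```python
def run_lazy_query(session):
    """Build a query lazily, then trigger it with show() and collect()."""
    # Each call on this line only extends the query definition;
    # nothing is sent to Snowflake yet.
    df = session.table("my_table").select("column_a").filter("column_a > 10")

    # show() triggers evaluation: the query runs and the first rows are printed.
    df.show()

    # collect() also triggers evaluation and returns the rows to the client.
    return df.collect()
```

Because the DataFrame is only a description of the query, calling show() and then collect() runs the query twice.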
Snowpark also lets you create User-Defined Functions (UDFs) in the language that you’re working with. In Python, for example, you can create a UDF from a lambda function.
Here are some examples of how to create and use a UDF:
Registering a temporary UDF using lambda
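A minimal sketch of this approach, assuming the `snowflake-snowpark-python` package is installed and `session` is an existing `snowflake.snowpark.Session`. The names `increment`, `register_add_one`, `apply_add_one`, and the column `a` are hypothetical and chosen only for illustration.

```python
# The lambda that will run inside Snowflake once it is registered.
increment = lambda x: x + 1


def register_add_one(session):
    """Register `increment` as a temporary, anonymous UDF."""
    # Imported here so this sketch still loads where Snowpark is not installed.
    from snowflake.snowpark.functions import udf
    from snowflake.snowpark.types import IntegerType

    # A lambda carries no type hints, so the types are passed explicitly.
    # With no name and is_permanent left at its default, the UDF is temporary.
    return udf(
        increment,
        return_type=IntegerType(),
        input_types=[IntegerType()],
        session=session,
    )


def apply_add_one(df, add_one):
    """Use the registered UDF on column "a" (a placeholder column name)."""
    return df.select(add_one("a"))
```

The object returned by `register_add_one` is called like a function on column names or column expressions, and the work itself happens in Snowflake when the DataFrame is evaluated.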
Registering a temporary UDF using annotation
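The annotation here is the `udf` decorator. The sketch below assumes the same setup as the lambda example (Snowpark installed, `session` already created); `minus_one` and `register_minus_one` are hypothetical names. The decorator is applied through a function call so that the session can be passed in explicitly.

```python
def minus_one(x: int) -> int:
    """Plain Python function; its type hints describe the UDF's input and return types."""
    return x - 1


def register_minus_one(session):
    """Register `minus_one` as a temporary UDF named "minus_one"."""
    # Imported here so this sketch still loads where Snowpark is not installed.
    from snowflake.snowpark.functions import udf

    # Called without a function, udf(...) returns a decorator. Applying it here
    # is equivalent to placing @udf(name="minus_one", replace=True) directly
    # above the function definition when a session is already active.
    return udf(name="minus_one", replace=True, session=session)(minus_one)
```

Because the function is type-annotated, the input and return types do not need to be passed separately as they were for the lambda.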
Why is having support for Python such a big deal?
Well, Python is everywhere, from web development frameworks to deep learning frameworks. Many machine learning libraries are built for Python, which means that you would be able to run your ML models in your Snowflake environment. Also, with related discussions under way in the dbt community, having support for Python on Snowflake would help bridge the gap between analysts and data engineers.
So, in a data-driven world, having an API that works on your data in the same environment where that data lives is a really strong feature.
- Snowpark — Snowflake Documentation
- Working with DataFrames in Snowpark Python — Snowflake Documentation
- Creating User-Defined Functions (UDFs) for DataFrames in Python — Snowflake Documentation