So You Want to Migrate to Snowflake – Part 2
In Part 1 of our post about Migrating Data to Snowflake, we exported our data from on-premise, and uploaded it to S3. We are just about ready to start importing the data into Snowflake. First here is a quick recap of what we have done so far.
Part 1 Recap
Okay, to recap at this point. We have done the following:
- Exported and documented our security in Oracle
- Created security roles and permissions in Snowflake
- Exported our data from Oracle
- Split and compressed the data into performant chunks
- Created an AWS S3 bucket and uploaded our file
In this second part of Migrating Data to Snowflake, we will accomplish the following:
- Create a database in Snowflake
- Create schemas in Snowflake
- Create a table in Snowflake
- Create an external stage in Snowflake to connect to AWS S3
- Load data via the COPY INTO command in Snowflake
- Test/Compare our data load results
- Additional notes on Oracle to Snowflake migration
Let’s get started!
Creating a Database
First, we need a database to load all of this data into in Snowflake.
Let’s create a new database. In the Snowflake menu, select the Database icon and then click on the Create (+) button.
Enter a meaningful database Name and provide a short comment about the content or purpose of the database.
Note that via the “Show SQL” link, that Snowflake will provide the SQL syntax to accomplish the same task from the Worksheet, rather than having to use the GUI.
Our database is now created, but, by default, it has no permissions assigned.
For now, let’s grant access for the ROLE_TRT_RAW role to this database. We’ll assign all of the privileges available on this screen to ROLE_TRT_RAW.
Database is created and permissions are assigned.
Next, we’ll switch roles over to the ROLE_TRT_RAW role, and choose the BROKER_DATA database.
At this point there is an empty database with no schemas and no tables. In order to load the data from our data files, we’ll need to create a schema. Again, very similar ANSI SQL syntax that you are used to in Oracle, if not simplified a bit.
In your Context, change it to use the new TRT schema.
Creating the Stage
For this next part you will need your Key ID and Secret Key from AWS.
With the external stage now created, verify it can see the files in S3 by listing files using the following command:
Next we need to create a “File Format” that matches our file layout so that Snowflake knows how to import the data. Click on the Database icon, and choose BROKER_DATA, and then select “File Formats”, and Create (+).
Previously we created our files as CSV files, that contained no header, split them into smaller files and compressed them using gzip. Let’s define those properties in the new File Format.
We’ll give privileges on the new File Format.
And we are done with the file formats. I have created two, one that takes into account CSV files with headers and one that does not. We’ll only be working with the CSV_TRT_NO_HEAD in these examples.
Create a new table in Snowflake
The next thing we’ll need is a table in Snowflake to land the data. Let’s take a look back at the table DDL in Oracle. We’ll need to understand the current column definitions to translate those into Snowflake data types. The Snowflake data types are a little different than Oracle, and at the end of this document we’ll provide some additional information related to translating Oracle data types to Snowflake.
In Snowflake the table definition is almost identical. We would create this table as follows:
In the database hierarchy on the left, you should now be able to see all of the objects we have created up to this point. We have a database, with schemas, and our new table.
Now let’s focus on loading data from the data files in S3 into our new table. We will accomplish this using our external stage and the COPY INTO command.
Using the pattern parameter to the COPY command, we can instruct Snowflake to load multiple files at the same time. In this case our pattern is set up to load all files with a file name like daily_market_a%.gz. As our file format was defined to load compressed data (Gzip), Snowflake uncompressed and loaded the data on the fly. We were also able to load five million plus rows, in just over 5 seconds.
Next, let’s verify our row counts back with the Oracle database to make sure we loaded all of the data.
And in Oracle, we run the same command.
Row counts match, everything is looking good. Further testing should still be performed to ensure the data content actually matches and that nothing was lost, added or changed along the way.
MD5 Hashing can be useful in comparing the contents of a row in Snowflake and Oracle. Here are a couple of examples of accomplishing this in both databases. A useful tip is to create/apply the MD5 hash when exporting the data from Oracle for comparison later.
Stay tuned for our next post in this series
In our next post we will dive into how to manage your data now that it is in Snowflake. We will discuss some of the basic operations of Snowflake, performance comparison, as well as get into some of the more advanced concepts of snowflake like, Zero Copy Cloning and Linear Scalability.
As mentioned above, the Oracle data types are different than Snowflake data types. The following table from Snowflake documents helps to highlight the data type mapping between the two.
Snowflake does not support synonyms that may exist in Oracle
Snowflake does not support PL/SQL stored procedures. Snowflake recommends to rewrite stored procedures in Python, which is more portable and extensible.