This repository contains a step-by-step guide on how to set up the AWS Glue connector for Snowflake, along with a sample ETL job that uses this connector.
Complete Lab 01: *Working with Glue Data Catalog > Using AWS Console* from the Glue Immersion Day workshop. This lab assumes that the Glue table `console_csv` in the Glue database `console_glueworkshop` is present in your AWS account.
In a new browser tab, go to http://signup.snowflake.com/ and sign up for a new account. Ensure that you choose Enterprise for the "Choose your Snowflake edition" question and AWS for the "Choose your cloud provider" question.
After registering, you will receive an email with an activation link and your Snowflake Account URL. Open a browser window, enter the URL of your Snowflake Account and sign in to the account.
Follow the steps below to complete the setup:
- Click on the Projects tab in the left-hand navigation bar and click Worksheets. Within Worksheets, click the "+" button in the top-right corner of Snowsight.
- Rename the worksheet by clicking on the auto-generated timestamp name and entering "Glue-Snowflake Lab Setup".
- Copy the SQL code in `snowflake-setup.sql` and paste it into the *Glue-Snowflake Lab Setup* worksheet. In the worksheet, next to "► Run", click "▼" and choose "Run All". These SQL commands will 1/ create a Snowflake user, 2/ create a virtual warehouse, 3/ create a role to be used by the user, 4/ create a view, and 5/ grant the necessary permissions.
- In the bottom-left corner of the Snowsight UI, click on your name, hover your cursor over *Account* and then over the account number, and click the link button to copy your account URL. Keep this value; it will be used in the AWS Glue setup step below.
- In your AWS console, go to AWS Secrets Manager. Then, click Store a new secret.
- Step 1: Choose secret type - choose *Other type of secret*. Input the following key/value pairs and click Next.
  - Key: `sfUser` | Value: `glue_de_user`
  - Key: `sfPassword` | Value: `GlueSFDemo123`
  - Key: `sfWarehouse` | Value: `glue_de_wh` (Note: the user's default virtual warehouse will be used if this value is not set.)
- Step 2: Configure secret - input `glue-snowflake-secret` for the secret name. Leave the rest as default, click Next until Step 4: Review, and click Store.
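The same secret can also be created with the AWS SDK instead of the console. A minimal sketch using boto3 (the secret name and key/value pairs mirror the console steps above; the region is an assumption, substitute your own):

```python
import json

SECRET_NAME = "glue-snowflake-secret"

def build_secret_string():
    """Return the JSON payload that the Glue connection reads from the secret.
    sfWarehouse is optional: the user's default warehouse is used if absent."""
    return json.dumps({
        "sfUser": "glue_de_user",
        "sfPassword": "GlueSFDemo123",
        "sfWarehouse": "glue_de_wh",
    })

def create_secret(region_name="us-east-1"):  # region is an assumption
    # Requires AWS credentials with secretsmanager:CreateSecret permission.
    import boto3
    client = boto3.client("secretsmanager", region_name=region_name)
    return client.create_secret(Name=SECRET_NAME, SecretString=build_secret_string())
```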
- In your AWS console, go to AWS IAM and click Roles in the left navbar. Search for `AWSGlueServiceRole-glueworkshop` and click into it.
- Click the Add permissions dropdown and choose Attach policies. Search for `SecretsManagerReadWrite` and attach the policy to the role by clicking Add permissions. This grants the Glue service role permission to retrieve the secret values you created in the previous step.
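The policy attachment can likewise be done from code. A sketch with boto3 (role and policy names as in the console steps above; `SecretsManagerReadWrite` is an AWS-managed policy, so its ARN is fixed):

```python
ROLE_NAME = "AWSGlueServiceRole-glueworkshop"
# ARN of the AWS-managed SecretsManagerReadWrite policy.
POLICY_ARN = "arn:aws:iam::aws:policy/SecretsManagerReadWrite"

def attach_secrets_policy():
    # Requires AWS credentials with iam:AttachRolePolicy permission.
    import boto3
    iam = boto3.client("iam")
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=POLICY_ARN)
```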
- In your AWS console, go to AWS Glue. Click Data connections in the left navbar, then click Create connection.
- Step 1: Choose data source - search for and choose Snowflake. Under Configure connection, input the Snowflake URL you copied in the previous step, and choose the secret you created.
- Expand *Network options - optional*. Here you need to specify the VPC configuration. To get the required values, search for VPC in the top search bar of the AWS console and open it in a new browser tab.
  - Go to VPCs, and take note of the VPC ID with the name `glueworkshop`.
  - Go to Subnets in the left navbar, filter by the `glueworkshop` VPC, and take note of the Subnet ID with the name `MWAAEnvironment Private Subnet (AZ1)`.
- Go back to the Glue create connection console, and choose the VPC ID and subnet ID you noted above. For security groups, you can choose the security group whose name starts with `glueimmersionday-workshopstudio-v1-DefaultVPCSecurityGroup`. Then, click Next.
- Leave `Snowflake connection` as the name, and click Next and then Create connection.
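For automation, Glue connections can also be created through the `create_connection` API. The sketch below assembles a payload mirroring the console fields; the keys under `ConnectionProperties` are assumptions based on the console form, so verify them against the boto3 Glue `create_connection` reference before relying on this:

```python
def build_connection_input(account_url, subnet_id, security_group_id, az):
    """Assemble a Glue CreateConnection payload mirroring the console steps.
    HOST and SECRET_ID property names are assumptions; check the boto3 docs."""
    return {
        "Name": "Snowflake connection",
        "ConnectionType": "SNOWFLAKE",
        "ConnectionProperties": {
            "HOST": account_url,                   # assumption: account URL field
            "SECRET_ID": "glue-snowflake-secret",  # assumption: secret reference
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": az,
        },
    }

def create_connection(**kwargs):
    # Requires AWS credentials with glue:CreateConnection permission.
    import boto3
    glue = boto3.client("glue")
    return glue.create_connection(ConnectionInput=build_connection_input(**kwargs))
```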
- From the Glue console, go to ETL jobs > Visual ETL in the left navbar. Click Visual ETL under the "Create job" section.
- The Glue Studio window will appear. Go to the Job details tab, and fill in the following parameters:
  - Name: `glue-snowflake-etl`
  - IAM Role: choose the pre-created IAM role (it should be named `AWSGlueServiceRole-glueworkshop`)
- Go back to the Visual tab, and click the + button. For the first node, choose `AWS Glue Data Catalog` and click on the node. Choose `console_glueworkshop` for the Database and `console_csv` for the Table.
- Click the + button to create another node. Under Transforms, choose `Drop Fields`. Choose the columns `unit cost`, `total cost`, and `total profit` to drop. This simulates the case where you want to remove profit-related values that may be sensitive.
- Click the + button to create another node. Under Transforms, choose `Evaluate Data Quality`. In the Ruleset editor, paste the following ruleset:

  ```
  Rules = [
      IsUnique "UUID",
      ColumnValues "sales channel" in ["Offline", "Online"],
      ColumnValues "order priority" in ["L", "M", "H", "C"],
      CustomSql "SELECT COUNT(*) FROM primary WHERE `total revenue` != ROUND(`units sold` * `unit price`, 2)" = 0
  ]
  ```
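The `CustomSql` rule counts rows where `total revenue` differs from `ROUND(units sold * unit price, 2)` and passes only when that count is zero. A small sketch of the same invariant in plain Python, using made-up sample rows for illustration (not data from the workshop dataset):

```python
def revenue_is_consistent(row):
    """Mirror the CustomSql rule: total revenue must equal
    round(units sold * unit price, 2) for every row."""
    return round(row["units sold"] * row["unit price"], 2) == row["total revenue"]

# Hypothetical sample rows for illustration only.
rows = [
    {"units sold": 10, "unit price": 9.99, "total revenue": 99.90},
    {"units sold": 3, "unit price": 4.50, "total revenue": 13.50},
]
violations = [r for r in rows if not revenue_is_consistent(r)]
# The data quality rule passes only when the violation count is zero.
```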
- Under Data quality transform output, check `Original data`. Under Data quality actions, choose `Fail job without loading target data`.
- You should see a new node called `originalData` added to the visual graph. Click on that node, then click the + button to create another node. Under Targets, choose `Snowflake`. Choose the Snowflake connection you created previously (`Snowflake connection`), and fill in `glue_db` for database, `glue_workshop` for schema, and `orders` for table.
- Click Save and then click Run.
- In the Glue visual ETL console, go to the Runs tab and ensure that the job has completed successfully.
- Navigate to the Data quality tab to see the data quality results from the ETL job. You can optionally download the DQ results.
- Navigate back to the Snowflake Snowsight UI. In the same worksheet, or in a new one, run the following SQL commands to validate that a new table `orders` has been created in the `glue_db.glue_workshop` schema, with the profit-related columns dropped:

  ```sql
  SELECT * FROM glue_workshop.orders LIMIT 50;
  SELECT COUNT(*) FROM glue_workshop.orders;
  ```
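If you prefer to validate from code rather than Snowsight, the same queries can be run with the `snowflake-connector-python` package. A hedged sketch (credentials are parameters; the warehouse, database, and schema names match the lab setup):

```python
VALIDATION_QUERIES = [
    "SELECT * FROM glue_workshop.orders LIMIT 50",
    "SELECT COUNT(*) FROM glue_workshop.orders",
]

def run_validation(account, user, password):
    # Requires: pip install snowflake-connector-python
    import snowflake.connector
    conn = snowflake.connector.connect(
        account=account, user=user, password=password,
        warehouse="glue_de_wh", database="glue_db", schema="glue_workshop",
    )
    try:
        cur = conn.cursor()
        for query in VALIDATION_QUERIES:
            cur.execute(query)
            print(cur.fetchall()[:5])  # preview the first few result rows
    finally:
        conn.close()
```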
Congratulations! You have successfully created an AWS Glue native Snowflake connection and used it in a simple ETL job to load data into Snowflake.