Boilerplate for deploying AWS Glue jobs through a shell script.
For developers, the script is useful because it can:
- Install external libraries
- Zip extra py files and external libraries
- Upload the main .py and zip files to an S3 bucket
- Deploy the Glue job through CloudFormation
It also supports deployment to different stages, e.g. {developer}, dev, qa, prod.
Currently the script deploys one Glue job at a time. With a small tweak, it can also be used in a Jenkins CI/CD pipeline to deploy all Glue jobs.
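The steps above can be sketched in shell roughly as follows. Everything here is an illustrative assumption (the requirements.txt file, the build/ folder layout, and the helper names), not the actual deploy_glue.sh:

```shell
#!/bin/sh
# Sketch of the steps deploy_glue.sh performs. The helper functions build
# the S3 destination and the stack name; the deploy function is only a
# sketch using an assumed file layout.

# S3 prefix where artifacts for a stage are kept, e.g. s3://bucket/dev/
artifact_prefix() { printf 's3://%s/%s/' "$1" "$2"; }

# Stack name: job name with underscores replaced by hyphens, plus stage
stack_name() { printf '%s-%s' "$(printf '%s' "$1" | tr '_' '-')" "$2"; }

deploy_glue_sketch() {
  glue_job="$1"; stage="$2"; build_bucket="$3"; cf_folder="$4"

  # 1. Install external libraries into a local build folder
  pip install -r requirements.txt -t build/libs

  # 2. Zip the extra .py files and installed libraries
  (cd build/libs && zip -r "../${glue_job}_libs.zip" .)

  # 3. Upload the main .py and the zip to the build bucket
  aws s3 cp "${glue_job}_glue.py" "$(artifact_prefix "$build_bucket" "$stage")"
  aws s3 cp "build/${glue_job}_libs.zip" "$(artifact_prefix "$build_bucket" "$stage")"

  # 4. Deploy the Glue job through CloudFormation
  aws cloudformation deploy \
    --template-file "cf/${cf_folder}/${glue_job}_job.json" \
    --stack-name "$(stack_name "$glue_job" "$stage")"
}
```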
Prerequisites:
- Any shell environment, e.g. Cygwin or Git Bash
- AWS CLI
- zip command-line tool
- An S3 bucket to store the zip files
Add an ENV configuration in the utility/src/config/env.py file for each stage.
Example of ENV Configuration:
'aws_region':'us-east-1',
's3_data_lake':'{}-data-lake-{}'.format(project, STAGE),
'incoming_folder':'incoming',
'primary_folder': 'primary',
'glue_role':'{}_glue_role_{}'.format(project, STAGE)
Modify the following configuration in deploy/deploy_glue.sh:
# Bucket to which zip and .py files will be uploaded
build_bucket='<specify build artifacts bucket>'
# Project prefix.
project="<project prefix>"
# Default Stage
stage="dev"
# AWS profile to be used in AWS CLI
aws_profile="default"
# ARN of the Glue Role to be used in Glue operations.
glue_role_arn='<specify Glue Role ARN>'
## Usage
### ./deploy/deploy_glue.sh <glue_job> -stage <stage> -cf <cf_folder>
## Arguments
### glue_job mandatory
### Glue job name to deploy. It is the _glue.py file name without the _glue.py suffix.
### e.g. for bp_sample_glue.py, glue_job will be *bp_sample*.
### -stage mandatory
### Defines the stage.
### stage mandatory with -stage
### It is used to pass the stage environment parameters.
### It can be {developer name}, dev, qa or prod.
### {developer name} is useful when multiple developers are working on the same job in the same region.
### -cf optional
### If specified, the Glue job will be deployed through CloudFormation; otherwise only the files will be uploaded to S3.
### cf_folder mandatory with -cf otherwise optional
### If specified, it is taken as the path to the CF template (excluding the *cf* folder). E.g. for cf/ingestion/etl/bp_sample_job.json,
### cf_folder should be set to 'ingestion/etl'.
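From the example above, the template path is presumably assembled from the arguments like this (an assumption about the script's internals, not taken from its source):

```shell
# How the CF template path is likely built from glue_job and cf_folder
# (assumption based on the documented example, not the actual script).
glue_job="bp_sample"
cf_folder="ingestion/etl"
template="cf/${cf_folder}/${glue_job}_job.json"
echo "$template"   # cf/ingestion/etl/bp_sample_job.json
```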
Always run the script from the project root folder, not from the deploy folder. If the Glue job is already deployed through CloudFormation and you only wish to sync the Glue source code to the S3 bucket, omit -cf; the script will sync S3 but will not attempt to update the CloudFormation stack.
./deploy/deploy_glue.sh bp_sample -stage sandeep -cf ingestion/etl
- It will deploy the bp_sample_glue.py Glue job along with its dependencies, using the sandeep stage environment and the bp_sample_job.json template kept at cf/ingestion/etl.
- In S3, the files will be stored under <build_bucket>/sandeep.
- In CloudFormation, the bp-sample-sandeep stack will be deployed.
- The Glue job bp-sample-sandeep will be created.
Bonus: You may also use clean.sh to remove temporary files created during the build. This is handy before pushing code to git or distributing/sharing it with others.