Easy CloudFormation deployment for an EMR cluster with managed autoscaling.
This project uses CloudFormation templates to deploy an EMR cluster with a managed autoscaling policy. It can be used to quickly stand up a cluster for development or data analysis.
The default EMR version is emr-6.5.0. This can be changed by setting the EMR_VERSION environment variable in .env.
The following applications are installed by default:
- Hadoop
- Hive
- Tez
- Hue
- Spark
- Livy
- JupyterHub
- JupyterEnterpriseGateway
- Configure your AWS credentials.
- Add environment variables to .env.
- Run make deploy to deploy the cluster.
Follow these steps to configure the deployment environment.
- AWSCLI
- jq
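The prerequisites can be checked before deploying. The following is a minimal sketch, not part of the project: `missing_tools` is a hypothetical helper name, and `make` is included on the assumption that it is needed because the deployment is driven by `make deploy`.

```shell
# Print the name of each tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# aws and jq are the listed prerequisites; make runs the deployment.
if missing_tools aws jq make; then
    echo "All prerequisites found."
else
    echo "Install the tools listed above before deploying." >&2
fi
```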
Sensitive environment variables containing secrets like passwords and API keys must be exported to the environment first.
Create a .env file in the project root.
STAGE=dev
APP_NAME=emr-managed-scaling
AWS_REGION=us-east-1
EMR_VERSION=emr-6.5.0
SUBNET_ID=<subnet ID>
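To use the same values in an interactive shell, the file can be sourced with auto-export enabled. This is only a sketch: `load_env` is a hypothetical helper, the Makefile may already read .env on its own, and the approach assumes the file contains nothing but simple KEY=value lines with placeholders such as `<subnet ID>` replaced by real values.

```shell
# Export every KEY=value pair from a dotenv-style file into the environment.
# Assumes the file holds only simple assignments with no placeholders left in.
load_env() {
    env_file="${1:-.env}"
    # "." searches PATH for bare names in POSIX shells, so force a path.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ ! -f "$env_file" ]; then
        echo "Missing $env_file" >&2
        return 1
    fi
    set -a    # auto-export every variable assigned while sourcing
    . "$env_file"
    set +a
}

if load_env .env; then
    echo "Loaded settings for $APP_NAME ($STAGE) in $AWS_REGION"
fi
```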
Important: Always use a .env file, AWS SSM Parameter Store, or Secrets Manager for sensitive variables like credentials and API keys. Never hard-code them, even during development. AWS may quarantine an account if its credentials are accidentally exposed, which can disrupt access to the account until the issue is resolved.
Make sure that .env is listed in .gitignore.
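One way to guarantee this is to append the entry only when it is missing. The sketch below uses a hypothetical helper named `ensure_ignored`; it matches the exact line `.env` and does not try to interpret broader gitignore patterns that might already cover the file.

```shell
# Append an entry to an ignore file unless an identical line already exists.
ensure_ignored() {
    entry="$1"
    ignore_file="${2:-.gitignore}"
    touch "$ignore_file"
    # -x matches whole lines, -F treats the entry as a literal string
    if ! grep -qxF -- "$entry" "$ignore_file"; then
        # Add a newline first if the file does not already end with one.
        if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
            printf '\n' >> "$ignore_file"
        fi
        printf '%s\n' "$entry" >> "$ignore_file"
    fi
}

# Example usage from the project root:
# ensure_ignored .env
# git check-ignore -q .env && echo ".env is ignored"
```

Running `git check-ignore` afterwards confirms that Git actually treats the file as ignored.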
Valid AWS credentials must be available to AWS CLI and SAM CLI. The easiest way to do this is to run aws configure, or to add them to ~/.aws/credentials and export the AWS_PROFILE variable to the environment.
For more information visit the documentation page: Configuration and credential file settings
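A quick way to confirm that credentials resolve correctly is to ask AWS STS who the caller is. In this sketch `current_account` is a hypothetical wrapper and `my-profile` is a placeholder profile name; the underlying `aws sts get-caller-identity` call is a standard AWS CLI command.

```shell
# Print the AWS account ID that the current credentials resolve to.
# Fails if the AWS CLI cannot find or validate credentials.
current_account() {
    aws sts get-caller-identity --query Account --output text
}

# Example usage ("my-profile" is a placeholder):
# export AWS_PROFILE=my-profile
# current_account
```

If the call fails, the credentials or the selected profile are the first things to check.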
The EMR version can be set using the EMR_VERSION environment variable in .env. The default is emr-6.5.0. For a list of available versions please see the docs.
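A typo in the release label is easy to catch before deployment. The sketch below checks only the shape of the value (`emr-` followed by three dot-separated numbers, as in the default); it does not confirm that the release exists in your region, and `is_emr_release_label` is a hypothetical helper name.

```shell
# Succeed if the argument looks like an EMR release label, e.g. emr-6.5.0.
# This is a format check only; it does not query AWS.
is_emr_release_label() {
    printf '%s\n' "$1" | grep -Eq '^emr-[0-9]+\.[0-9]+\.[0-9]+$'
}

EMR_VERSION="${EMR_VERSION:-emr-6.5.0}"
if is_emr_release_label "$EMR_VERSION"; then
    echo "EMR_VERSION=$EMR_VERSION looks well-formed"
else
    echo "EMR_VERSION=$EMR_VERSION does not look like a release label" >&2
fi

# With credentials configured, the AWS CLI can list the labels it knows:
# aws emr list-release-labels
```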
The only required variable for network configuration is SUBNET_ID, which must be present in .env.
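Because a leftover placeholder in SUBNET_ID only fails once CloudFormation runs, a local format check can save a round trip. This sketch assumes the usual subnet ID shape of `subnet-` followed by 8 or 17 lowercase hexadecimal characters; `is_subnet_id` is a hypothetical helper name.

```shell
# Succeed if the argument has the shape of a VPC subnet ID.
# This is a format check only; it does not confirm the subnet exists.
is_subnet_id() {
    printf '%s\n' "$1" | grep -Eq '^subnet-([0-9a-f]{8}|[0-9a-f]{17})$'
}

if is_subnet_id "${SUBNET_ID:-}"; then
    echo "SUBNET_ID looks well-formed"
else
    echo "SUBNET_ID is missing or malformed" >&2
fi

# With credentials configured, the subnet itself can be looked up:
# aws ec2 describe-subnets --subnet-ids "$SUBNET_ID"
```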
Once an AWS profile is configured and environment variables are available, the application can be deployed using make.
make deploy
# Deploy all layers
make deploy
# Delete all layers
make delete
- Check your AWS credentials in ~/.aws/credentials
- Check that the environment variables are available to the services that need them
- Check that the correct environment or interpreter is being used for Python
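The environment variable check can be scripted. The sketch below reports which of the variables from the .env example are unset or empty in the current shell; `missing_vars` is a hypothetical helper, and whether the Makefile reads these from the shell or directly from .env is an assumption to verify.

```shell
# Print the name of each listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
            rc=1
        fi
    done
    return "$rc"
}

if ! missing_vars STAGE APP_NAME AWS_REGION EMR_VERSION SUBNET_ID; then
    echo "Set the variables listed above in .env" >&2
fi
```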
Primary Contact: Gregory Lindsey (@abk7777)