This repository has been archived by the owner on Apr 2, 2021. It is now read-only.

amazon-archives/aws-emr-customized-scaling

Amazon EMR Customized Scaling

The Auto Scaling feature covers most EMR scaling cases, but not all of them. For example, suppose a batch job runs every morning and must finish within a fixed time window. In that case, you need to scale out the EMR cluster and make sure it is ready before you submit the job.

This article describes a solution that scales out the EMR cluster before the EMR task is submitted and scales it back in after the task finishes, to save cost.

Architecture

The following is the proposed architecture. Normally, it takes 3 to 4 minutes for the EMR cluster to reach the ready status.

First, CloudWatch triggers a Lambda function at a set time, like a cron job. The Lambda function invokes the EMR API to scale out the cluster, then calls the API repeatedly to check the EMR status. Once the EMR cluster enters the ready status, the Lambda function submits an EMR step. If the cluster cannot reach the ready status, the Lambda function sends an alarm to the operations team through SNS. When the EMR task finishes, it should invoke another Lambda function to scale in the cluster.
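The flow described above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the code in the repository's emr.js: the `ops` object and its methods (`scaleOut`, `isReady`, `submitStep`, `notifyFailure`) are hypothetical stand-ins for the real EMR and SNS calls, and the attempt count and delay are assumed values.

```javascript
// Hypothetical sketch of the scale-out flow. `ops` wraps the real EMR/SNS
// calls; none of these method names come from the repository.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scaleOutAndSubmit(ops, { maxAttempts = 20, delayMs = 15000 } = {}) {
  await ops.scaleOut(); // ask EMR to resize the cluster

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await ops.isReady()) {
      await ops.submitStep(); // cluster is ready: submit the EMR step
      return { status: 'submitted', attempts: attempt };
    }
    // Wait before the next status check, except after the final attempt.
    if (attempt < maxAttempts) await sleep(delayMs);
  }

  // The cluster never became ready: alert the operations team (via SNS).
  await ops.notifyFailure(`EMR cluster not ready after ${maxAttempts} checks`);
  return { status: 'failed', attempts: maxAttempts };
}
```

With these assumed defaults the function polls for roughly five minutes, which leaves some margin over the 3 to 4 minutes mentioned above. The Lambda function's configured timeout would need to be longer than that polling window.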

With this approach, we can ensure that there is enough compute capacity before submitting the EMR task, while still keeping the cost as low as possible.
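As an illustration of the scale-out and scale-in requests, the sketch below builds the parameter object for the EMR ModifyInstanceGroups API. It assumes the cluster is resized by changing the instance count of a single instance group, which may differ from what the repository's emr.js does. The helper name `buildResizeParams`, the IDs, and the instance counts are hypothetical placeholders.

```javascript
// Builds the request object for the EMR ModifyInstanceGroups API.
// The cluster ID, group ID and counts used below are placeholder values.
function buildResizeParams(clusterId, instanceGroupId, instanceCount) {
  if (!Number.isInteger(instanceCount) || instanceCount < 0) {
    throw new RangeError('instanceCount must be a non-negative integer');
  }
  return {
    ClusterId: clusterId,
    InstanceGroups: [
      { InstanceGroupId: instanceGroupId, InstanceCount: instanceCount },
    ],
  };
}

// Scale out before the job, scale in after it (assumed sizes).
const scaleOutParams = buildResizeParams('j-EXAMPLE', 'ig-EXAMPLE', 10);
const scaleInParams = buildResizeParams('j-EXAMPLE', 'ig-EXAMPLE', 2);
console.log(JSON.stringify(scaleOutParams));
console.log(JSON.stringify(scaleInParams));
```

With the AWS SDK for JavaScript (v2), an object of this shape could be passed to `new AWS.EMR().modifyInstanceGroups(params).promise()`.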

This solution can be provisioned automatically with the Serverless Framework. Refer to its getting started guide for installation and configuration.

Prerequisites (For Serverless Framework)

  1. AWS CLI

  2. AWS credentials (Access Key ID and Secret Access Key) configured.

  3. Node.js 6 or higher.

Getting Started

  1. Install Serverless Framework CLI.

    npm install serverless -g
  2. Clone this repo and change directory to the root folder.

  3. Create a JSON file named config.ENV.json, where ENV is the name of your stage. You can use config.dev.json as a sample.

  4. Change the parameters in config.ENV.json to your custom parameters.

  5. In serverless.yml, find cron(0 19 * * ? *). This is the schedule on which CloudWatch Events triggers the Lambda function. You can change it to your own trigger time.

  6. In emr.js, change the line const config = require('./config.dev.json'); to load the correct configuration file.

  7. In emr.js, insert code in the TODO section to add a step to your EMR cluster through the API.

  8. Deploy the solution using the sls command (Serverless Framework CLI), where <ENV> is the stage you defined for config.ENV.json. If you are using an AWS China region, set the AWS_PARTITION=aws-cn environment variable before the command. If you are using the default AWS profile, you can omit the --aws-profile parameter.

    sls deploy --aws-profile <profile> --stage <ENV>
  9. Once the deployment has finished, go to the AWS SNS console. In SNS Topic, find emr-scale-out-failed and add an email subscription to this topic. You need to confirm the subscription from your mailbox to receive notifications.

  10. Make the last task in your EMR steps call the aws-emr-customized-scaling-<ENV>-scale-in Lambda function to scale in your EMR group.
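The cron(0 19 * * ? *) expression from step 5 uses the six-field CloudWatch Events format: minutes, hours, day of month, month, day of week, and year, evaluated in UTC. The sketch below splits such an expression into named fields, which can help when checking a custom trigger time. The helper name `parseScheduleCron` is hypothetical and is not part of the repository or of any AWS library.

```javascript
// Splits a CloudWatch Events schedule expression such as
// "cron(0 19 * * ? *)" into its six named fields.
const FIELD_NAMES = ['minutes', 'hours', 'dayOfMonth', 'month', 'dayOfWeek', 'year'];

function parseScheduleCron(expression) {
  const match = /^cron\((.+)\)$/.exec(expression.trim());
  if (!match) throw new Error(`Not a cron() expression: ${expression}`);

  const parts = match[1].trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`Expected 6 fields, got ${parts.length}`);
  }

  const fields = {};
  FIELD_NAMES.forEach((name, i) => { fields[name] = parts[i]; });
  return fields;
}

console.log(parseScheduleCron('cron(0 19 * * ? *)'));
```

For the default expression, the minutes field is 0 and the hours field is 19, so the Lambda function is triggered at 19:00 UTC every day.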

License

This library is licensed under the MIT-0 License. See the LICENSE file.
