
Provision Spark on Amazon EC2


Update: There is work in progress to update the installed versions in

For detailed instructions on doing it by hand:

We are also looking for Puppet recipes and deployment-management tooling, e.g.

Someone from code4lib-norcal recently recommended using Terraform, e.g.
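
If Terraform were adopted, provisioning would follow the usual init/plan/apply workflow. A minimal sketch, assuming *.tf files describing the EC2 instances and security groups have already been written in the working directory (those files are not part of this wiki):

terraform init      # download the AWS provider plugins
terraform plan      # preview the EC2 resources that would be created
terraform apply     # create (or update) the resources
terraform destroy   # tear everything down when finished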

An ld4p-spark cluster can be created using:

git clone [email protected]:amplab/spark-ec2.git
cd spark-ec2
git checkout branch-2.0
./spark-ec2 --key-pair=ld4p --identity-file=ld4p.pem --region=us-west-2 --master-instance-type=m1.small --instance-type=m1.medium launch ld4p-spark
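
The launch command accepts further options for sizing the cluster. The line below is a sketch with example values only (three workers, a pinned Spark version, and spot pricing); confirm the exact flags with ./spark-ec2 --help before using them:

./spark-ec2 --key-pair=ld4p --identity-file=ld4p.pem --region=us-west-2 --master-instance-type=m1.small --instance-type=m1.medium --slaves=3 --spark-version=2.0.0 --spot-price=0.05 launch ld4p-spark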

By default, the launch installs a Spark standalone cluster and Ganglia monitoring on the new instances (see the output below).

It should complete with something like:

Connection to ec2-52-41-0-58.us-west-2.compute.amazonaws.com closed.
Spark standalone cluster started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:8080
Ganglia started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:5080/ganglia
Done!

Login

./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 login ld4p-spark
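
The same spark-ec2 script provides other actions for managing the cluster lifecycle. The examples below reuse the key pair, identity file, and region from the launch above; confirm the action names with ./spark-ec2 --help:

./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 get-master ld4p-spark   # print the master hostname
./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 stop ld4p-spark         # stop the instances without terminating them
./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 start ld4p-spark        # restart a stopped cluster
./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 destroy ld4p-spark      # terminate the cluster permanently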