Provision Spark on Amazon EC2
Darren L. Weber, Ph.D edited this page Sep 12, 2017 · 23 revisions
An LD4P Spark cluster can be created using:
git clone git@github.com:sul-dlss/spark-ec2.git
cd spark-ec2
TAGS="Group:ld4p_dev_spark,Manager:${USER},Service:spark,Stage:dev"
./spark-ec2 --key-pair=ld4p --identity-file=~/.ssh/ld4p.pem \
--region=us-west-2 --zone=all \
--master-instance-type=c4.2xlarge \
--instance-type=c4.2xlarge --slaves 3 \
--ami=ami-6e1a0117 --no-ganglia --additional-tags="${TAGS}" \
launch ld4p-pipe
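The TAGS value above is a comma-separated list of Key:Value pairs. As a minimal sketch, assuming bash, the string could be assembled and sanity-checked before launching. The build_tags helper is hypothetical (it is not part of spark-ec2), and the rule that keys and values contain no ':' or ',' is an assumption based on the format shown above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the value passed to --additional-tags.
# build_tags is a hypothetical helper, not part of spark-ec2.
build_tags() {
  local pair tags=""
  for pair in "$@"; do
    # Each argument must look like Key:Value, with no extra ':' or ','.
    case "${pair}" in
      *,*|*:*:*|:*|*:|"") echo "bad tag: ${pair}" >&2; return 1 ;;
      *:*) ;;  # exactly one ':' with text on both sides
      *) echo "bad tag: ${pair}" >&2; return 1 ;;
    esac
    # Join pairs with commas, without a leading comma.
    tags="${tags:+${tags},}${pair}"
  done
  printf '%s\n' "${tags}"
}

# Same tags as in the launch command; falls back to "unknown" if USER is unset.
TAGS="$(build_tags "Group:ld4p_dev_spark" "Manager:${USER:-unknown}" "Service:spark" "Stage:dev")"
echo "${TAGS}"
```

The resulting TAGS variable can be passed to the launch command unchanged via --additional-tags="${TAGS}".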
The launch should complete with output like the following (the Ganglia line appears only when --no-ganglia is omitted):
Connection to ec2-52-41-0-58.us-west-2.compute.amazonaws.com closed.
Spark standalone cluster started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:8080
Ganglia started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:5080/ganglia
Done!
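If the launch output is saved to a log, the master hostname can be pulled out of the "Spark standalone cluster started at" line. The sketch below assumes bash and sed; master_host is a hypothetical helper name, and the spark:// URL assumes the standalone master listens on Spark's default port 7077.

```shell
#!/usr/bin/env bash
# Sketch: extract the master hostname from spark-ec2 launch output.
# master_host is a hypothetical helper; it relies on the
# "Spark standalone cluster started at http://HOST:8080" line.
master_host() {
  # Print only the HOST part of the first matching line read from stdin.
  sed -n 's|^Spark standalone cluster started at http://\([^:/]*\).*|\1|p' | head -n 1
}

# Sample line copied from the example output; in practice, pipe in a saved log.
sample='Spark standalone cluster started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:8080'
host="$(printf '%s\n' "${sample}" | master_host)"

echo "Web UI:     http://${host}:8080"
# Assumption: the standalone master uses the default port 7077.
echo "Master URL: spark://${host}:7077"
```

spark-ec2 also provides a get-master action that prints the master hostname for a named cluster, which avoids parsing logs when the cluster is already running.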
To log in to the cluster's master node:
./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 login ld4p-pipe
The work ticket on this is at
For detailed instructions on doing it by hand, see:
- https://sparkour.urizone.net/recipes/installing-ec2/
- http://blog.insightdatalabs.com/spark-cluster-step-by-step/
- If we need Hadoop, this might help (Hadoop 2.6 on Ubuntu 14.04):
We are also looking for Puppet recipes and deployment management tools, e.g.
- https://github.com/adobe-research/spark-cluster-deployment
- For running Spark on Mesos
Someone from code4lib-norcal recently recommended using Terraform, e.g.