Provision Spark on Amazon EC2
Darren L. Weber, Ph.D edited this page Aug 16, 2017 · 23 revisions
Update: Work is in progress to update the installed versions in the spark-2.2.0 branch of https://github.com/sul-dlss/spark-ec2/
An ld4p-spark cluster can be created using:
git clone git@github.com:amplab/spark-ec2.git
cd spark-ec2
git checkout branch-2.0
./spark-ec2 --key-pair=ld4p --identity-file=ld4p.pem --region=us-west-2 --master-instance-type=m1.small --instance-type=m1.medium launch ld4p-spark
By default, it installs:
- http://s3.amazonaws.com/spark-related-packages/scala-2.10.3.tgz
- http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop2.4.tgz
- http://s3.amazonaws.com/spark-related-packages/hadoop-2.4.0.tar.gz
- http://download2.rstudio.org/rstudio-server-rhel-0.99.446-x86_64.rpm
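The launch command above can be kept in a small script so that the settings live in one place and can be reviewed before anything is started. What follows is a minimal sketch, not part of spark-ec2 itself: the function names `build_launch_cmd` and `check_aws_env` and the variable names are hypothetical, while the option values are the ones used on this page. It assumes AWS credentials are supplied through the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which is one common way to configure spark-ec2.

```shell
#!/bin/sh
# Settings copied from the launch command on this page.
KEY_PAIR="ld4p"
IDENTITY_FILE="ld4p.pem"
REGION="us-west-2"
MASTER_TYPE="m1.small"
WORKER_TYPE="m1.medium"
CLUSTER_NAME="ld4p-spark"

# Hypothetical helper: print the full launch command for review.
# It only prints; it does not start any instances.
build_launch_cmd() {
    printf '%s\n' "./spark-ec2 --key-pair=$KEY_PAIR --identity-file=$IDENTITY_FILE --region=$REGION --master-instance-type=$MASTER_TYPE --instance-type=$WORKER_TYPE launch $CLUSTER_NAME"
}

# Hypothetical helper: succeed only when both AWS credential
# variables are set and non-empty (assumed credential source).
check_aws_env() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Dry run: show the command that would be executed.
build_launch_cmd
```

Once the printed command looks right and `check_aws_env` succeeds, the command can be pasted into a shell from inside the spark-ec2 checkout.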
The launch should complete with output like:
Connection to ec2-52-41-0-58.us-west-2.compute.amazonaws.com closed.
Spark standalone cluster started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:8080
Ganglia started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:5080/ganglia
Done!
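The hostname in these final lines is the cluster's master node, so it can be handy to pull the Spark web UI address out of saved launch output. The following is a rough sketch under the assumption that the output was captured to a file; the function name `master_url_from_log` is hypothetical, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the Spark master web UI URL found in
# saved launch output (the file path is passed as $1).
master_url_from_log() {
    sed -n 's/^Spark standalone cluster started at \(http[^ ]*\)$/\1/p' "$1"
}

# Sample output copied from this page, written to a scratch file.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
Connection to ec2-52-41-0-58.us-west-2.compute.amazonaws.com closed.
Spark standalone cluster started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:8080
Ganglia started at http://ec2-52-41-0-58.us-west-2.compute.amazonaws.com:5080/ganglia
Done!
EOF

# Print the extracted URL, then clean up the scratch file.
master_url_from_log "$log_file"
rm -f "$log_file"
```

One way to produce such a file is to append `| tee launch.log` to the launch command and pass `launch.log` to the function.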
To log in to the cluster's master node:
./spark-ec2 -k ld4p -i ld4p.pem -r us-west-2 login ld4p-spark