Try out Spark 1.5 using VMs provisioned by Vagrant and Ansible
This repository provides a Vagrantfile which, in combination with an Ansible playbook, installs:
- Ubuntu 14.04
- Spark 1.5 (and its dependencies)
- Install the following dependencies on your machine:
  - Vagrant
  - Ansible
  - vagrant-ansible
- Clone the repository
- Launch the machines:
    vagrant up spark-master
    vagrant up spark-slave1
    vagrant up spark-slave2
- Check the Spark web interface at http://192.168.33.10:8080/
- To launch a Spark job, use the /opt/spark/bin/spark-submit-local script on the spark-master VM. Connect to this machine using:
    vagrant ssh spark-master
- Spark runs in standalone mode, which means that there is no underlying Hadoop or HDFS. If you need to work on files, they must be available on all machines. The /data folder is shared between all machines and the host, so you can put files there and use them.
- As all VMs run on your computer, memory is an issue. The Ansible playbook configures Spark with tight memory limits. You can raise these limits by first giving the VMs more memory (in the Vagrantfile) and then changing the launchers in the rules/spark-master and rules/spark-slave folders.
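
The launch step and the shared /data folder described above can be exercised together with a small shell sketch. It is not part of this repository. The VM names and the /data path are taken from this README, while the function names and the VAGRANT override variable are hypothetical and only there so the commands can be dry-run.

```shell
#!/bin/sh
# Sketch only: VM names and the /data path come from this README.
# VAGRANT is a hypothetical override so the functions can be dry-run
# (for example VAGRANT=echo) without touching any VM.
VAGRANT="${VAGRANT:-vagrant}"

# Bring the VMs up one at a time, master first; stop at the first failure.
launch_cluster() {
    for vm in spark-master spark-slave1 spark-slave2; do
        "$VAGRANT" up "$vm" || return 1
    done
}

# List /data inside every VM to confirm the shared folder is visible there.
check_shared_data() {
    for vm in spark-master spark-slave1 spark-slave2; do
        "$VAGRANT" ssh "$vm" -c "ls /data" || return 1
    done
}
```

After sourcing the file, running launch_cluster followed by check_shared_data should list the same files on all three machines if the share works as described. With VAGRANT=echo set, the commands are only printed, which is a cheap way to review them first.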