Unless you already have a working Apache Spark cluster, you will need Docker for a simple environment setup. The provided `docker-compose.yml` and the Spark configurations in the `conf` directory are cloned from https://github.com/gettyimages/docker-spark.
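
Before bringing anything up, make sure the Docker daemon and the Compose CLI are both working. The checks below are standard Docker commands, not anything specific to this repository:

```sh
# Confirm the Docker client can reach a running daemon
$ docker info

# Confirm the Compose CLI is installed
$ docker-compose --version
```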
- Make sure Docker is installed properly and `docker-compose` is ready to use (see the check above)
- Run `$ docker-compose up -d` under the `data-mr` directory
- Check the Spark UI at http://localhost:8080; you should see 1 master and 1 worker
- Run `$ docker exec -it datamr_master_1 /bin/bash` to get into the container shell, then start using Spark commands such as `spark-shell`, `pyspark`, or `spark-submit`. You may want to replace `datamr_master_1` with the actual container name spawned by the `docker-compose` process. A quick smoke test is sketched below.
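
To confirm the cluster is actually usable, you can run a quick smoke test from inside the master container. The session below assumes the default `datamr_master_1` container name and that `run-example` is available on the container's `PATH` alongside `spark-submit` (both ship in Spark's `bin` directory):

```sh
# On the host: list the containers docker-compose created, then open a shell in the master
$ docker-compose ps
$ docker exec -it datamr_master_1 /bin/bash

# Inside the container: confirm Spark is available, then run the bundled SparkPi example
spark-submit --version
run-example SparkPi 10
```

If `run-example SparkPi 10` prints a rough value of pi amid the usual Spark log output, Spark is working inside the container.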