SmartDataInnovationLab/spark.condor

Submit file and shell script to start Apache Spark in standalone mode on HTCondor

Prerequisites

HTCondor (vanilla universe)
Python 3 with pip and venv
Network access between the cluster nodes

Preparation

Run ./spark.venv.sh to create a Python venv (in the env directory) with PySpark installed.

Note: the script deletes the symlink env/lib64 because HTCondor apparently does not transfer symlinks.
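
The symlink cleanup described in the note can be sketched in Python as follows. This is only an illustration of the idea, not the contents of spark.venv.sh; the function name remove_symlink is hypothetical, and the only facts taken from this README are the env directory and the lib64 link.

```python
from pathlib import Path


def remove_symlink(env_dir: str, name: str = "lib64") -> bool:
    """Delete env_dir/name if (and only if) it is a symlink.

    Returns True when a symlink was removed, False otherwise.
    Real directories and regular files are left untouched.
    """
    link = Path(env_dir) / name
    if link.is_symlink():
        # unlink() removes the link itself, never the directory it points to
        link.unlink()
        return True
    return False
```

Calling remove_symlink("env") after creating the venv would drop env/lib64 while leaving env/lib in place.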

Running Spark in Standalone Mode

Start the cluster with condor_submit spark.condor -queue [num_workers], where [num_workers] is the number of worker jobs to queue.

The default worker size is 8 CPUs and 32 GB of RAM. You can adjust this by passing the appropriate -a options to condor_submit or by editing the submit file.
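
As a rough sketch of how such a submit call could be assembled, the helper below builds the argument list for condor_submit. It assumes that the worker size in spark.condor is controlled by the standard HTCondor submit commands request_cpus and request_memory; this README does not confirm that, so check the submit file before relying on it. The function name and its parameters are hypothetical.

```python
import shlex


def condor_submit_args(num_workers, cpus=None, memory_mb=None,
                       submit_file="spark.condor"):
    """Build a condor_submit argument list for a Spark cluster.

    ASSUMPTION: the submit file honours request_cpus / request_memory.
    Each -a option appends one submit command to the submit description.
    """
    args = ["condor_submit"]
    if cpus is not None:
        args += ["-a", f"request_cpus = {cpus}"]
    if memory_mb is not None:
        # request_memory is interpreted in MiB when no unit is given
        args += ["-a", f"request_memory = {memory_mb}"]
    args += [submit_file, "-queue", str(num_workers)]
    return args


def as_shell_command(args):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(args)
```

For example, as_shell_command(condor_submit_args(4, cpus=4, memory_mb=16384)) yields a command line that could be pasted into a shell, and the list itself could be passed to subprocess.run.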

The script currently activates a conda environment on the target node; the environment is specified through the arguments of the executable.

Note: the master runs on the first worker node.

Accessing the Cluster

The script generates two .url files, spark-master.url and spark-webui.url, containing the master URL and the web UI URL.
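
Because the files only appear once the master job is running, a client may have to wait for them. The following is a minimal sketch of such a wait loop; the function name wait_for_url and the timeout and poll defaults are assumptions made for illustration and are not part of this repository.

```python
import time
from pathlib import Path


def wait_for_url(path, timeout=300.0, poll=2.0):
    """Block until the .url file exists and is non-empty, then return its content.

    Raises TimeoutError if the file has not appeared after `timeout` seconds.
    """
    url_file = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if url_file.is_file():
            # strip the trailing newline, like $(<file) does in bash
            url = url_file.read_text().strip()
            if url:
                return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url_file} did not appear within {timeout} s")
        time.sleep(poll)
```

A call such as wait_for_url("spark-master.url") would then return the master URL as soon as the cluster has written it.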

To check that the cluster is up, open the web UI, e.g. (bash syntax): w3m $(<spark-webui.url)

To submit a job, activate the venv and pass the master URL to spark-submit, e.g. (bash syntax): source env/bin/activate; spark-submit --master $(<spark-master.url) helloworld.py
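
The same submission can be driven from Python instead of bash. The sketch below only builds the spark-submit argument list from spark-master.url; the helper name is hypothetical, and the check for the spark:// prefix relies on the standard form of standalone master URLs rather than on anything specific to this repository.

```python
from pathlib import Path


def spark_submit_args(script, url_file="spark-master.url"):
    """Return the spark-submit argument list for `script`.

    Reads the master URL from `url_file` and rejects anything that does
    not look like a standalone master URL (spark://host:port).
    """
    master = Path(url_file).read_text().strip()
    if not master.startswith("spark://"):
        raise ValueError(f"unexpected master URL in {url_file}: {master!r}")
    return ["spark-submit", "--master", master, script]
```

With the venv activated, the list returned by spark_submit_args("helloworld.py") could be passed to subprocess.run(..., check=True).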

Note: the Python driver runs on the submitting node, so you probably want to submit it as an HTCondor job as well.

Stopping the Cluster

To stop the jobs, either call ./spark.stop.sh or manually delete spark-master.url.
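
The manual variant can also be scripted. The sketch below simply deletes spark-master.url in a given working directory; it makes no claim about what spark.stop.sh does internally or how quickly the running jobs react to the deletion, and the function name request_stop is hypothetical.

```python
from pathlib import Path


def request_stop(workdir="."):
    """Delete spark-master.url in `workdir` to ask the cluster jobs to stop.

    Returns True if the file existed, False if there was nothing to delete.
    """
    marker = Path(workdir) / "spark-master.url"
    existed = marker.exists()
    # missing_ok avoids an error if the file vanished in the meantime
    marker.unlink(missing_ok=True)
    return existed
```

Calling request_stop() from the directory the cluster was submitted from corresponds to the manual rm spark-master.url.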

