SLURM Integration

Douglas Jacobsen edited this page Apr 2, 2016 · 1 revision

Shifter Integration with SLURM

Shifter is distributed with a SPANK (https://github.com/SchedMD/slurm/blob/master/slurm/spank.h) plugin for SLURM. The plugin requires features from SLURM 15.08 to function properly, and relies on some features from 16.05 (which can be backported to 15.08).

This integration enables large-scale Shifter applications to run within HPC systems while also simplifying the user interface to Shifter. The SLURM integration adds several options to srun, sbatch, and salloc which are used to pre-construct the user-defined environment. When an image is specified, it is set up on every compute node in the allocation ahead of time. Furthermore, an ssh daemon can be started within the container, and a hostsfile can be placed within the image so that an application can directly access the other nodes without any difficulty.
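As a sketch of the resulting user interface (the `--image` option and the `docker:` image tag follow common Shifter usage; the node counts and image name here are illustrative placeholders, and this script only runs on a Shifter-enabled SLURM installation):

```shell
#!/bin/bash
# Request an image at allocation time; the plugin sets it up on every
# node of the allocation before the job starts.
#SBATCH --image=docker:ubuntu:14.04
#SBATCH -N 2

# Each task runs inside the user-defined environment via shifter.
srun -N 2 shifter /bin/hostname
```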

Building the SPANK Plugin

This part is easy: specify --with-slurm=</path/to/your/slurm/installation> as an option to configure for udiRoot. The plugin will be compiled and installed into the udiRoot distribution under $PREFIX/lib/shifterudiroot/shifter_slurm.so
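A minimal build sketch, assuming a standard autotools-style build (the prefix and SLURM paths below are placeholders for your site's layout):

```shell
# Configure udiRoot against the site SLURM installation, then build and install.
./configure --prefix=/opt/shifter/udiRoot --with-slurm=/opt/slurm/default
make
make install

# The SPANK plugin should now be at:
#   $PREFIX/lib/shifterudiroot/shifter_slurm.so
```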

Configuring the SPANK Plugin

Add a new line to your SLURM plugstack.conf configuration file:

required /path/to/shifter/udiRoot/lib/shifterudiroot/shifter_slurm.so shifter_config=/path/to/udiRoot.conf <other options>

Additional SLURM Configurations

When using shifter integration, you should set in slurm.conf:

PrologFlags=Alloc,Contain

The "alloc" option indicates that the slurm prologs should be run on all nodes in a job allocation just before the job starts. The default is to do this at the first srun, rather than at allocation time. For performance reasons it is better to do the initial shifter setup prior to job start, and all at once. Additionally, if the calculation is to use the sshd provided by shifter, the alloc flag is needed to ensure the daemon is running everywhere.

The "contain" option runs a separate slurmstepd on every node that shepherds the "extern" step. The extern step can be used to provide SLURM control over processes started within the shifter sshd. With the "extern_setup" option to shifter_slurm.so in plugstack.conf, you can specify a script that is run once per node after all job setup is complete. This is useful if your site requires any final setup between the prologs and job start (e.g., Cray DataWarp mounts).
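Putting the pieces together, a plugstack.conf entry using the extern_setup hook might look like the following (the install prefix, config path, and the per-node script `/etc/shifter/extern_setup.sh` are placeholder assumptions for your site):

```shell
# plugstack.conf sketch: load the Shifter SPANK plugin and register a
# hypothetical site script that runs once per node after job setup completes.
required /opt/shifter/udiRoot/lib/shifterudiroot/shifter_slurm.so shifter_config=/etc/shifter/udiRoot.conf extern_setup=/etc/shifter/extern_setup.sh
```

The corresponding slurm.conf should carry `PrologFlags=Alloc,Contain` as described above, since the extern step only exists when "contain" is enabled.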

Usage

Special Topics

Frequently Asked Questions