The Remote Slurm implementation allows submitting jobs to a SLURM scheduler on a remote machine.
- Make sure you have passwordless login available to `host`. Preferably set up something like the following in `~/.ssh/config`:

  ```
  Host myserver
      HostName myserver_ip
      IdentityFile path_to_private_key
      User myusername
  ```

  Otherwise, all instances of `host` below should include the username (i.e. `myusername@myserver_ip`).
To connect to a remote server running a Slurm job scheduler:
```python
import crimpl

s = crimpl.RemoteSlurmServer(host='myserver',
                             directory='~/my_crimpl_jobs')
```
where `directory` will be created if it does not exist (but should preferably be empty) and must be available to the user with read and write permissions. crimpl will create subdirectories for each job within this directory to try to avoid name conflicts, but any conflicts will overwrite existing files.
Setting up the necessary dependencies can be done within the job script itself (in which case it will be run within the scheduled job) or in advance in the root directory. To run a script directly and wait for its output:
```python
s.run_script(script)
```
By default this takes place in the 'default' conda environment if conda is installed on the remote machine; otherwise the script runs without a conda environment. These defaults can be overridden by passing `conda_env` to `run_script` (a new environment is created if one with the same name does not yet exist). For example:
```python
s.run_script(["conda install condadeps -y",
              "pip install pipdeps"],
             conda_env='my_custom_env')
```
To force crimpl not to use conda even if it is installed, pass `conda_env=False`.
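The environment-selection defaults described above can be sketched as a small helper (this is an illustrative stand-in, not crimpl's actual implementation; the function name and return values are hypothetical):

```python
def conda_activation_line(conda_installed, conda_env=None):
    """Return the (hypothetical) shell line used to activate a conda
    environment before running a script, mirroring the defaults above."""
    if conda_env is False or not conda_installed:
        # conda explicitly disabled, or not available on the remote machine
        return None
    env = conda_env if conda_env is not None else 'default'
    # crimpl would first create the environment if it does not yet exist
    return "conda activate {}".format(env)

print(conda_activation_line(True))                   # conda activate default
print(conda_activation_line(True, 'my_custom_env'))  # conda activate my_custom_env
print(conda_activation_line(True, False))            # None (conda disabled)
print(conda_activation_line(False))                  # None (conda not installed)
```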
Alternatively, you could include all of these same instructions in the job script and they would be run within the scheduler itself.
To run computation jobs via the Slurm scheduler, create a `RemoteSlurmJob` instance attached to a `RemoteSlurmServer`.
To create a new job, call `RemoteSlurmServer.create_job`:
```python
j = s.create_job(nprocs=8, job_name='my-unique-jobname')
```
at which point you can run or submit scripts. If not using the default conda environment, pass the same `conda_env` to `create_job` and the correct environment will automatically be activated before running the script.
Submitting a script will convert the input `script` into an "sbatch" file to submit to the slurm scheduler. `j.submit_script` accepts the following keyword arguments as options for the job:

- `job_name`
- `nprocs`
- `walltime`
- `mail_type`
- `mail_user`

Any more advanced slurm configuration can be included directly in the input `script` itself.
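As a rough illustration of how such options translate into sbatch directives (a simplified sketch, not crimpl's actual file format; the function name is hypothetical):

```python
def build_sbatch_script(script_lines, job_name=None, nprocs=None,
                        walltime=None, mail_type=None, mail_user=None):
    """Prepend #SBATCH directives for the supported options to the
    user's script lines (illustrative sketch only)."""
    directives = ["#!/bin/bash"]
    if job_name is not None:
        directives.append("#SBATCH -J {}".format(job_name))
    if nprocs is not None:
        directives.append("#SBATCH -n {}".format(nprocs))
    if walltime is not None:
        directives.append("#SBATCH -t {}".format(walltime))
    if mail_type is not None:
        directives.append("#SBATCH --mail-type={}".format(mail_type))
    if mail_user is not None:
        directives.append("#SBATCH --mail-user={}".format(mail_user))
    return "\n".join(directives + list(script_lines))

print(build_sbatch_script(["echo hello"],
                          job_name='my-unique-jobname',
                          nprocs=8, walltime='48:00:00'))
```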
Calling `j.submit_script` will then submit the job to the remote scheduler and set `j.job_id` to the ID returned by slurm.
```python
j.submit_script(script, files=[...])
```
As a shortcut, `RemoteSlurmServer.submit_job` combines both `s.create_job` and `j.submit_script` into a single line.
To retrieve the `RemoteSlurmJob` instance for an existing job on a server, call `RemoteSlurmServer.get_job`:

```python
j = crimpl.RemoteSlurmServer(...).get_job(job_name='my-unique-jobname')
```
If `job_name` was not provided while creating the job, it can be accessed via `RemoteSlurmJob.job_name` or `RemoteSlurmServer.existing_jobs`.
To check the status of the job, call `RemoteSlurmJob.job_status`:

```python
print(j.job_status)
```
To wait in a loop until the job reaches a desired status, call `RemoteSlurmJob.wait_for_job_status`:

```python
j.wait_for_job_status('complete')
```
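Internally, waiting amounts to polling the job status until it reaches the requested state. A minimal sketch of that pattern, using a stand-in status function instead of a live scheduler (the helper name and interval are illustrative; the real method queries the remote scheduler):

```python
import time

def wait_for_status(get_status, desired='complete', interval=0.01):
    """Poll get_status() until it returns the desired state
    (simplified sketch of a wait-for-status loop)."""
    while True:
        status = get_status()
        if status == desired:
            return status
        time.sleep(interval)

# stand-in for a scheduler: pending, then running, then complete
states = iter(['pending', 'running', 'complete'])
print(wait_for_status(lambda: next(states)))  # complete
```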
To retrieve expected output files from the server via scp, call `RemoteSlurmJob.check_output`:

```python
j.check_output(filename_on_server, local_filename)
```

where `filename_on_server` is the expected path(s) relative to the remote working directory.
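The retrieval step behaves roughly like copying out of the job's working directory. A local sketch using `shutil` in place of scp (the helper name and file paths are illustrative, not part of crimpl's API):

```python
import os
import shutil
import tempfile

def fetch_output(remote_workdir, filename_on_server, local_filename):
    """Copy a file from the (here: local stand-in for the) remote
    working directory to a local path, as check_output does via scp."""
    src = os.path.join(remote_workdir, filename_on_server)
    shutil.copy(src, local_filename)
    return local_filename

# simulate a remote working directory containing one output file
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'results.dat'), 'w') as f:
    f.write('42\n')

fetch_output(workdir, 'results.dat', os.path.join(workdir, 'local_copy.dat'))
print(open(os.path.join(workdir, 'local_copy.dat')).read().strip())  # 42
```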