Merge pull request #665 from EwDa291/jobcli
Torque frontend for jobcli page

Showing 2 changed files with 154 additions and 0 deletions.

# Torque frontend via jobcli

## What is Torque

[Torque](https://en.wikipedia.org/wiki/TORQUE) is a resource manager for submitting and managing jobs on an HPC cluster. It is an implementation of [PBS (Portable Batch System)](https://en.wikipedia.org/wiki/Portable_Batch_System).
Torque is not widely used anymore, and the {{hpcinfra}} stopped using it in the backend in 2021 in favor of Slurm.
The Torque user interface, which consists of commands like `qsub` and `qstat`, was kept, however, so that researchers would not have to learn new commands to submit and manage their jobs.
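
For example, a typical Torque-style workflow looks like this (`myjob.sh` is just an illustrative script name):

```shell
# submit a job script; qsub prints the ID of the new job
$ qsub myjob.sh

# show the status of your queued and running jobs
$ qstat
```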

## Slurm backend

[Slurm](https://en.wikipedia.org/wiki/Slurm_Workload_Manager) is a resource manager for submitting and managing jobs on an HPC cluster, similar to Torque (but more advanced/modern in some ways). Slurm is currently the most popular workload manager on HPC systems worldwide, but its user interface is different and in some ways less user-friendly than that of Torque/PBS.
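
For comparison, the native Slurm workflow uses different commands for the same tasks (shown here only to illustrate the interface difference):

```shell
# native Slurm: submit a job script and inspect your jobs in the queue
$ sbatch myjob.sh
$ squeue -u $USER
```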

## jobcli

Jobcli is a Python library that was developed by {{hpcteam}} to make it possible for the {{hpcinfra}} to use a Torque frontend with a Slurm backend. It also adds some extra options to the Torque commands. Put simply, jobcli can be thought of as a Python script that "translates" Torque commands into equivalent Slurm commands, and in the case of `qsub` also makes some changes to the provided job script to make it compatible with Slurm.
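
As a rough illustration of that mapping (a simplified sketch, not jobcli's actual implementation: the `qsub`-to-`sbatch` translation is confirmed by the `--dryrun` output below, while the `qstat` and `qdel` lines show the standard Torque/Slurm equivalents and are assumed here):

```shell
# Torque frontend command   ->  Slurm backend command run by jobcli
qsub job.sh                 #->  sbatch (with a translated job script)
qstat                       #->  squeue (show job status)
qdel 64842138               #->  scancel (delete a job)
```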

### Additional options for Torque commands supported by jobcli

#### help option

Adding `--help` to a Torque command on the {{hpcinfra}} prints an extensive overview of all supported options for that command, both the original Torque options and the ones added by jobcli, with a short description of each.

For example:

```shell
$ qsub --help
usage: qsub [--version] [--debug] [--dryrun] [--pass OPTIONS] [--dump PATH]...

Submit job script

positional arguments:
  script_file_path      Path to job script to be submitted (default: read job
                        script from stdin)

optional arguments:
  -A ACCOUNT            Charge resources used by this job to specified account
  ...
```

#### dryrun option

Adding `--dryrun` to a Torque command on the {{hpcinfra}} shows which Slurm backend command jobcli generates for that Torque command, without actually executing it.

See also [the examples](./#examples) below.

#### debug option

Similarly to `--dryrun`, adding `--debug` to a Torque command on the {{hpcinfra}} shows which Slurm backend command jobcli generates for that Torque command. In contrast to `--dryrun`, however, `--debug` does actually run that Slurm command.

See also [the examples](./#examples) below.

#### Examples

The following examples illustrate how the `--dryrun` and `--debug` options work with an example job script.

`example.sh`:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=2:30:00

module load SciPy-bundle/2023.11-gfbf-2023b

python script.py > script.out.${PBS_JOBID}
```

##### Example of the dryrun option

Running the following command:

```shell
$ qsub --dryrun example.sh -N example
```

will generate this output:

```shell

Command that would have been run:
---------------------------------

/usr/bin/sbatch

Job script that would have been submitted:
------------------------------------------

#!/bin/bash
#SBATCH --chdir="/user/gent/400/{{userid}}"
#SBATCH --error="/kyukon/home/gent/400/{{userid}}/examples/%x.e%A"
#SBATCH --export="NONE"
#SBATCH --get-user-env="60L"
#SBATCH --job-name="example"
#SBATCH --mail-type="NONE"
#SBATCH --nodes="1"
#SBATCH --ntasks-per-node="8"
#SBATCH --ntasks="8"
#SBATCH --output="/kyukon/home/gent/400/{{userid}}/examples/%x.o%A"
#SBATCH --time="02:30:00"

### (start of lines that were added automatically by jobcli)
#
# original submission command:
# qsub --dryrun example.sh -N example
#
# directory where submission command was executed:
# /kyukon/home/gent/400/{{userid}}/examples
#
# original script header:
# #PBS -l nodes=1:ppn=8
# #PBS -l walltime=2:30:00
#
### (end of lines that were added automatically by jobcli)

#!/bin/bash

module load SciPy-bundle/2023.11-gfbf-2023b

python script.py > script.out.${PBS_JOBID}
```

This output consists of a few components. For our example, the most important lines are the ones that start with `#SBATCH`, since these contain the translation of the Torque options into Slurm options. For example, the job name is the one we specified with the `-N` option of the `qsub` command.

With this dry run, you can see that changes were only made to the script header; the job script itself is not changed at all. Any PBS-related constructs in the job script, like `$PBS_JOBID`, are retained. Slurm is configured on the {{hpcinfra}} such that the common `PBS_*` environment variables are defined in the job environment, next to their Slurm equivalents.
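
For instance, inside a job both of the following refer to the same job ID (an illustration; `SLURM_JOB_ID` is the standard Slurm counterpart of `PBS_JOBID`):

```shell
# both variables are defined in the job environment on the {{hpcinfra}}
echo "PBS job ID:   ${PBS_JOBID}"
echo "Slurm job ID: ${SLURM_JOB_ID}"
```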

##### Example of the debug option

Similarly to the `--dryrun` example, we start by running the following command:

```shell
$ qsub --debug example.sh -N example
```

which generates this output:

```shell
DEBUG: Submitting job script location at example.sh
DEBUG: Generated script header
#SBATCH --chdir="/user/gent/400/{{userid}}"
#SBATCH --error="/kyukon/home/gent/400/{{userid}}/examples/%x.e%A"
#SBATCH --export="NONE"
#SBATCH --get-user-env="60L"
#SBATCH --job-name="example"
#SBATCH --mail-type="NONE"
#SBATCH --nodes="1"
#SBATCH --ntasks-per-node="8"
#SBATCH --ntasks="8"
#SBATCH --output="/kyukon/home/gent/400/{{userid}}/examples/%x.o%A"
#SBATCH --time="02:30:00"
DEBUG: HOOKS: Looking for hooks in directory '/etc/jobcli/hooks'
DEBUG: HOOKS: Directory '/etc/jobcli/hooks' does not exist, so no hooks there
DEBUG: Running command '/usr/bin/sbatch'
64842138
```

The output once again consists of the translated Slurm script header, along with some additional debug information and the job ID of the job that was submitted.
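
Since `--debug`, unlike `--dryrun`, actually submits the job, you can follow it up with the usual Torque-style commands, using the job ID that was printed (the `qdel` line assumes jobcli also fronts Torque's job-deletion command):

```shell
# check the status of the submitted job
$ qstat 64842138

# delete the job again if it was only a test (assumed to be supported like qstat)
$ qdel 64842138
```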