This tool augments the output of squeue with additional information about the state of pending jobs and explains clearly why jobs are waiting.
- Display pending jobs nicely with additional details
- Identify and recommend fixes to common problems
  - QOS limits (group, per-job, CPU, node, etc.)
  - Jobs intersect with reservations
  - Job ran but exited quickly
Command-line options are listed with `-h`.
conda create -n sq python=3.8
conda activate sq
pip install -r requirements.txt
pyinstaller --onefile sq.py
The output binary is then located at `dist/sq`.
The most common problem is that the tool encounters output from a Slurm command (e.g., `squeue`, `sinfo`) that it can't parse.
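To make this failure mode concrete, here is a minimal sketch of a strict parser for pipe-delimited `squeue`-style output. It is an illustration only and not the code in `sq.py`: the function name `parse_squeue_lines` and the three-field layout (as produced by something like `squeue -h -o "%i|%T|%r"`) are assumptions.

```python
from typing import Dict, List

# Assumed field layout, e.g. from `squeue -h -o "%i|%T|%r"`
# (job ID, state, pending reason). sq itself may request other fields.
FIELDS = ("jobid", "state", "reason")


def parse_squeue_lines(text: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited rows, failing loudly on any unexpected line."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            # Report the offending line so it can be reproduced later.
            raise ValueError(
                f"line {lineno}: expected {len(FIELDS)} fields, "
                f"got {len(parts)}: {line!r}"
            )
        rows.append(dict(zip(FIELDS, (p.strip() for p in parts))))
    return rows
```

Raising an error that quotes the raw line, rather than silently skipping it, makes it easier to see exactly which Slurm output triggered the problem.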
A couple of approaches to debugging are:
- You can clone this repo (presumably into the Savio filesystem) and then run `python -m pdb sq.py` manually (including inserting `pdb` commands and modifying code in `sq.py`).
- You can add `--freeze $DIRNAME`, and `sq` will create a new directory at `$DIRNAME` containing all the Slurm command outputs. You can then use the saved files to debug in the future with `--load $DIRNAME`.
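
The freeze/load idea can be sketched roughly as follows. This is a hedged illustration of the general technique, not the implementation in `sq.py`: the helper name `get_output`, its parameters, and the one-file-per-command naming scheme are all assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence


def _run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def get_output(
    cmd: Sequence[str],
    freeze_dir: Optional[str] = None,
    load_dir: Optional[str] = None,
    runner: Callable[[Sequence[str]], str] = _run,
) -> str:
    """Return command output, optionally saving it to or replaying it from disk."""
    # Hypothetical naming scheme: one file per command, named after the executable.
    name = f"{cmd[0]}.txt"
    if load_dir is not None:
        # Replay mode: read the saved output instead of calling Slurm.
        return (Path(load_dir) / name).read_text()
    out = runner(cmd)
    if freeze_dir is not None:
        # Freeze mode: create the directory if needed and save the raw output.
        target = Path(freeze_dir)
        target.mkdir(parents=True, exist_ok=True)
        (target / name).write_text(out)
    return out
```

With a layout like this, a snapshot captured on the cluster can be replayed on any machine, including one without Slurm installed, which is what makes a saved directory useful for reproducing a parsing failure.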