Commit

Upgraded to version 0.9.4
maharshi95 committed Dec 4, 2023
1 parent 59b7546 commit 7b1a296
Showing 13 changed files with 148 additions and 27 deletions.
39 changes: 29 additions & 10 deletions README.md
@@ -5,11 +5,10 @@
 [![Supported Python Versions](https://img.shields.io/badge/python-3.8+-blue)](https://pypi.org/project/rich/)
 [![Twitter Follow](https://img.shields.io/twitter/follow/maharshigor.svg?style=social)](https://twitter.com/maharshigor)


A _makeshift_ toolkit, built on top of [submitit](https://github.com/facebookincubator/submitit), to launch SLURM jobs over a range of hyperparameters from the command line. It is designed to work with existing Python scripts and to monitor their status interactively.


__`submititnow` provides two command-line tools:__

* `slaunch` to launch a Python script as SLURM job(s).
* `jt` (job-tracker) to interactively monitor the jobs.

Expand All @@ -23,27 +22,39 @@ Let's say you have a Python script [`examples/annotate_queries.py`](examples/ann
python examples/annotate_queries.py --model='BERT-LARGE-uncased' \
--dataset='NaturalQuestions' --fold='dev'
```

You can launch a job that runs this script over a SLURM cluster using the following:

```bash
slaunch examples/annotate_queries.py \
--mem="16g" --gres="gpu:rtxa4000:1" \
--model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
```

You can put all the SLURM parameters in a config file and pass it to `slaunch` using the `--slurm_config` flag. For example, the above command can be written as:

```bash
slaunch examples/annotate_queries.py \
--slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
--config="examples/configs/gpu.json" \
--model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
```

### __Launching multiple jobs with parameter-sweep__

```bash
slaunch examples/annotate_queries.py \
--slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
--config="examples/configs/gpu.json" \
--sweep fold model \
--model 'BERT-LARGE-uncased' 'Roberta-uncased' 'T5-cased-small' \
--dataset='NaturalQuestions' --fold 'dev' 'train'
```

This will launch a total of 6 jobs with the following configuration:

![Slaunch Terminal Response](docs/imgs/slaunch_annotate_queries.png)
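
The count of 6 follows from the Cartesian product of the swept parameters: two `fold` values times three `model` values, with `dataset` held fixed. Below is a minimal sketch of that expansion. The `expand_sweep` helper is hypothetical and only illustrates the arithmetic; it is not the code that `slaunch` runs.

```python
from itertools import product


def expand_sweep(fixed, swept):
    """Return one parameter dict per combination of the swept values.

    Hypothetical helper for illustration; not part of submititnow.
    """
    names = list(swept)
    return [
        {**fixed, **dict(zip(names, combo))}
        for combo in product(*(swept[name] for name in names))
    ]


jobs = expand_sweep(
    fixed={"dataset": "NaturalQuestions"},
    swept={
        "fold": ["dev", "train"],
        "model": ["BERT-LARGE-uncased", "Roberta-uncased", "T5-cased-small"],
    },
)
print(len(jobs))  # 6
```

Each entry in `jobs` corresponds to one row of the launch table, for example `fold='dev'` with `model='BERT-LARGE-uncased'`.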

### __Any constraints on the target Python script that we launch?__

The target Python script must have the following format:

```python
@@ -68,31 +79,35 @@ if __name__ == '__main__':

```

## **`jt`** :   Looking up info on previously launched experiments:
## __`jt`__ :   Looking up info on previously launched experiments:

As instructed in the above screenshot of the launch response, the user can run the `jt` (short for `job-tracker`) command to monitor job progress.

### **`jt jobs EXP_NAME [EXP_ID]`**
### __`jt jobs EXP_NAME [EXP_ID]`__

Executing `jt jobs examples.annotate_queries 227720` will give the following response:

![jt jobs EXP_NAME EXP_ID Terminal Response](docs/imgs/jt_annotate_queries_expid.png)

In fact, the user can also look up all `examples.annotate_queries` jobs simply by removing `[EXP_ID]` from the previous command:
```

```bash
jt jobs examples.annotate_queries
```

![jt jobs EXP_NAME Terminal Response](docs/imgs/jt_annotate_queries.png)

### **`jt {err, out} JOB_ID`**
### __`jt {err, out} JOB_ID`__

__Looking up stderr and stdout of a Job__

Executing `jt out 227720_2` reveals the `stdout` output of the corresponding Job:

![jt out JOB_ID Terminal Response](docs/imgs/jt_out_job_id.png)
Similarly, `jt err 227720_2` reveals the `stderr` logs.

### **`jt sh JOB_ID`**
### __`jt sh JOB_ID`__

__Looking up SBATCH script for a Job__

The submitit tool internally creates an SBATCH shell script per experiment to launch the jobs on a SLURM cluster. This command outputs this `submission.sh` file for inspection.
@@ -102,23 +117,27 @@ Executing `jt sh 227720_2` reveals the following:
![jt out JOB_ID Terminal Response](docs/imgs/jt_sh_job_id.png)

### **`jt ls`**

Finally, the user can run `jt ls` to list the experiments maintained by the `submititnow` tool.

<img src="docs/imgs/jt_ls.png" width=30%>
![jt_ls](docs/imgs/jt_ls.png)

The experiment names output by this command can then be passed into the `jt jobs` command.

## __Installing__

Python 3.8+ is required.

```bash
pip install -U git+https://github.com/maharshi95/submititnow.git
```

## **Experiment API**

Sometimes the `slaunch` command-line tool is not enough. For example, one may want to launch a job with customized parameter-sweep configurations, or vary a certain parameter (e.g. `output_filepath`) for each job in the launch. In such cases, one can use the Experiment API provided by `submititnow` to launch jobs from Python scripts and also get the benefits of being able to track them with `jt`.

[examples/launch_demo_script.py](examples/launch_demo_script.py) provides a demo of how to use the `Experiment` API to launch a job with customized parameter-sweep configurations.

```bash
python examples/launch_demo_script.py
```
Empty file modified bin/jt
100644 → 100755
Empty file.
46 changes: 46 additions & 0 deletions bin/py-srun
@@ -0,0 +1,46 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import json
import subprocess
import argparse

from submititnow.umiacs.handlers import profile_handlers

parser = argparse.ArgumentParser()
parser.add_argument("config", type=str)
parser.add_argument("shell", nargs="*", default=["zsh"])  # nargs="+" would never use the default
args = parser.parse_args()


def removeprefix(var: str, prefix: str):
    return var[len(prefix) :] if var.startswith(prefix) else var


def load_config(config_filename: str):
    with open(config_filename) as f:
        config = json.load(f)
    if "profile" in config:
        profile = config.pop("profile")
        config = profile_handlers[profile](config)

    return {
        removeprefix(key, "slurm_").replace("_", "-"): value
        for key, value in config.items()
    }


cmd_args = load_config(args.config)


# Make Bash command
cmd = "srun"
for key, value in cmd_args.items():
    cmd += f" --{key}={value}"
cmd += " --job-name=llms"
shell_cmd = " ".join(args.shell)
cmd += f" --pty {shell_cmd}"

print(cmd)

subprocess.run(cmd, shell=True)
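
To make the key mapping in `load_config` concrete, here is a self-contained sketch of how a config dict could turn into an `srun` command line. It inlines a stand-in for the `scavenger` handler (only the two keys visible in this diff) instead of importing `submititnow`, leaves out the hard-coded `--job-name`, and uses `shlex.join` for quoting, which the script above does not do. `build_srun_command` is a hypothetical name.

```python
import shlex


def removeprefix(var: str, prefix: str) -> str:
    return var[len(prefix):] if var.startswith(prefix) else var


def scavenger_handler(params: dict) -> dict:
    # Stand-in for scavenger_profile_handler; only the keys shown in the diff.
    return {**params, "slurm_partition": "scavenger", "slurm_qos": "scavenger"}


def build_srun_command(config: dict, shell: list) -> str:
    """Hypothetical helper: map config keys to srun flags and quote the result."""
    config = dict(config)  # avoid mutating the caller's dict
    if config.pop("profile", None) == "scavenger":
        config = scavenger_handler(config)
    flags = [
        f"--{removeprefix(key, 'slurm_').replace('_', '-')}={value}"
        for key, value in config.items()
    ]
    return shlex.join(["srun", *flags, "--pty", *shell])


cmd = build_srun_command(
    {"profile": "scavenger", "gres": "gpu:rtxa4000:1", "mem": "16G"}, ["zsh"]
)
print(cmd)
# srun --gres=gpu:rtxa4000:1 --mem=16G --partition=scavenger --qos=scavenger --pty zsh
```

Keys contributed by the handler carry the `slurm_` prefix, which is why the prefix is stripped before the flags are built.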
6 changes: 4 additions & 2 deletions bin/slaunch
@@ -137,8 +137,10 @@ if __name__ == "__main__":
        job_desc_function=job_description_function,
        submititnow_dir=args.submititnow_dir,
    )
    experiment.register_profile_handler("clip", handlers.clip_profile_handler)
    experiment.register_profile_handler("scavenger", handlers.scavenger_profile_handler)
    for name, handler in handlers.profile_handlers.items():
        experiment.register_profile_handler(name, handler)



    slurm_params = options.get_slurm_params(args)

4 changes: 0 additions & 4 deletions examples/.config.json

This file was deleted.

5 changes: 5 additions & 0 deletions examples/configs/gpu.json
@@ -0,0 +1,5 @@
{
    "profile": "scavenger",
    "gres": "gpu:rtxa4000:1",
    "mem": "16G"
}
5 changes: 5 additions & 0 deletions examples/configs/sample_config.json
@@ -0,0 +1,5 @@
{
    "profile": "clip",
    "gres": "gpu:1",
    "mem": "4G"
}
3 changes: 2 additions & 1 deletion setup.py
@@ -7,7 +7,7 @@

setuptools.setup(
    name="submititnow",
    version="0.9.3",
    version="0.9.4",
    author="Maharshi Gor",
    author_email="[email protected]",
    description="A package to make submitit easier to use",
@@ -27,6 +27,7 @@
        "rich-cli>=1.8.0",
        "rich>=12.6.0",
        "tqdm>=4.0.0",
        "scandir>=1.10.0",
    ],
    python_requires=">=3.8",
)
10 changes: 7 additions & 3 deletions submititnow/cli.py
@@ -13,8 +13,12 @@


def show_file_content(filepath: str):
rich_print("[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(filepath))
with open(filepath, "r", newline='') as fp:
rich_print(
"[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(
filepath
)
)
with open(filepath, "r", newline="") as fp:
text = fp.read()
for line in text.split("\n"):
line_buffer = io.StringIO()
@@ -81,7 +85,7 @@ def _display_job_submission_status_on_console(exp: Experiment, wait_until: str):
rich_print(f"\t:ledger: "
f"Submitit logs : {exp.logs_dir}\n")

rich_print(f"[bold yellow] Execute the following command to monitor the jobs:[/bold yellow]\n")
rich_print("[bold yellow] Execute the following command to monitor the jobs:[/bold yellow]\n")
rich_print(f"\t[bold bright_white]jt jobs {exp.exp_name} {exp.exp_id}[/bold bright_white]\n")
# fmt: on

2 changes: 0 additions & 2 deletions submititnow/experiment_lib.py
@@ -22,7 +22,6 @@ def __init__(
job_desc_function: Optional[Callable] = None,
submititnow_dir: Optional[str] = None,
):

self.submititnow_dir = (
Path(submititnow_dir) if submititnow_dir else utils.SUBMITITNOW_ROOT_DIR
)
@@ -90,7 +89,6 @@ def launch(
)

if slurm_profile := slurm_params.get("slurm_profile"):

del slurm_params["slurm_profile"]

if slurm_profile in self.profile_handlers:
6 changes: 3 additions & 3 deletions submititnow/jt/utils.py
@@ -3,7 +3,7 @@
from dataclasses import dataclass
from typing import Optional, Dict
import pandas as pd

import scandir

__FALLBACK_SUBMITITNOW_DIR = "~/.submititnow"

@@ -23,7 +23,7 @@ def get_running_job_ids():
def list_files(path):
    files = []
    # r=root, d=directories, f = files
    for r, d, f in os.walk(path):
    for r, d, f in scandir.walk(str(path)):
        for file in f:
            yield os.path.join(r, file)

@@ -79,7 +79,7 @@ def load_job_states(job_id):
with open(err_filepath) as fp:
err_lines = list(
filter(
lambda l: l.startswith("srun: ") or l.startswith("slurmstepd: "),
lambda l: l.startswith("srun: ") or "slurmstepd: " in l,
fp.readlines(),
)
)
19 changes: 17 additions & 2 deletions submititnow/options.py
@@ -4,6 +4,15 @@


class SlurmAdditionalArgAction(argparse.Action):
    """This class is used to parse additional arguments for SLURM.
    Example:
        The CLI SLURM argument `--nodelist` is part of the `slurm_additional_parameters`
        dict for submitit. This ArgAction class is used to parse the `--nodelist`
        argument and add it to the `slurm_additional_parameters` dict, which is the
        destination variable name.
    """
    def __init__(self, check_func, *args, **kwargs):
        """
        argparse custom action.
@@ -116,13 +125,19 @@ def add_submititnow_arguments(parser: argparse.ArgumentParser):
    return parser


def load_slurm_config(config_filename: str) -> Dict[str, Any]:
    with open(config_filename, "r") as f:
        config = json.load(f)
    return {f"slurm_{k.replace('-', '_')}": v for k, v in config.items()}

def get_slurm_params(args: argparse.Namespace) -> Dict[str, Any]:

    # Grabs all SLURM arguments from the parser that are explicitly set to a value
    slurm_args = {
        k: v for k, v in vars(args).items() if k.startswith("slurm_") and v is not None
    }
    if slurm_args.get("slurm_config") is not None:
        config_filename = slurm_args.pop("slurm_config")
        with open(config_filename, "r") as f:
            default_args = json.load(f)
        default_args = load_slurm_config(config_filename)
        slurm_args = {**default_args, **slurm_args}
    return slurm_args
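
The merge `{**default_args, **slurm_args}` means values from the config file act as defaults and anything set explicitly on the command line wins. The sketch below reproduces `load_slurm_config` from this hunk and runs it against a throwaway config file; the `cli_args` dict and its `32G` value are made up for illustration.

```python
import json
import tempfile
from typing import Any, Dict


def load_slurm_config(config_filename: str) -> Dict[str, Any]:
    # Same logic as the function added in this hunk.
    with open(config_filename, "r") as f:
        config = json.load(f)
    return {f"slurm_{k.replace('-', '_')}": v for k, v in config.items()}


# Write a throwaway config file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"profile": "scavenger", "gres": "gpu:rtxa4000:1", "mem": "16G"}, tmp)

defaults = load_slurm_config(tmp.name)
cli_args = {"slurm_mem": "32G"}  # hypothetical value set on the command line
merged = {**defaults, **cli_args}
print(merged)
# {'slurm_profile': 'scavenger', 'slurm_gres': 'gpu:rtxa4000:1', 'slurm_mem': '32G'}
```

Prefixing every key with `slurm_` is also what turns the config's `profile` entry into the `slurm_profile` key that `Experiment.launch` looks up.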
30 changes: 30 additions & 0 deletions submititnow/umiacs/handlers.py
@@ -16,3 +16,33 @@ def scavenger_profile_handler(slurm_params: Dict[str, Any]):
"slurm_partition": "scavenger",
"slurm_qos": "scavenger",
}

def cml_zhou_profile_handler(slurm_params: Dict[str, Any]):
    return {
        **slurm_params,
        "slurm_account": "cml-zhou",
        "slurm_partition": "cml-dpart",
    }

def cml_profile_handler(slurm_params: Dict[str, Any]):
    return {
        **slurm_params,
        "slurm_account": "cml",
        "slurm_partition": "cml-dpart",
    }

def cml_scavenger_profile_handler(slurm_params: Dict[str, Any]):
    return {
        **slurm_params,
        "slurm_account": "cml-scavenger",
        "slurm_partition": "cml-scavenger",
        "slurm_qos": "cml-scavenger",
    }

profile_handlers = {
    "clip": clip_profile_handler,
    "scavenger": scavenger_profile_handler,
    "cml": cml_profile_handler,
    "cml-zhou": cml_zhou_profile_handler,
    "cml-scavenger": cml_scavenger_profile_handler,
}
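
Because each handler spreads `**slurm_params` first and lists its own keys afterwards, the profile's account and partition override any value the user supplied for the same key. A small sketch of that behaviour follows. `cml_profile_handler` is copied from this hunk; `apply_profile` is a hypothetical dispatcher, not the lookup code in `experiment_lib.py`.

```python
from typing import Any, Callable, Dict


def cml_profile_handler(slurm_params: Dict[str, Any]) -> Dict[str, Any]:
    # Copied from the hunk above.
    return {
        **slurm_params,
        "slurm_account": "cml",
        "slurm_partition": "cml-dpart",
    }


profile_handlers: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "cml": cml_profile_handler,
}


def apply_profile(slurm_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: resolve `slurm_profile` through the registry."""
    params = dict(slurm_params)  # avoid mutating the caller's dict
    profile = params.pop("slurm_profile", None)
    if profile is None:
        return params
    if profile not in profile_handlers:
        raise KeyError(f"Unknown SLURM profile: {profile!r}")
    return profile_handlers[profile](params)


resolved = apply_profile(
    {"slurm_profile": "cml", "slurm_mem": "4G", "slurm_partition": "my-partition"}
)
print(resolved)
```

Here the user-supplied `slurm_partition` is replaced by `cml-dpart`, while `slurm_mem` passes through unchanged.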
