Upgraded to version 0.9.4

maharshi95 · Dec 4, 2023 · 7b1a296
1 parent 59b7546
Showing 13 changed files with 148 additions and 27 deletions.
diff --git a/README.md b/README.md
@@ -5,11 +5,10 @@
 &nbsp;[![Supported Python Versions](https://img.shields.io/badge/python-3.8+-blue)](https://pypi.org/project/rich/)
 &nbsp;[![Twitter Follow](https://img.shields.io/twitter/follow/maharshigor.svg?style=social)](https://twitter.com/maharshigor)
 
-
 A _makeshift_ toolkit, built on top of [submitit](https://github.com/facebookincubator/submitit), to launch SLURM jobs over a range of hyperparameters from the command line. It is designed to be used with existing Python scripts and interactively monitor their status.
 
-
 __`submititnow` provides two command-line tools:__
+
 * `slaunch` to launch a Python script as SLURM job(s).
 * `jt` (job-tracker) to interactively monitor the jobs.
 
@@ -23,27 +22,39 @@ Let's say you have a Python script [`examples/annotate_queries.py`](examples/ann
 python examples/annotate_queries.py --model='BERT-LARGE-uncased' \
     --dataset='NaturalQuestions' --fold='dev'
 ```
+
 You can launch a job that runs this script over a SLURM cluster using the following:
+
+```bash
+slaunch examples/annotate_queries.py \
+    --mem="16g" --gres="gpu:rtxa4000:1" \
+    --model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
+```
+
+You can put all the SLURM parameters in a config file and pass it to `slaunch` using the `--slurm_config` flag. For example, the above command can be written as:
+
 ```bash
 slaunch examples/annotate_queries.py \
-    --slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
+    --config="examples/configs/gpu.json" \
     --model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
 ```
 
 ### __Launching multiple jobs with parameter-sweep__
 
 ```bash
 slaunch examples/annotate_queries.py \
-    --slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
+    --config="examples/configs/gpu.json" \
     --sweep fold model \
     --model 'BERT-LARGE-uncased' 'Roberta-uncased' 'T5-cased-small' \
     --dataset='NaturalQuestions' --fold 'dev' 'train'
 ```
+
 This will launch a total of 6 jobs with the following configuration:
 
 ![Slaunch Terminal Response](docs/imgs/slaunch_annotate_queries.png)
 
 ### __Any constraints on the target Python script that we launch?__
+
 The target Python script must have the following format:
 
 ```python
@@ -68,31 +79,35 @@ if __name__ == '__main__':
 
 ```
 
-## **`jt`** : &nbsp; Looking up info on previously launched experiments:
+## __`jt`__ : &nbsp; Looking up info on previously launched experiments:
 
 As shown in the screenshot of the launch response above, you can use the `jt` (short for `job-tracker`) command to monitor job progress.
 
-### **`jt jobs EXP_NAME [EXP_ID]`**
+### __`jt jobs EXP_NAME [EXP_ID]`__
 
 Executing `jt jobs examples.annotate_queries 227720` will give the following response:
 
 ![jt jobs EXP_NAME EXP_ID Terminal Response](docs/imgs/jt_annotate_queries_expid.png)
 
 You can also look up all `examples.annotate_queries` jobs by omitting `[EXP_ID]` from the previous command:
-```
+
+```bash
 jt jobs examples.annotate_queries
 ```
+
 ![jt jobs EXP_NAME Terminal Response](docs/imgs/jt_annotate_queries.png)
 
-### **`jt {err, out} JOB_ID`**
+### __`jt {err, out} JOB_ID`__
+
 __Looking up stderr and stdout of a Job__
 
 Executing `jt out 227720_2` reveals the `stdout` output of the corresponding Job:
 
 ![jt out JOB_ID Terminal Response](docs/imgs/jt_out_job_id.png)
 Similarly, `jt err 227720_2` reveals the `stderr` logs.
 
-### **`jt sh JOB_ID`**
+### __`jt sh JOB_ID`__
+
 __Looking up SBATCH script for a Job__
 
 The submitit tool internally creates an SBATCH shell script per experiment to launch the jobs on a SLURM cluster. This command outputs this `submission.sh` file for inspection.
@@ -102,23 +117,27 @@ Executing `jt sh 227720_2` reveals the following:
 ![jt out JOB_ID Terminal Response](docs/imgs/jt_sh_job_id.png)
 
 ### **`jt ls`**
+
 Finally, you can use `jt ls` to list the experiments maintained by the `submititnow` tool.
 
-<img src="docs/imgs/jt_ls.png"  width=30%>
+![jt_ls](docs/imgs/jt_ls.png)
 
 The experiment names output by this command can then be passed into the `jt jobs` command.
 
 ## __Installing__
+
 Python 3.8+ is required.
 
 ```bash
 pip install -U git+https://github.com/maharshi95/submititnow.git
 ```
 
 ## **Experiment API**
+
 Sometimes the `slaunch` command-line tool is not enough. For example, one may want to launch a job with customized parameter-sweep configurations, or vary a certain parameter (e.g. `output_filepath`) for each job in the launch. In such cases, one can use the Experiment API provided by `submititnow` to launch jobs from Python scripts and also get the benefits of being able to track them with `jt`.
 
 [examples/launch_demo_script.py](examples/launch_demo_script.py) provides a demo of how to use the `Experiment` API to launch a job with customized parameter-sweep configurations.
+
 ```bash
 python examples/launch_demo_script.py
 ```
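A note on the parameter-sweep example in the README above: the count of 6 jobs follows from taking every combination of the swept values (2 folds × 3 models) while holding the other arguments fixed. The sketch below illustrates that expansion with `itertools.product`. `expand_sweep` is a hypothetical helper written for this note, not the function `slaunch` uses internally.

```python
from itertools import product
from typing import Any, Dict, List


def expand_sweep(params: Dict[str, List[Any]], sweep_keys: List[str]) -> List[Dict[str, Any]]:
    """Return one argument dict per combination of the swept parameters.

    Hypothetical helper: non-swept parameters are assumed to carry a single value.
    """
    fixed = {k: v[0] for k, v in params.items() if k not in sweep_keys}
    grids = [params[k] for k in sweep_keys]
    return [{**fixed, **dict(zip(sweep_keys, combo))} for combo in product(*grids)]


# Values copied from the README example.
params = {
    "model": ["BERT-LARGE-uncased", "Roberta-uncased", "T5-cased-small"],
    "dataset": ["NaturalQuestions"],
    "fold": ["dev", "train"],
}
jobs = expand_sweep(params, sweep_keys=["fold", "model"])
print(len(jobs))  # 2 folds x 3 models = 6
```

Each entry in `jobs` corresponds to one set of script arguments, which is why adding a value to either swept parameter multiplies the number of jobs.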
diff --git a/bin/jt b/bin/jt
diff --git a/bin/py-srun b/bin/py-srun
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+import json
+import subprocess
+import argparse
+
+from submititnow.umiacs.handlers import profile_handlers
+
+parser = argparse.ArgumentParser()
+parser.add_argument("config", type=str)
+parser.add_argument("shell", nargs="*", default=["zsh"])
+args = parser.parse_args()
+
+
+def removeprefix(var: str, prefix: str):
+    return var[len(prefix) :] if var.startswith(prefix) else var
+
+
+def load_config(config_filename: str):
+    with open(config_filename) as f:
+        config = json.load(f)
+    if "profile" in config:
+        profile = config.pop("profile")
+        config = profile_handlers[profile](config)
+
+    return {
+        removeprefix(key, "slurm_").replace("_", "-"): value
+        for key, value in config.items()
+    }
+
+
+cmd_args = load_config(args.config)
+
+
+# Make Bash command
+cmd = "srun"
+for key, value in cmd_args.items():
+    cmd += f" --{key}={value}"
+cmd += " --job-name=llms"
+shell_cmd = " ".join(args.shell)
+cmd += f" --pty {shell_cmd}"
+
+print(cmd)
+
+subprocess.run(cmd, shell=True)
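A note on `bin/py-srun`: the script concatenates config values into a single string and runs it with `shell=True`, so a value containing spaces or shell metacharacters would be reinterpreted by the shell. One possible alternative, sketched below, builds an argument list instead and uses `shlex.join` only for display. `build_srun_command` is a hypothetical name, and the sketch assumes the profile handler has already been applied to the config.

```python
import shlex
from typing import Any, Dict, List, Sequence


def removeprefix(var: str, prefix: str) -> str:
    # Same helper as in bin/py-srun.
    return var[len(prefix):] if var.startswith(prefix) else var


def build_srun_command(
    config: Dict[str, Any], shell: Sequence[str] = ("zsh",), job_name: str = "llms"
) -> List[str]:
    """Turn a (profile-resolved) config dict into an srun argv list."""
    flags = {removeprefix(k, "slurm_").replace("_", "-"): v for k, v in config.items()}
    argv = ["srun"]
    argv += [f"--{key}={value}" for key, value in flags.items()]
    argv += [f"--job-name={job_name}", "--pty", *shell]
    return argv


argv = build_srun_command(
    {"slurm_partition": "scavenger", "gres": "gpu:rtxa4000:1", "mem": "16G"}
)
print(shlex.join(argv))
# To execute without a shell: subprocess.run(argv)
```

Passing `argv` straight to `subprocess.run` keeps each flag as one argument regardless of its content, at the cost of not supporting shell syntax in the trailing command.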
diff --git a/bin/slaunch b/bin/slaunch
@@ -137,8 +137,10 @@ if __name__ == "__main__":
         job_desc_function=job_description_function,
         submititnow_dir=args.submititnow_dir,
     )
-    experiment.register_profile_handler("clip", handlers.clip_profile_handler)
-    experiment.register_profile_handler("scavenger", handlers.scavenger_profile_handler)
+    for name, handler in handlers.profile_handlers.items():
+        experiment.register_profile_handler(name, handler)
+
+
 
     slurm_params = options.get_slurm_params(args)
 

diff --git a/examples/.config.json b/examples/.config.json
diff --git a/examples/configs/gpu.json b/examples/configs/gpu.json
@@ -0,0 +1,5 @@
+{
+    "profile": "scavenger",
+    "gres": "gpu:rtxa4000:1",
+    "mem": "16G"
+}
diff --git a/examples/configs/sample_config.json b/examples/configs/sample_config.json
@@ -0,0 +1,5 @@
+{
+    "profile": "clip",
+    "gres": "gpu:1",
+    "mem": "4G"
+}
diff --git a/setup.py b/setup.py
@@ -7,7 +7,7 @@
 
 setuptools.setup(
     name="submititnow",
-    version="0.9.3",
+    version="0.9.4",
     author="Maharshi Gor",
     author_email="[email protected]",
     description="A package to make submitit easier to use",
@@ -27,6 +27,7 @@
         "rich-cli>=1.8.0",
         "rich>=12.6.0",
         "tqdm>=4.0.0",
+        "scandir>=1.10.0",
     ],
     python_requires=">=3.8",
 )
diff --git a/submititnow/cli.py b/submititnow/cli.py
@@ -13,8 +13,12 @@
 
 
 def show_file_content(filepath: str):
-    rich_print("[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(filepath))
-    with open(filepath, "r", newline='') as fp:
+    rich_print(
+        "[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(
+            filepath
+        )
+    )
+    with open(filepath, "r", newline="") as fp:
         text = fp.read()
         for line in text.split("\n"):
             line_buffer = io.StringIO()
@@ -81,7 +85,7 @@ def _display_job_submission_status_on_console(exp: Experiment, wait_until: str):
     rich_print(f"\t:ledger: "
                f"Submitit logs      : {exp.logs_dir}\n")
 
-    rich_print(f"[bold yellow]  Execute the following command to monitor the jobs:[/bold yellow]\n")
+    rich_print("[bold yellow]  Execute the following command to monitor the jobs:[/bold yellow]\n")
     rich_print(f"\t[bold bright_white]jt jobs {exp.exp_name} {exp.exp_id}[/bold bright_white]\n")
     # fmt: on
 

diff --git a/submititnow/experiment_lib.py b/submititnow/experiment_lib.py
@@ -22,7 +22,6 @@ def __init__(
         job_desc_function: Optional[Callable] = None,
         submititnow_dir: Optional[str] = None,
     ):
-
         self.submititnow_dir = (
             Path(submititnow_dir) if submititnow_dir else utils.SUBMITITNOW_ROOT_DIR
         )
@@ -90,7 +89,6 @@ def launch(
             )
 
         if slurm_profile := slurm_params.get("slurm_profile"):
-
             del slurm_params["slurm_profile"]
 
             if slurm_profile in self.profile_handlers:

diff --git a/submititnow/jt/utils.py b/submititnow/jt/utils.py
@@ -3,7 +3,7 @@
 from dataclasses import dataclass
 from typing import Optional, Dict
 import pandas as pd
-
+import scandir
 
 __FALLBACK_SUBMITITNOW_DIR = "~/.submititnow"
 
@@ -23,7 +23,7 @@ def get_running_job_ids():
 def list_files(path):
     files = []
     # r=root, d=directories, f = files
-    for r, d, f in os.walk(path):
+    for r, d, f in scandir.walk(str(path)):
         for file in f:
             yield os.path.join(r, file)
 
@@ -79,7 +79,7 @@ def load_job_states(job_id):
     with open(err_filepath) as fp:
         err_lines = list(
             filter(
-                lambda l: l.startswith("srun: ") or l.startswith("slurmstepd: "),
+                lambda l: l.startswith("srun: ") or "slurmstepd: " in l,
                 fp.readlines(),
             )
         )

diff --git a/submititnow/options.py b/submititnow/options.py
@@ -4,6 +4,15 @@
 
 
 class SlurmAdditionalArgAction(argparse.Action):
+    """This class is used to parse additional arguments for SLURM.
+    
+    Example:
+        The CLI SLURM argument `--nodelist` is part of the `slurm_additional_parameters`
+        dict for submitit. This ArgAction class is used to parse the `--nodelist` 
+        argument and add it to the `slurm_additional_parameters` dict, which is the
+        destination variable name.
+    
+    """
     def __init__(self, check_func, *args, **kwargs):
         """
         argparse custom action.
@@ -116,13 +125,19 @@ def add_submititnow_arguments(parser: argparse.ArgumentParser):
     return parser
 
 
+def load_slurm_config(config_filename: str) -> Dict[str, Any]:
+    with open(config_filename, "r") as f:
+        config = json.load(f)
+    return {f"slurm_{k.replace('-', '_')}": v for k, v in config.items()}
+
 def get_slurm_params(args: argparse.Namespace) -> Dict[str, Any]:
+
+    # Grabs all SLURM arguments from the parser that are explicitly set to a value
     slurm_args = {
         k: v for k, v in vars(args).items() if k.startswith("slurm_") and v is not None
     }
     if slurm_args.get("slurm_config") is not None:
         config_filename = slurm_args.pop("slurm_config")
-        with open(config_filename, "r") as f:
-            default_args = json.load(f)
+        default_args = load_slurm_config(config_filename)
         slurm_args = {**default_args, **slurm_args}
     return slurm_args
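A note on the precedence implemented in `get_slurm_params`: values from the config file act as defaults, and any SLURM argument set explicitly on the command line overrides them, because the explicit dict is unpacked last. The sketch below reproduces that merge with a plain dict in place of `argparse.Namespace`. `merge_slurm_params` is a hypothetical stand-in written for this note.

```python
import json
import os
import tempfile
from typing import Any, Dict


def load_slurm_config(config_filename: str) -> Dict[str, Any]:
    # Same normalisation as in the diff: "mem" becomes "slurm_mem", dashes become underscores.
    with open(config_filename, "r") as f:
        config = json.load(f)
    return {f"slurm_{k.replace('-', '_')}": v for k, v in config.items()}


def merge_slurm_params(cli_args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical dict-based version of get_slurm_params.
    slurm_args = {
        k: v for k, v in cli_args.items() if k.startswith("slurm_") and v is not None
    }
    config_filename = slurm_args.pop("slurm_config", None)
    if config_filename is not None:
        # Later unpacking wins, so explicit CLI values override file defaults.
        slurm_args = {**load_slurm_config(config_filename), **slurm_args}
    return slurm_args


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "gpu.json")
    with open(config_path, "w") as f:
        json.dump({"profile": "scavenger", "gres": "gpu:rtxa4000:1", "mem": "16G"}, f)

    merged = merge_slurm_params(
        {
            "slurm_config": config_path,
            "slurm_mem": "32G",  # explicit override of the file's "16G"
            "slurm_gres": None,  # unset on the CLI, so the file value is kept
            "model": "BERT-LARGE-uncased",  # not a SLURM argument, ignored
        }
    )

print(merged)
```

Note that `load_slurm_config` adds the `slurm_` prefix unconditionally, so a config file whose keys already start with `slurm_` would end up with a doubled prefix.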
diff --git a/submititnow/umiacs/handlers.py b/submititnow/umiacs/handlers.py
@@ -16,3 +16,33 @@ def scavenger_profile_handler(slurm_params: Dict[str, Any]):
         "slurm_partition": "scavenger",
         "slurm_qos": "scavenger",
     }
+
+def cml_zhou_profile_handler(slurm_params: Dict[str, Any]):
+    return {
+        **slurm_params,
+        "slurm_account": "cml-zhou",
+        "slurm_partition": "cml-dpart",
+    }
+
+def cml_profile_handler(slurm_params: Dict[str, Any]):
+    return {
+        **slurm_params,
+        "slurm_account": "cml",
+        "slurm_partition": "cml-dpart",
+    }
+
+def cml_scavenger_profile_handler(slurm_params: Dict[str, Any]):
+    return {
+        **slurm_params,
+        "slurm_account": "cml-scavenger",
+        "slurm_partition": "cml-scavenger",
+        "slurm_qos": "cml-scavenger",
+    }
+
+profile_handlers = {
+    "clip": clip_profile_handler,
+    "scavenger": scavenger_profile_handler,
+    "cml": cml_profile_handler,
+    "cml-zhou": cml_zhou_profile_handler,
+    "cml-scavenger": cml_scavenger_profile_handler,
+}
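A note on the new `profile_handlers` registry: each handler takes the current SLURM parameters and returns a copy with account, partition, and QOS filled in, so resolving a profile amounts to a dictionary lookup followed by a call. The sketch below shows one way that dispatch could look. `apply_profile` is a hypothetical function, and raising `KeyError` for an unknown profile is an assumption of this sketch, since the diff does not show how `Experiment.launch` handles that case.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def cml_profile_handler(slurm_params: Dict[str, Any]) -> Dict[str, Any]:
    return {**slurm_params, "slurm_account": "cml", "slurm_partition": "cml-dpart"}


def cml_scavenger_profile_handler(slurm_params: Dict[str, Any]) -> Dict[str, Any]:
    return {
        **slurm_params,
        "slurm_account": "cml-scavenger",
        "slurm_partition": "cml-scavenger",
        "slurm_qos": "cml-scavenger",
    }


# Subset of the registry added in submititnow/umiacs/handlers.py.
profile_handlers: Dict[str, Handler] = {
    "cml": cml_profile_handler,
    "cml-scavenger": cml_scavenger_profile_handler,
}


def apply_profile(slurm_params: Dict[str, Any], handlers: Dict[str, Handler]) -> Dict[str, Any]:
    """Hypothetical dispatch: pop 'slurm_profile' and apply the matching handler."""
    params = dict(slurm_params)  # avoid mutating the caller's dict
    profile = params.pop("slurm_profile", None)
    if profile is None:
        return params
    if profile not in handlers:
        # Assumption of this sketch: fail loudly on an unknown profile name.
        raise KeyError(f"Unknown SLURM profile {profile!r}; known: {sorted(handlers)}")
    return handlers[profile](params)


resolved = apply_profile(
    {"slurm_profile": "cml-scavenger", "slurm_mem": "16G"}, profile_handlers
)
print(resolved)
```

Because the handlers unpack `slurm_params` first, a profile's account and partition take precedence over any value with the same key passed in by the caller.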