Add Submission Process Rules #476

Merged: 19 commits, Nov 3, 2023
76 changes: 51 additions & 25 deletions getting_started.md → GETTING_STARTED.md
# Getting Started

Table of Contents:
- [Set up and installation](#set-up-and-installation)
- [Download the data](#download-the-data)
- [Develop your submission](#develop-your-submission)
- [Set up your directory structure (Optional)](#set-up-your-directory-structure-optional)
- [Coding your submission](#coding-your-submission)
- [Run your submission](#run-your-submission)
- [Pytorch DDP](#pytorch-ddp)
- [Run your submission in a Docker container](#run-your-submission-in-a-docker-container)
- [Docker Tips](#docker-tips)
- [Score your submission](#score-your-submission)
- [Good Luck](#good-luck)

## Set up and installation

To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:

1. Decide whether you would like to develop your submission in PyTorch or JAX.
2. Set up your workstation or VM. We recommend using a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware).
The specs on the benchmarking machines are:
- 8 V100 GPUs
- 240 GB of RAM
- 2 TB of storage (for datasets).

3. Install the algorithmic-efficiency package and its dependencies; see [Installation](./README.md#installation).

## Download the data

The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup).

## Develop your submission

To develop a submission you will write a Python module containing your optimizer algorithm. Your optimizer must implement a set of predefined API methods for the initialization and update steps.
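
Concretely, a submission module defines a small set of functions that the benchmark harness calls. The sketch below is only a rough, non-authoritative illustration: the exact method names and signatures live in the template module referenced below, and the argument lists and bodies here are simplified assumptions.

```python
# Illustrative sketch only: see submissions/template/submission.py for the
# authoritative API. Argument lists and bodies below are simplified assumptions.


def init_optimizer_state(workload, model_params, model_state, hyperparameters, rng):
  """Creates whatever state the update rule needs (here, just the learning rate)."""
  del workload, model_state, rng  # Unused in this toy sketch.
  return {"learning_rate": hyperparameters.learning_rate}


def update_params(workload, current_param_container, current_params_types,
                  model_state, hyperparameters, batch, loss_type,
                  optimizer_state, eval_results, global_step, rng):
  """Runs one training step; returns (optimizer_state, params, model_state).

  This sketch is a no-op; a real submission computes gradients on `batch`
  and applies its update rule here.
  """
  return optimizer_state, current_param_container, model_state


def get_batch_size(workload_name):
  """Returns the batch size to use for a given workload (a fixed toy value here)."""
  return 128


def data_selection(workload, input_queue, optimizer_state, current_param_container,
                   model_state, hyperparameters, global_step, rng):
  """Selects the next batch of training data, e.g. by drawing from the input queue."""
  return next(input_queue)
```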

### Set up your directory structure (Optional)

Make a submissions subdirectory to store your submission modules, e.g. `algorithmic-efficiency/submissions/my_submissions`.

### Coding your submission

You can find examples of submission modules under `algorithmic-efficiency/baselines` and `algorithmic-efficiency/reference_algorithms`. \
A submission for the external ruleset will consist of a submission module and a tuning search space definition.

1. Copy the template submission module `submissions/template/submission.py` into your submissions directory, e.g. `algorithmic-efficiency/submissions/my_submissions`.
2. Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to the competition rules. Check out the guidelines for [allowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#allowed-submissions) and [disallowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions), and pay special attention to the [software dependencies rule](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#software-dependencies).
3. Add a tuning configuration, e.g. a `tuning_search_space.json` file, to your submission directory. For the tuning search space you can either:
1. Define the set of feasible points by defining a value for `feasible_points` for the hyperparameters:

```JSON
{
  "learning_rate": {
    "feasible_points": [0.999]
  }
}
```

For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/target_setting_algorithms/imagenet_resnet/tuning_search_space.json).

2. Define a range of values for quasirandom sampling by specifying `min`, `max`, and `scaling` keys for the hyperparameter:

```JSON
{
  "weight_decay": {
    "min": 5e-3,
    "max": 1.0,
    "scaling": "log"
  }
}
```
For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json); a small illustrative example combining both styles follows below.
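
A single search space file can combine both styles across different hyperparameters. The hyperparameter names and values in this sketch are purely illustrative and not taken from any baseline:

```JSON
{
  "learning_rate": {
    "min": 1e-4,
    "max": 1e-2,
    "scaling": "log"
  },
  "one_minus_beta_1": {
    "feasible_points": [0.1, 0.05]
  }
}
```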

## Run your submission

From your virtual environment or interactively running Docker container run your submission with `submission_runner.py`:

**JAX**: to score your submission on a workload, from the algorithmic-efficiency directory run:

```bash
python3 submission_runner.py \
--framework=jax \
--workload=<workload> \
--experiment_dir=<path_to_experiment_dir> \
--experiment_name=<experiment_name> \
--submission_path=<path_to_submission_module> \
--tuning_search_space=<path_to_tuning_search_space>
```

**Pytorch**: to score your submission on a workload, from the algorithmic-efficiency directory run:

```bash
python3 submission_runner.py \
--framework=pytorch \
--workload=<workload> \
--experiment_dir=<path_to_experiment_dir> \
--experiment_name=<experiment_name> \
--submission_path=<path_to_submission_module> \
--tuning_search_space=<path_to_tuning_search_space>
```

### Pytorch DDP

We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
when using multiple GPUs on a single node. You can initialize DDP with `torchrun`.
For example, on a single host with 8 GPUs, simply replace `python3` in the above command with:

```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS
```

So the complete command is:

```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
--standalone \
--nnodes=1 \
--nproc_per_node=N_GPUS \
submission_runner.py \
...
```

### Run your submission in a Docker container

The container entrypoint script provides the following flags:

- `--dataset` dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if `~/data/<dataset>` does not exist on the host machine. Required for running a submission.
- `--framework` framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission.
- `--submission_path` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.
- `--tuning_search_space` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission.
- `--experiment_name` experiment_name: name of experiment. Required for running a submission.
- `--workload` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission.
- `--max_global_steps` max_global_steps: maximum number of steps to run the workload for. Optional.
- `--keep_container_alive`: can be true or false. If `true`, the container will not be killed automatically. This is useful for developing or debugging.


To run the Docker container that will run the submission runner, run:

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
...
--workload <workload> \
--keep_container_alive <keep_container_alive>
```

This will print the container ID to the terminal.
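
For instance, a hypothetical invocation that scores a JAX submission on the OGBG workload might look like the following; the image name, mount set, GPU flag, and paths here are illustrative assumptions rather than prescribed values:

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
--gpus all \
<docker_image_name> \
--dataset ogbg \
--framework jax \
--submission_path submissions/my_submissions/submission.py \
--tuning_search_space submissions/my_submissions/tuning_search_space.json \
--experiment_name my_first_experiment \
--workload ogbg \
--keep_container_alive false
```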

#### Docker Tips

To find the container IDs of running containers:

```bash
docker ps
```

To see the output of the entrypoint script:

```bash
docker logs <container_id>
```

To enter a bash session in the container:

```bash
docker exec -it <container_id> /bin/bash
```
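
If you started the container with `--keep_container_alive true`, you will eventually want to stop and remove it yourself. This is plain Docker usage rather than anything specific to this repository:

```bash
docker stop <container_id>
docker rm <container_id>
```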

## Score your submission

To produce a performance profile and performance table:

```bash
python3 scoring/score_submission.py --experiment_path=<path_to_experiment_dir> --output_dir=<output_dir>
```

We provide the scores and performance profiles for the baseline algorithms in the "Baseline Results" section in [Benchmarking Neural Network Training Algorithms](https://arxiv.org/abs/2306.07179).

## Good Luck
78 changes: 58 additions & 20 deletions README.md

[MLCommons Algorithmic Efficiency](https://mlcommons.org/en/groups/research-algorithms/) is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. This repository holds the [competition rules](RULES.md) and the benchmark code to run it. For a detailed description of the benchmark design, see our [paper](https://arxiv.org/abs/2306.07179).

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Installation](#installation)
- [Python virtual environment](#python-virtual-environment)
- [Docker](#docker)
- [Building Docker Image](#building-docker-image)
- [Running Docker Container (Interactive)](#running-docker-container-interactive)
- [Running Docker Container (End-to-end)](#running-docker-container-end-to-end)
- [Using Singularity/Apptainer instead of Docker](#using-singularityapptainer-instead-of-docker)
- [Getting Started](#getting-started)
- [Running a workload](#running-a-workload)
- [JAX](#jax)
- [Pytorch](#pytorch)
- [Rules](#rules)
- [Contributing](#contributing)
- [Disclaimers](#disclaimers)
- [FAQS](#faqs)
- [Citing AlgoPerf Benchmark](#citing-algoperf-benchmark)
- [Shared data pipelines between JAX and PyTorch](#shared-data-pipelines-between-jax-and-pytorch)
- [Setup and Platform](#setup-and-platform)
- [My machine only has one GPU. How can I use this repo?](#my-machine-only-has-one-gpu-how-can-i-use-this-repo)
- [How do I run this on my SLURM cluster?](#how-do-i-run-this-on-my-slurm-cluster)
- [How can I run this on my AWS/GCP/Azure cloud project?](#how-can-i-run-this-on-my-awsgcpazure-cloud-project)
- [Submissions](#submissions)
- [Can submission be structured using multiple files?](#can-submission-be-structured-using-multiple-files)
- [Can I install custom dependencies?](#can-i-install-custom-dependencies)
- [How can I know if my code can be run on benchmarking hardware?](#how-can-i-know-if-my-code-can-be-run-on-benchmarking-hardware)
- [Are we allowed to use our own hardware to self-report the results?](#are-we-allowed-to-use-our-own-hardware-to-self-report-the-results)




## Installation

You can install this package and dependencies in a [Python virtual environment](#python-virtual-environment) or use a [Docker/Singularity/Apptainer container](#docker) (recommended).

*TL;DR to install the Jax version for GPU run:*

...

*TL;DR to install the PyTorch version for GPU run:*

```bash
pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html'
pip3 install -e '.[full]'
```

### Python virtual environment

Note: Python minimum requirement >= 3.8

To set up a virtual environment and install this repository:

1. Create a new environment, e.g. via `conda` or `virtualenv`

   ...

or all workloads at once via

```bash
pip3 install -e '.[full]'
```

</details>

### Docker

We recommend using a Docker container to ensure a similar environment to our scoring and testing environments.
Alternatively, a Singularity/Apptainer container can also be used (see instructions below).

**Prerequisites for NVIDIA GPU setup**: You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs.
See instructions [here](https://github.com/NVIDIA/nvidia-docker).
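
One quick way to sanity-check that Docker can see the GPUs after installing the toolkit is to run `nvidia-smi` inside a CUDA base image; the image tag here is only an example:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```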

#### Building Docker Image

1. Clone this repository

```bash
cd ~ && git clone https://github.com/mlcommons/algorithmic-efficiency.git
```

2. Build Docker Image

```bash
cd algorithmic-efficiency/docker
docker build -t <docker_image_name> . --build-arg framework=<framework>
```

The `framework` flag can be either `pytorch`, `jax` or `both`. Specifying the framework will install the framework-specific dependencies.
The `docker_image_name` is arbitrary.
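
For example, building a PyTorch-only image with the (arbitrary) tag `algoperf_pytorch`:

```bash
cd algorithmic-efficiency/docker
docker build -t algoperf_pytorch . --build-arg framework=pytorch
```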

#### Running Docker Container (Interactive)

To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the `bash` program. This may be useful if you are in the process of developing a submission.

1. Run a detached Docker container. The `container_id` will be printed if the container is running successfully.

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
...
```

2. To enter a bash session in the running container:

```bash
docker exec -it <container_id> /bin/bash
```

#### Running Docker Container (End-to-end)

To run a submission end-to-end in a containerized environment see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container).

### Using Singularity/Apptainer instead of Docker

...

```bash
singularity shell --nv <singularity_image_name>.sif
```

Similarly to Docker, Apptainer allows you to bind specific paths on the host system and the container by specifying the `--bind` flag, as explained [here](https://docs.sylabs.io/guides/3.7/user-guide/bind_paths_and_mounts.html).
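
For example, a shell with the host data and experiment directories bound into the container could be started as follows (the paths are illustrative):

```bash
singularity shell --nv \
  --bind $HOME/data:/data,$HOME/experiment_runs:/experiment_runs \
  <singularity_image_name>.sif
```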

## Getting Started

For instructions on developing and scoring your own algorithm in the benchmark see [Getting Started Document](./getting_started.md).

### Running a workload

To run a submission directly by running a Docker container, see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container).

From your virtual environment or interactively running Docker container, run:

#### JAX

```bash
python3 submission_runner.py \
--framework=jax \
--workload=mnist \
--experiment_dir=$HOME/experiments \
--experiment_name=my_first_experiment \
--submission_path=baselines/adamw/jax/submission.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```

#### Pytorch


```bash
python3 submission_runner.py \
--workload=mnist \
--experiment_dir=$HOME/experiments \
--experiment_name=my_first_experiment \
--submission_path=baselines/adamw/jax/submission.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```

<details>
<summary>
Using Pytorch DDP (Recommended)
</summary>

...

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=N_GPUS
```

where `N_GPUS` is the number of available GPUs on the node. To only see output from the first process, you can run the following to redirect the output from processes 1-7 to a log file:

```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8
```

So the complete command is, for example:

```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 \
submission_runner.py \
--framework=pytorch \
--workload=mnist \
--experiment_dir=$HOME/experiments \
--experiment_name=my_first_experiment \
--submission_path=baselines/adamw/jax/submission.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```

</details>

## Rules

The rules for the MLCommons Algorithmic Efficiency benchmark can be found in the separate [rules document](RULES.md). Suggestions, clarifications, and questions can be raised via pull requests.

## Contributing

If you are interested in contributing to the work of the working group, feel free to [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/) or open issues. See our [CONTRIBUTING.md](CONTRIBUTING.md) for MLCommons contributing guidelines, and for setup and workflow instructions.

