diff --git a/getting_started.md b/GETTING_STARTED.md
similarity index 90%
rename from getting_started.md
rename to GETTING_STARTED.md
index 96e58edab..8cab3959c 100644
--- a/getting_started.md
+++ b/GETTING_STARTED.md
@@ -1,52 +1,66 @@
# Getting Started

-Table of Contents:
-- [Set up and installation](#set-up-and-installation)
+- [Set up and installation](#set-up-and-installation)
- [Download the data](#download-the-data)
- [Develop your submission](#develop-your-submission)
+  - [Set up your directory structure (Optional)](#set-up-your-directory-structure-optional)
+  - [Coding your submission](#coding-your-submission)
- [Run your submission](#run-your-submission)
-  - [Docker](#run-your-submission-in-a-docker-container)
+  - [Pytorch DDP](#pytorch-ddp)
+  - [Run your submission in a Docker container](#run-your-submission-in-a-docker-container)
+    - [Docker Tips](#docker-tips)
- [Score your submission](#score-your-submission)
+- [Good Luck](#good-luck)

## Set up and installation
+
To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:
+
1. Decide whether you would like to develop your submission in Pytorch or Jax.
-2. Set up your workstation or VM. We recommend to use a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware).
+2. Set up your workstation or VM. We recommend using a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware).
   The specs on the benchmarking machines are:
   - 8 V100 GPUs
   - 240 GB of RAM
-  - 2 TB of storage (for datasets).
+  - 2 TB of storage (for datasets).
+
3. Install the algorithmic package and dependencies, see [Installation](./README.md#installation).

## Download the data

-The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup).
+The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup).

## Develop your submission
+
To develop a submission you will write a Python module containing your optimizer algorithm. Your optimizer must implement a set of predefined API methods for the initialization and update steps.

### Set up your directory structure (Optional)
+
Make a submissions subdirectory to store your submission modules, e.g. `algorithmic-efficiency/submissions/my_submissions`.

### Coding your submission
+
You can find examples of submission modules under `algorithmic-efficiency/baselines` and `algorithmic-efficiency/reference_algorithms`. \
A submission for the external ruleset will consist of a submission module and a tuning search space definition.
+
1. Copy the template submission module `submissions/template/submission.py` into your submissions directory, e.g. `algorithmic-efficiency/submissions/my_submissions`.
2. Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to the competition rules. Check out the guidelines for [allowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#allowed-submissions), [disallowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions) and pay special attention to the [software dependencies rule](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#software-dependencies).
3. Add a tuning configuration, e.g. a `tuning_search_space.json` file, to your submission directory. For the tuning search space you can either:
   1. Define the set of feasible points by defining a list of values for "feasible_points" for the hyperparameters:
-      ```
+
+      ```JSON
      {
        "learning_rate": {
          "feasible_points": [0.999]
        }
      }
      ```
+
      For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/target_setting_algorithms/imagenet_resnet/tuning_search_space.json).
-   2. Define a range of values for quasirandom sampling by specifying `min`, `max` and `scaling`
+   2. Define a range of values for quasirandom sampling by specifying `min`, `max` and `scaling`
      keys for the hyperparameter:
-      ```
+
+      ```JSON
      {
        "weight_decay": {
          "min": 5e-3,
@@ -55,14 +69,15 @@ A submission for the external ruleset will consist of a submission module and a
        }
      }
      ```
-      For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json).
+      For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json).

## Run your submission

From your virtual environment or interactively running Docker container run your submission with `submission_runner.py`:

-**JAX**: to score your submission on a workload, from the algorithmic-efficiency directory run:
+**JAX**: to score your submission on a workload, from the algorithmic-efficiency directory run:
+
```bash
python3 submission_runner.py \
--framework=jax \
@@ -73,7 +88,8 @@ python3 submission_runner.py \
--tuning_search_space=<path_to_tuning_search_space>
```

-**Pytorch**: to score your submission on a workload, from the algorithmic-efficiency directory run:
+**Pytorch**: to score your submission on a workload, from the algorithmic-efficiency directory run:
+
```bash
python3 submission_runner.py \
--framework=pytorch \
@@ -84,14 +100,18 @@ python3 submission_runner.py \
--tuning_search_space=<path_to_tuning_search_space>
```

-#### Pytorch DDP
-We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
-when using multiple GPUs on a single node. You can initialize ddp with torchrun.
+### Pytorch DDP
+
+We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
+when using multiple GPUs on a single node. You can initialize DDP with torchrun.
For example, on a single host with 8 GPUs, simply replace `python3` in the above command with:
+
```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS
```
+
So the complete command is:
+
```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
--standalone \
--nnodes=1 \
--nproc_per_node=N_GPUS \
@@ -109,17 +129,18 @@ torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
### Run your submission in a Docker container

The container entrypoint script provides the following flags:
+
- `--dataset` dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if `~/data/` does not exist on the host machine. Required for running a submission.
- `--framework` framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission.
-- `--submission_path` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.
+- `--submission_path` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.
- `--tuning_search_space` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission.
- `--experiment_name` experiment_name: name of experiment. Required for running a submission.
- `--workload` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission.
- `--max_global_steps` max_global_steps: maximum number of steps to run the workload for. Optional.
- `--keep_container_alive`: can be true or false. If `true`, the container will not be killed automatically. This is useful for developing or debugging.

To run the Docker container that will run the submission runner, run:
+
```bash
docker run -t -d \
-v $HOME/data/:/data/ \
@@ -136,32 +157,37 @@ docker run -t -d \
--workload <workload> \
--keep_container_alive <true_or_false>
```
+
This will print the container ID to the terminal.

-#### Docker Tips ####
+#### Docker Tips

To find the container IDs of running containers:
-```
+
+```bash
docker ps
```

To see the output of the entrypoint script:
-```
+
+```bash
docker logs <container_id>
```

To enter a bash session in the container:
-```
+
+```bash
docker exec -it <container_id> /bin/bash
```

-## Score your submission
+## Score your submission
+
To produce a performance profile and performance table:
+
```bash
python3 scoring/score_submission.py --experiment_path=<path_to_experiment_dir> --output_dir=<output_dir>
```

-We provide the scores and performance profiles for the baseline algorithms in the "Baseline Results" section in [Benchmarking Neural Network Training Algorithms](https://arxiv.org/abs/2306.07179).
-
+We provide the scores and performance profiles for the baseline algorithms in the "Baseline Results" section in [Benchmarking Neural Network Training Algorithms](https://arxiv.org/abs/2306.07179).

-## Good Luck!
+## Good Luck

diff --git a/README.md b/README.md
index 6ffbab6f7..df3cbc37a 100644
--- a/README.md
+++ b/README.md
@@ -22,20 +22,38 @@
[MLCommons Algorithmic Efficiency](https://mlcommons.org/en/groups/research-algorithms/) is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
This repository holds the [competition rules](RULES.md) and the benchmark code to run it. For a detailed description of the benchmark design, see our [paper](https://arxiv.org/abs/2306.07179).

-# Table of Contents
+## Table of Contents
+
+- [Table of Contents](#table-of-contents)
- [Installation](#installation)
-  - [Python Virtual Environment](#python-virtual-environment)
-  - [Docker](#docker)
+  - [Python virtual environment](#python-virtual-environment)
+  - [Docker](#docker)
+    - [Building Docker Image](#building-docker-image)
+    - [Running Docker Container (Interactive)](#running-docker-container-interactive)
+    - [Running Docker Container (End-to-end)](#running-docker-container-end-to-end)
+  - [Using Singularity/Apptainer instead of Docker](#using-singularityapptainer-instead-of-docker)
- [Getting Started](#getting-started)
+  - [Running a workload](#running-a-workload)
+    - [JAX](#jax)
+    - [Pytorch](#pytorch)
- [Rules](#rules)
- [Contributing](#contributing)
-- [Diclaimers](#disclaimers)
-- [FAQS](#faqs)
-- [Citing AlgoPerf Benchmark](#citing-algoperf-benchmark)
+- [Shared data pipelines between JAX and PyTorch](#shared-data-pipelines-between-jax-and-pytorch)
+- [Setup and Platform](#setup-and-platform)
+  - [My machine only has one GPU. How can I use this repo?](#my-machine-only-has-one-gpu-how-can-i-use-this-repo)
+  - [How do I run this on my SLURM cluster?](#how-do-i-run-this-on-my-slurm-cluster)
+  - [How can I run this on my AWS/GCP/Azure cloud project?](#how-can-i-run-this-on-my-awsgcpazure-cloud-project)
+- [Submissions](#submissions)
+  - [Can submission be structured using multiple files?](#can-submission-be-structured-using-multiple-files)
+  - [Can I install custom dependencies?](#can-i-install-custom-dependencies)
+  - [How can I know if my code can be run on benchmarking hardware?](#how-can-i-know-if-my-code-can-be-run-on-benchmarking-hardware)
+  - [Are we allowed to use our own hardware to self-report the results?](#are-we-allowed-to-use-our-own-hardware-to-self-report-the-results)
+
## Installation
+
You can install this package and dependencies in a [python virtual environment](#python-virtual-environment) or use a [Docker/Singularity/Apptainer container](#docker) (recommended).

*TL;DR to install the Jax version for GPU run:*

@@ -53,10 +71,13 @@ You can install this package and dependences in a [python virtual environment](#
pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html'
pip3 install -e '.[full]'
```
-## Python virtual environment
+
+### Python virtual environment
+
Note: Python minimum requirement >= 3.8

To set up a virtual environment and install this repository
+
1. Create a new environment, e.g. via `conda` or `virtualenv`

```bash
@@ -89,17 +110,21 @@ or all workloads at once via
```bash
pip3 install -e '.[full]'
```
+

-## Docker
+### Docker
+
-We recommend using a Docker container to ensure a similar environment to our scoring and testing environments.
+We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. Alternatively, a Singularity/Apptainer container can also be used (see instructions below).

-**Prerequisites for NVIDIA GPU set up**: You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs.
+**Prerequisites for NVIDIA GPU set up**: You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs.
See instructions [here](https://github.com/NVIDIA/nvidia-docker).
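+As an optional sanity check (this snippet is not part of the benchmark codebase), you can verify from any environment in which you installed the package, e.g. from inside a running container, that the framework you chose actually sees the GPUs:
+
+```python
+# Hypothetical GPU-visibility check -- illustrative only, not part of this repository.
+# Run it after installing the JAX and/or PyTorch version of the package.
+try:
+    import jax
+    print("JAX devices:", jax.devices())  # should list one device per GPU
+except ImportError:
+    pass
+
+try:
+    import torch
+    print("PyTorch CUDA devices:", torch.cuda.device_count())  # expected: 8 on the benchmarking hardware
+except ImportError:
+    pass
+```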
-### Building Docker Image
+#### Building Docker Image
+
1. Clone this repository

```bash
@@ -107,17 +132,21 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker).
```

2. Build Docker Image
+
```bash
cd algorithmic-efficiency/docker
docker build -t <docker_image_name> . --build-arg framework=<framework>
```
+
The `framework` flag can be either `pytorch`, `jax` or `both`. Specifying the framework will install the framework-specific dependencies. The `docker_image_name` is arbitrary.

+#### Running Docker Container (Interactive)
-### Running Docker Container (Interactive)

To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the `bash` program. This may be useful if you are in the process of developing a submission.

-1. Run detached Docker Container. The `container_id` will be printed if the container is running successfully.
+
+1. Run a detached Docker container. The `container_id` will be printed if the container is run successfully.
+
```bash
docker run -t -d \
-v $HOME/data/:/data/ \
@@ -142,7 +171,8 @@ To use the Docker container as an interactive virtual environment, you can run a
docker exec -it <container_id> /bin/bash
```

-### Running Docker Container (End-to-end)
+#### Running Docker Container (End-to-end)
+
To run a submission end-to-end in a containerized environment see the [Getting Started Document](./GETTING_STARTED.md#run-your-submission-in-a-docker-container).

### Using Singularity/Apptainer instead of Docker
@@ -164,14 +194,17 @@ singularity shell --nv .sif
```

Similarly to Docker, Apptainer allows you to bind specific paths on the host system and the container by specifying the `--bind` flag, as explained [here](https://docs.sylabs.io/guides/3.7/user-guide/bind_paths_and_mounts.html).

-# Getting Started
+## Getting Started
+
For instructions on developing and scoring your own algorithm in the benchmark see the [Getting Started Document](./GETTING_STARTED.md).
+
-## Running a workload
+
+### Running a workload
+
To run a submission directly by running a Docker container, see the [Getting Started Document](./GETTING_STARTED.md#run-your-submission-in-a-docker-container).

From your virtual environment or interactively running Docker container run:

-**JAX**
+#### JAX

```bash
python3 submission_runner.py \
@@ -183,7 +216,7 @@ python3 submission_runner.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```

-**Pytorch**
+#### Pytorch

```bash
python3 submission_runner.py \
@@ -194,6 +227,7 @@ python3 submission_runner.py \
--submission_path=baselines/adamw/jax/submission.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```
+
Using Pytorch DDP (Recommended)
@@ -207,12 +241,14 @@ torchrun --standalone --nnodes=1 --nproc_per_node=N_GPUS
```

where `N_GPUS` is the number of available GPUs on the node. To only see output from the first process, you can run the following to redirect the output from processes 1-7 to a log file:
+
```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8
```

So the complete command is, for example:
-```
+
+```bash
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 \
submission_runner.py \
--framework=pytorch \
@@ -222,13 +258,15 @@ submission_runner.py \
--submission_path=baselines/adamw/jax/submission.py \
--tuning_search_space=baselines/adamw/tuning_search_space.json
```
+
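+For background, `torchrun` starts one Python process per GPU (when `--nproc_per_node` matches the GPU count) and sets environment variables such as `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` for each process. When launched this way, `submission_runner.py` performs the distributed setup for you; the sketch below is only a generic illustration of the pattern torchrun relies on and is not code from this repository:
+
+```python
+# Generic torchrun/DDP initialization pattern -- illustrative only,
+# not taken from submission_runner.py.
+import os
+
+import torch
+import torch.distributed as dist
+
+
+def setup_distributed() -> int:
+    # torchrun sets LOCAL_RANK, RANK and WORLD_SIZE for every process it spawns.
+    local_rank = int(os.environ["LOCAL_RANK"])
+    torch.cuda.set_device(local_rank)
+    # NCCL is the usual backend for multi-GPU training on a single node.
+    dist.init_process_group(backend="nccl")
+    return local_rank
+
+
+if __name__ == "__main__":
+    local_rank = setup_distributed()
+    # Only the first process prints, mirroring the --redirects flag above,
+    # which sends the output of processes 1-7 to log files.
+    if dist.get_rank() == 0:
+        print(f"Initialized {dist.get_world_size()} processes.")
+    dist.destroy_process_group()
+```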
+## Rules
-# Rules

The rules for the MLCommons Algorithmic Efficiency benchmark can be found in the separate [rules document](RULES.md). Suggestions, clarifications and questions can be raised via pull requests.

-# Contributing
+## Contributing
+
If you are interested in contributing to the work of the working group, feel free to [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/) or open issues. See our [CONTRIBUTING.md](CONTRIBUTING.md) for MLCommons contributing guidelines and setup and workflow instructions.

diff --git a/RULES.md b/RULES.md
index 873cc1786..7225a76b0 100644
--- a/RULES.md
+++ b/RULES.md
@@ -1,6 +1,6 @@
# MLCommons™ AlgoPerf: Benchmark Rules

-**Version:** 0.0.16 *(Last updated 28 April 2023)*
+**Version:** 0.0.17 *(Last updated 10 August 2023)*

> **TL;DR** New training algorithms and models can make neural net training faster.
> We need a rigorous training time benchmark that measures time to result given a fixed hardware configuration and stimulates algorithmic progress. We propose a [Training Algorithm Track](#training-algorithm-track) and a [Model Track](#model-track) in order to help disentangle optimizer improvements and model architecture improvements. This two-track structure lets us enforce a requirement that new optimizers work well on multiple models and that new models aren't highly specific to particular training hacks.
@@ -23,9 +23,6 @@
  - [Defining target performance](#defining-target-performance)
  - [Benchmark score using performance profiles](#benchmark-score-using-performance-profiles)
- [Benchmark Procedure](#benchmark-procedure)
-  - [Multiple Submission](#multiple-submission)
-  - [Licensing](#licensing)
-  - [Awards and prize money](#awards-and-prize-money)
- [Model Track](#model-track)

## Introduction
@@ -47,6 +44,8 @@ Submissions to the Training Algorithm Track can be entered under two separate ru
The intention is that a training algorithm submission will be broadly applicable and useful without customization to the specific [workload](#workloads) (model, dataset, loss function). We want to discourage detecting the particular workload and doing something highly specific that isn't generally useful. In order to further discourage submissions that overfit to the particular [fixed benchmark workloads](#fixed-workloads), submissions will also be evaluated on [held-out workloads](#randomized-workloads) specified after the submission deadline.

+For a description of how to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark, see the [Call for submissions](CALL_FOR_SUBMISSIONS.md), which details the entire competition process.
+
### Submissions

A valid submission is a piece of code that defines all of the submission functions and is able to train all benchmark workloads on the [benchmarking hardware](#benchmarking-hardware) (defined in the [Scoring](#scoring) section). Both the validation set and the test set performance will be checked regularly during training (see the [Evaluation during training](#evaluation-during-training) section). Training halts when the workload-specific [target errors](#defining-target-performance) for the validation and test sets have been reached. For each workload, the training time to reach the *test* set target error is used as input to the [scoring process](#scoring) for the submission. Submissions using [external tuning](#external-tuning-ruleset) will be tuned independently for each workload using a single workload-agnostic search space for their specified hyperparameters.
The tuning trials are selected based on the time to reach the *validation* target, but only their training times to reach the *test* target will be used for scoring. Submissions under either tuning ruleset may always self-tune while on the clock.
@@ -400,7 +399,7 @@ Our scoring procedure uses the held-out workloads only to penalize submissions t
#### Qualification set

-The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](#awards-and-prize-money).
+The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](/SUBMISSION_PROCESS_RULES.md#awards-and-prize-money).

The qualification set consists of the same [fixed workloads](#fixed-workloads) as mentioned above, except for both workloads on *ImageNet*, both workloads on *LibriSpeech*, and the *fastMRI* workload. The remaining three workloads (*WMT*, *Criteo 1TB*, and *OGBG*) form the qualification set. There are no [randomized workloads](#randomized-workloads) in the qualification set. The qualification set of workloads aims to have a combined runtime of roughly 24 hours on the [benchmarking hardware](#benchmarking-hardware).
@@ -483,35 +482,7 @@ For a given workload $\bar{w}$, we define the "speedup of a submission $\bar{s}$

### Benchmark Procedure

-#### Multiple Submission
-
-Our benchmark allows multiple submissions by the same submitter. However, we would like to prevent submitters from circumventing the purpose of the benchmark by, for example, submitting dozens of copies of the same submission with slightly different hyperparameters. Such a bulk submission would result in an unfair advantage on the randomized workloads and is not in the spirit of the benchmark.
-
-We encourage multiple submissions if they differ substantially. A spirit jury will be responsible for judging whether the submissions are substantially different. This jury will apply stricter scrutiny to submitters with a larger number of submissions. In this context, a submitter refers to an individual (not the general institution or research group they belong to). The total number of submissions by a submitter is the sum of submissions they contributed to.
-
-##### Requesting Additional Baselines
-
-Submitters can both contribute and request additional baseline algorithms. This includes existing algorithms with different search spaces or learning rate schedules. These baselines will not be eligible for winning the competition or prize money.
-
-#### Licensing
-
-Submitting to the benchmark requires the following legal considerations:
-
-- A signed [Contributor License Agreement (CLA) "Corporate CLA"](https://mlcommons.org/en/policies/) of MLCommons.
-- *Either* membership in MLCommons *or* a signed [non-member test agreement](https://mlcommons.org/en/policies/).
-- A signed trademark license agreement, either the member or the non-member version, as appropriate). These license agreements are available upon request to [support@mlcommons.org](mailto:support@mlcommons.org).
-
-We furthermore require all submissions to be made available open source after the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0).
-
-#### Awards and prize money
-
-An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as a "*Jury Award*". The prize for the best-performing submission will take into account the [benchmark score](#benchmark-score-using-performance-profiles) on the full benchmark. The "*Jury Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc.
-
-The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Jury Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions.
-
-The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their institutions (currently *Google Inc.* and the *University of Tübingen*) are ineligible to receive prize money. In addition, all individuals serving on the awards committee and their institutions are ineligible to win prize money. A submission with at least one ineligible submitter may still win an award, but the prize money will then be awarded to the top-ranked submission that is eligible for prize money.
-
-Submitters may self-report the results of their submissions as long as they follow the benchmark protocol (e.g. use the time to reach the validation target for tuning, use the hyperparameter samples provided by the working group, etc.). The working group will independently verify the self-reported submissions with the highest scores. Only verified results are eligible to win the benchmark and be awarded prize money.
+For a description of how to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark, see the [Call for submissions](CALL_FOR_SUBMISSIONS.md), which details the entire competition process.
## Model Track

diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md
new file mode 100644
index 000000000..51aeff043
--- /dev/null
+++ b/SUBMISSION_PROCESS_RULES.md
@@ -0,0 +1,169 @@
+# MLCommons™ AlgoPerf: Submission Process Rules
+
+**Version:** 0.0.3 *(Last updated 10 October 2023)*
+
+- [Basics](#basics)
+- [Schedule](#schedule)
+  - [Dates](#dates)
+  - [Version freeze](#version-freeze)
+  - [Submission deadline](#submission-deadline)
+- [Submission](#submission)
+  - [Register an intent to submit](#register-an-intent-to-submit)
+  - [How to submit](#how-to-submit)
+    - [Submission repository](#submission-repository)
+    - [Licensing](#licensing)
+  - [Multiple Submissions](#multiple-submissions)
+- [Scoring](#scoring)
+  - [Self-reporting scores](#self-reporting-scores)
+    - [Verifying scores](#verifying-scores)
+  - [Sampling held-out workloads and hyperparameters](#sampling-held-out-workloads-and-hyperparameters)
+  - [Leaderboard](#leaderboard)
+- [Spirit jury \& challenging submissions](#spirit-jury--challenging-submissions)
+- [Awards and prize money](#awards-and-prize-money)
+  - [Awards committee](#awards-committee)
+- [Ineligibility and conflict of interest](#ineligibility-and-conflict-of-interest)
+
+## Basics
+
+This document contains the submission process rules for the AlgoPerf: Training Algorithms Benchmark. It describes the process of submitting a new training algorithm and details how it will be scored. This process applies to both the external tuning ruleset and the self-tuning ruleset, although, for all intents and purposes, they are two separate competitions with separate leaderboards.
+
+Three additional documents complement this document:
+
+- [**Benchmark rules**](RULES.md): While the submission process rules detail the *logistical* aspects of submitting to the AlgoPerf: Training Algorithms Benchmark, the [rules document](RULES.md) describes the *scientific* rules of the competition. This includes, for example, how tuning is performed in each ruleset, what types of submissions are allowed, or how the benchmark score is computed.
+- [**AlgoPerf paper**](https://arxiv.org/abs/2306.07179): The paper titled ["Benchmarking Neural Network Training Algorithms"](https://arxiv.org/abs/2306.07179) motivates the need for the benchmark, explains the rules, and justifies the specific design choices of the AlgoPerf: Training Algorithms Benchmark. Additionally, it evaluates baseline submissions, constructed using various optimizers like Adam, Shampoo, or SAM, on the benchmark, demonstrating the feasibility but also the difficulty of the benchmark.
+- [**Benchmark codebase**](https://github.com/mlcommons/algorithmic-efficiency): The codebase implements the rules, provides exact specifications of the workloads, and will ultimately be used to score submissions.
+
+## Schedule
+
+### Dates
+
+- **Publication of the call for submissions: October 17, 2023 (08:00 AM UTC)**
+- Registration deadline to express non-binding intent to submit: December 15, 2023 (08:00 AM UTC)
+- Version freeze for the benchmark codebase: January 17, 2024 (08:00 AM UTC)
+- **Submission deadline: February 15, 2024 (08:00 AM UTC)**
+- Sampling the held-out workloads and hyperparameters: February 16, 2024 (08:00 AM UTC)
+- Deadline for specifying the submission batch sizes for held-out workloads: February 28, 2024 (08:00 AM UTC)
+- Deadline for self-reporting results: April 10, 2024 (08:00 AM UTC)
+- **[extra tentative] Announcement of all results: May 22, 2024 (08:00 AM UTC)**
+
+The presented dates are subject to change and adjustments may be made by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/).
+
+### Version freeze
+
+The benchmark codebase is subject to change after the call for submissions is published. For example, if submitters encounter bugs or API limitations while interacting with the codebase, they have the option to file a bug report. This might lead to modifications of the benchmark codebase even after the publication of the call for submissions.
+
+To ensure that all submitters can develop their submissions based on the same code that will be utilized for scoring, we will freeze the package versions of the codebase dependencies before the submission deadline. By doing so, we level the playing field for everyone involved, ensuring fairness and consistency in the assessment of submissions. We will also try to minimize changes to the benchmark codebase as much as possible.
+
+### Submission deadline
+
+By the submission deadline, all submissions need to be available as a *public* repository with the appropriate license (see the [Licensing section](#licensing)). No changes to the submission code are allowed after the submission deadline (with the notable exception of specifying the batch sizes for the held-out workloads, which are unknown at that point). Once the submission deadline has passed, the working group will publish a list of all submitted algorithms, along with their associated repositories. Anyone has the right to challenge a submission, i.e. to request a review by the spirit jury to determine whether a submission violates the rules of the competition, see the [Spirit jury section](#spirit-jury--challenging-submissions).
+
+Directly after the submission deadline, all randomized aspects of the competition are fixed. This includes sampling the held-out workloads from the set of randomized workloads, as well as sampling the hyperparameters for each submission in the external tuning ruleset (for more details see the [Sampling held-out workloads and hyperparameters section](#sampling-held-out-workloads-and-hyperparameters)). After that, submitters can ascertain the appropriate batch size of their submission on each held-out workload and self-report scores on either the qualification set or the full benchmarking set of workloads, including both fixed and held-out workloads (see the [Self-reporting scores section](#self-reporting-scores)).
+
+## Submission
+
+For a guide on the technical steps and details on how to write a submission, please refer to the [**Getting started document**](GETTING_STARTED.md). Additionally, the folders [/reference_algorithms](/reference_algorithms/) and [/baselines](/baselines/) provide example submissions that can serve as a template for creating new submissions.
+
+In the following, we describe the logistical steps required to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark.
+
+### Register an intent to submit
+
+All submitters need to register an intent to submit before the submission registration deadline. This registration is mandatory, i.e. required for all submissions, but not binding, i.e. you don't have to submit a registered submission. This registration is necessary to estimate the number of submissions and provide support for potential submitters.
+To register an intent to submit, please fill out this [online form](https://forms.gle/iY1bUhwSjj1JZ4fa9) with the following information:
+
+- Name of the submission (e.g. name of the algorithm, or any other arbitrary identifier).
+- Ruleset under which the submission will be scored.
+- Name, email, and affiliations of all submitters associated with this submission.
+- Interest in compute support.
+
+The submission will be issued a unique **submission ID** that will be used throughout the submission process.
+
+### How to submit
+
+Submitters have the flexibility to submit their training algorithm anytime between the registration of the submission and the submission deadline. To submit your algorithm, please write an email to with the subject "[Submission] *submission_ID*" and the following information:
+
+- Submission ID.
+- URL of the associated *public* GitHub repository.
+- If applicable, a list of all changes to the names, emails, or affiliations compared to the registration of the submission.
+- A digital version of all relevant licensing documents (see the [Licensing section](#licensing)).
+
+#### Submission repository
+
+The *public* GitHub repository needs to be a clone of the frozen `main` branch of the [benchmark codebase](https://github.com/mlcommons/algorithmic-efficiency). All elements of the original codebase, except for the `/submission` directory, need to be unaltered from the original benchmark code. In particular, the repository must use the same [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0) as the benchmark codebase. Once the submission deadline has passed, modifications of the submission repository's code are generally prohibited. The sole exception to this rule is the definition of the batch sizes for the held-out workloads.
+
+Any software dependencies required for the submission need to be defined in a `requirements.txt` file within the `/submission` directory. This file needs to be `pip` readable, i.e. installable via `pip install -r requirements.txt`. In order to comply with the rules, submissions are not allowed to modify the package versions of the software dependencies of the benchmarking codebase, e.g. by using a different version of PyTorch or JAX (see the [disallowed submissions section](RULES.md#disallowed-submissions)).
+
+#### Licensing
+
+Submitting to the AlgoPerf: Training Algorithms Benchmark requires the following legal considerations:
+
+- A signed [Contributor License Agreement (CLA) "Corporate CLA"](https://mlcommons.org/en/policies/) of MLCommons.
+- *Either* membership in MLCommons *or* a signed [non-member test agreement](https://mlcommons.org/en/policies/).
+- A signed trademark license agreement, either the member or the non-member version, as appropriate. These license agreements are available upon request to [support@mlcommons.org](mailto:support@mlcommons.org).
+
+We furthermore require all submissions to be made available open source by the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0).
+
+### Multiple Submissions
+
+Our benchmark allows multiple submissions by the same submitter(s). However, we would like to prevent submitters from circumventing the purpose of the benchmark by, for example, submitting dozens of copies of the same submission with slightly different hyperparameters. Such a bulk submission would result in an unfair advantage on the randomized workloads and is not in the spirit of the benchmark.
+
+Submitters may submit algorithms marked as *baselines*.
+These might include existing algorithms with different search spaces or learning rate schedules. These baseline algorithms are not eligible for winning the competition or prize money, but they are also not required to be "substantially different" from other submissions by the same submitters.
+
+## Scoring
+
+### Self-reporting scores
+
+Submitters are expected to self-report scores on the full benchmark set before the deadline for self-reporting results. Reporting the scores involves providing all unmodified logs that the benchmarking codebase automatically generates in a separate `/results` directory within the `/submission` folder. For submissions competing in the external tuning ruleset, this includes all the logs of the tuning trials using the [hyperparameter samples provided by the working group](#sampling-held-out-workloads-and-hyperparameters). Note that while the tuning runs can be performed on non-competition hardware, they still need to show that the "winning hyperparameter configuration" in each study was selected according to the [tuning rules](/RULES.md#external-tuning-ruleset), i.e. that it was the fastest hyperparameter configuration to reach the validation target. Additionally, the logs of the "winning hyperparameter configuration" (or each trial, in the self-tuning ruleset) in each of the five studies need to be computed on the competition hardware to allow wall-clock runtime comparisons.
+
+Submitters unable to self-fund scoring costs can instead self-report only on the [qualification set of workloads](/RULES.md#qualification-set), which excludes some of the most expensive workloads. Based on this performance on the qualification set, the working group will provide, as funding allows, compute to evaluate and score the most promising submissions. Additionally, we encourage researchers to reach out to the [working group](mailto:algorithms@mlcommons.org) to find potential collaborators with the resources to run larger, more comprehensive experiments for both developing and scoring submissions.
+
+#### Verifying scores
+
+The working group will independently verify the scores of the highest-scoring submissions in each ruleset. Results that have been verified by the working group will be clearly marked on the leaderboard.
+
+### Sampling held-out workloads and hyperparameters
+
+After the submission deadline has passed and all submission code is frozen, the working group will sample a specific instance of held-out workloads from the set of randomized workloads. Additionally, every submission in the external tuning ruleset will receive its specific set of 5x20 hyperparameter values, grouped by study. This set of hyperparameter values is sampled from the search space provided by the submitters.
+
+The sampling code for the held-out workloads and the hyperparameters is publicly available (**TODO link to both functions!**). Both sampling functions take as input a random seed, which will be provided by a trusted third party after the submission deadline. A purely illustrative sketch of such seeded sampling is shown below.
+
+### Leaderboard
+
+The announcement of the results will contain two separate leaderboards, one for the self-tuning and one for the external tuning ruleset. All valid submissions will be ranked by the benchmark score, taking into account all workloads, including the held-out ones. The leaderboard will clearly mark scores that were verified by the working group.
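+The following is a purely hypothetical illustration of seeded sampling from a submitted search space; it is **not** the official sampling code referenced above, and both the `"log"` value for `scaling` and the use of a plain pseudo-random generator (instead of quasirandom sampling) are simplifying assumptions:
+
+```python
+# Hypothetical illustration of seeded hyperparameter sampling -- NOT the
+# official sampling code (see the TODO link above).
+import math
+import random
+
+
+def sample_hyperparameters(search_space, seed, num_trials=20):
+    """Draws num_trials points from a search space in the style of
+    tuning_search_space.json (feasible_points, or min/max with scaling)."""
+    rng = random.Random(seed)  # the seed would come from the trusted third party
+    trials = []
+    for _ in range(num_trials):
+        point = {}
+        for name, spec in search_space.items():
+            if "feasible_points" in spec:
+                point[name] = rng.choice(spec["feasible_points"])
+            elif spec.get("scaling") == "log":
+                # Sample uniformly in log space between min and max.
+                log_value = rng.uniform(math.log(spec["min"]), math.log(spec["max"]))
+                point[name] = math.exp(log_value)
+            else:
+                point[name] = rng.uniform(spec["min"], spec["max"])
+        trials.append(point)
+    return trials
+
+
+# Example: one study of 20 trials for a toy search space.
+space = {
+    "learning_rate": {"min": 1e-4, "max": 1e-2, "scaling": "log"},
+    "weight_decay": {"feasible_points": [1e-2, 1e-1]},
+}
+print(sample_hyperparameters(space, seed=2024)[0])
+```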
+## Spirit jury & challenging submissions
+
+The spirit jury, consisting of selected active members of the working group, will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. This request must be made reasonably in advance of the results announcement deadline to allow the spirit jury sufficient time to conduct a thorough review.
+
+The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning.
+
+## Awards and prize money
+
+An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as an "*Innovative Submission Award*". The prize for the best-performing submission will take into account the [benchmark score](RULES.md#benchmark-score-using-performance-profiles) on the full benchmark. The "*Innovative Submission Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc.
+
+The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Innovative Submission Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions.
+
+If a submission is ineligible to win prize money, it can still win an award. The prize money will then go to the highest-ranking eligible submission.
+
+### Awards committee
+
+The awards committee will be responsible for awarding prize money to submissions. The committee will try to reach a consensus on how to award prize money and settle disagreements by majority vote, if necessary.
+
+**TODO Who is on the Awards committee?**
+
+## Ineligibility and conflict of interest
+
+To ensure a fair process and avoid conflicts of interest, some individuals and institutions are ineligible to win prize money. This includes:
+
+- The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their associated institutions (currently *Google Inc.* and the *University of Tübingen*).
+- All individuals serving on the awards committee and their associated institutions.
+
+A submission with at least one participating ineligible entity may still win an award, but the prize money will then be given to the top-ranked submission that does not contain ineligible entities.
+
+Additionally, we require members of the spirit jury to abstain from being involved in a review if:
+
+- They are part of the reviewed submission.
+- The reviewed submission contains individuals from their institution.
+
+The spirit jury can still reach a decision if at least one member of the jury is without a conflict of interest.