From 9e6ed419b04d3027d6411c4c11d4edfa5a4d9d7a Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Thu, 10 Aug 2023 17:24:55 +0200 Subject: [PATCH 01/15] Add draft for CfS --- CALL_FOR_SUBMISSIONS.md | 178 ++++++++++++++++++++++++++++++++++++++++ README.md | 50 ++++++++--- RULES.md | 78 ++++++------------ getting_started.md | 78 ++++++++++++------ 4 files changed, 293 insertions(+), 91 deletions(-) create mode 100644 CALL_FOR_SUBMISSIONS.md diff --git a/CALL_FOR_SUBMISSIONS.md b/CALL_FOR_SUBMISSIONS.md new file mode 100644 index 000000000..4f93c8996 --- /dev/null +++ b/CALL_FOR_SUBMISSIONS.md @@ -0,0 +1,178 @@ +# MLCommons™ AlgoPerf: Call for Submissions + +**Version:** 0.0.1 *(Last updated 10 August 2023)* + +- [MLCommons™ AlgoPerf: Call for Submissions](#mlcommons-algoperf-call-for-submissions) + - [Basics](#basics) + - [Schedule](#schedule) + - [Dates](#dates) + - [Code freeze](#code-freeze) + - [Submission deadline](#submission-deadline) + - [Submission](#submission) + - [Register a submission](#register-a-submission) + - [How to submit](#how-to-submit) + - [Submission repository](#submission-repository) + - [Licensing](#licensing) + - [Multiple Submission](#multiple-submission) + - [Requesting Additional Baselines](#requesting-additional-baselines) + - [Scoring](#scoring) + - [Self-reporting scores](#self-reporting-scores) + - [Verifying scores](#verifying-scores) + - [Sampling held-out workloads and hyperparameters](#sampling-held-out-workloads-and-hyperparameters) + - [Leaderboard](#leaderboard) + - [Sprit jury \& challenging submissions](#sprit-jury--challenging-submissions) + - [Awards and prize money](#awards-and-prize-money) + - [Awards committee](#awards-committee) + - [Ineligibility and conflict of interest](#ineligibility-and-conflict-of-interest) + +## Basics + +This is the call for submissions for the AlgoPerf: Training Algorithms Benchmark. The call describes the process of submitting a new training algorithm and details how it will be scored. 
This call applies to both the external tuning ruleset and the self-tuning ruleset although, for all intents and purposes, they are two separate competitions, with separate leaderboards. + +Three additional documents complement this call for submissions: + +- [**Benchmark rules**](RULES.md): While the call for submissions details the *logistical* aspects of submitting to the AlgoPerf: Training Algorithms Benchmark, the [rules document](RULES.md) describes the *scientific* rules of the competition. This includes, for example, how tuning is performed in each ruleset, what types of submissions are allowed, or how the benchmark score is computed. +- [**AlgoPerf paper**](https://arxiv.org/abs/2306.07179): The paper titled ["Benchmarking Neural Network Training Algorithms"](https://arxiv.org/abs/2306.07179) motivates the need for the benchmark, explains the rules, and justifies the specific design choices of the AlgoPerf: Training Algorithms Benchmark. Additionally, it evaluates baseline submissions, constructed using various optimizers like Adam, Shampoo, or SAM, on the benchmark, demonstrating the feasibility but also the difficulty of the benchmark. +- [**Benchmark codebase**](https://github.com/mlcommons/algorithmic-efficiency): The codebase implements the rules, provides exact specifications of the workloads, and it will ultimately be used to score submissions. + +## Schedule + +### Dates + +- **Publication of the call for submission: 01. September 2023 (08:00 AM UTC)** +- Registration deadline for submissions: 01. November 2023 (08:00 AM UTC) +- Code freeze for the benchmark codebase: 01. December 2023 (08:00 AM UTC) +- **Submission deadline: 01. January 2024 (08:00 AM UTC)** +- Sampling the held-out workloads and hyperparameters: 02. January 2024 (08:00 AM UTC) +- Deadline for challenging submissions: 01. February 2024 (08:00 AM UTC) +- Deadline for self-reporting results: 01. March 2024 (08:00 AM UTC) +- **Publication of all results: 01. 
April 2024 (08:00 AM UTC)** + +The presented dates are subject to change and adjustments may be made by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/). + +### Code freeze + +The benchmark codebase is subject to change after the call for submissions is published. For example, while interacting with the codebase, if submitters encounter bugs or API limitations, they have the option to issue a bug report. This might lead to modifications of the benchmark codebase even after the publication of the call for submissions. + +To ensure that all submitters can develop their submissions based on the exact same code that will be utilized for scoring, we will freeze the benchmark codebase before the submission deadline. By doing so, we level the playing field for everyone involved, ensuring fairness and consistency in the assessment of submissions. The code freeze also involves fixing all package versions of the codebase dependencies, such as JAX, PyTorch, etc. + +### Submission deadline + +By the submission deadline, all submissions need to be available as a *public* repository with the appropriate license (see the [Licensing section](#licensing)). No changes to the submission code are allowed after the submission deadline (with the notable exception of specifying the batch size for the - at that point unknown - held-out workloads). Once the submission deadline has passed, the working group will publish a list of all submitted algorithms, along with their associated repositories. Until the deadline for challenging submissions, anyone has the right to challenge a submission, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition (see the [Spirit jury section](#sprit-jury--challenging-submissions)). + +Directly after the submission deadline, all randomized aspects of the competition are fixed. 
This includes sampling the held-out workloads from the set of randomized workloads, as well as sampling the hyperparameters for each submission in the external tuning ruleset (for more details see the [Sampling held-out workloads and hyperparameters section](#sampling-held-out-workloads-and-hyperparameters)). After that, submitters can determine the appropriate batch size of their submission on each held-out workload and self-report scores on either the qualification set or the full benchmarking set of workloads including both fixed and held-out workloads (see the [Self-reporting scores section](#self-reporting-scores)). + +## Submission + +For a guide on the technical steps and details on how to write a submission, please refer to the [**Getting started document**](GETTING_STARTED.md). Additionally, the folders [/reference_algorithms](/reference_algorithms/) and [/baselines](/baselines/) provide example submissions that can serve as a template for creating new submissions. + +In the following, we describe the logistical steps required to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark. + +### Register a submission + +All submitters need to register an intent to submit before the submission registration deadline. This registration is mandatory, i.e. required for all submissions, but not binding, i.e. you don't have to submit a registered submission. It is needed to estimate the number of submissions and to provide support for potential submitters. + +To register a submission, please write an email to with the subject "[Registration] *submission_name*" and the following information: + +- Name of the submission (e.g. name of the algorithm, or any other arbitrary identifier). +- Ruleset under which the submission will be scored. +- Names of all submitters associated with this submission. +- Email addresses of all submitters associated with this submission. +- Affiliations of all submitters associated with this submission. 
+ +In return, the submission will be issued a unique **submission ID** that will be used throughout the submission process. + +### How to submit + +Submitters have the flexibility to submit their training algorithm anytime between the registration of the submission and the submission deadline. To submit, please write an email to with the subject "[Submission] *submission_ID*" and the following information: + +- Submission ID. +- URL of the associated *public* GitHub repository. +- If applicable, a list of all changes to the names, emails, or affiliations compared to the registration of the submission. +- A digital version of all relevant licensing documents (see the [Licensing section](#licensing)). + +#### Submission repository + +The *public* GitHub repository needs to be a clone of the frozen `main` branch of the [benchmark codebase](https://github.com/mlcommons/algorithmic-efficiency). All elements of the original codebase, except for the `/submission` directory, need to be unaltered from the original benchmark code. In particular, the repository must use the same [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0) as the benchmark codebase. Once the submission deadline has passed, modifications of the submission repository's code are generally prohibited. The sole exception to this rule is the definition of the batch sizes for the held-out workloads. + +Any software dependencies required for the submission need to be defined in a `requirements.txt` file within the `/submission` directory. This file needs to be `pip` readable, i.e. installable via `pip install -r requirements.txt`. In order to comply with the rules, submissions are not allowed to change the package versions of the benchmarking codebase's software dependencies, e.g. by using a different version of PyTorch or JAX (see [disallowed submissions](RULES.md#disallowed-submissions)). 
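As a rough illustration of the dependency rule above, the following sketch scans the text of a `requirements.txt` file for entries that name a package frozen by the benchmark codebase. It is not part of the benchmark codebase: the `FROZEN_PACKAGES` set and the function names are assumptions made for this example (the call only names JAX and PyTorch as examples of frozen dependencies), and the authoritative list is whatever the frozen codebase pins.

```python
import re

# Hypothetical set of packages whose versions are fixed at code freeze.
# The call for submissions only names JAX and PyTorch as examples.
FROZEN_PACKAGES = {"jax", "jaxlib", "torch"}

# Leading project name of a requirement specifier.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Lowercase a project name and collapse runs of '-', '_', '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_requirements(requirements_text, frozen=FROZEN_PACKAGES):
    """Return the requirement lines that name a frozen package."""
    frozen_normalized = {normalize(name) for name in frozen}
    conflicts = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME_RE.match(line)
        if match and normalize(match.group(1)) in frozen_normalized:
            conflicts.append(line)
    return conflicts


# Made-up requirements of a hypothetical submission.
example = """
# extra dependencies
optax-extras==0.1
torch==1.13.1
numpy>=1.23
"""
print(conflicting_requirements(example))  # ['torch==1.13.1']
```

A check like this only flags direct entries; a dependency that pulls in a different PyTorch or JAX version transitively would still need to be caught by inspecting the installed environment.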
+ +#### Licensing + +Submitting to the AlgoPerf: Training Algorithms Benchmark requires the following legal considerations: + +- A signed [Contributor License Agreement (CLA) "Corporate CLA"](https://mlcommons.org/en/policies/) of MLCommons. +- *Either* a membership in MLCommons *or* a signed [non-member test agreement](https://mlcommons.org/en/policies/). +- A signed trademark license agreement, either the member or the non-member version, as appropriate. These license agreements are available upon request to [support@mlcommons.org](mailto:support@mlcommons.org). + +We furthermore require all submissions to be made available open source after the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0). + +### Multiple Submission + +Our benchmark allows multiple submissions by the same submitter(s). However, we would like to prevent submitters from circumventing the purpose of the benchmark by, for example, submitting dozens of copies of the same submission with slightly different hyperparameters. Such a bulk submission would result in an unfair advantage on the randomized workloads and is not in the spirit of the benchmark. + +We encourage multiple submissions if they differ substantially. The spirit jury will be responsible for judging whether the submissions are substantially different. This jury will apply stricter scrutiny to submitters with a larger number of submissions. In this context, a submitter refers to an individual (not the general institution or research group they belong to). The total number of submissions by a submitter is the sum of submissions they contributed to. + +### Requesting Additional Baselines + +Submitters can both contribute and request additional baseline algorithms. This includes existing algorithms with different search spaces or learning rate schedules. These baselines will not be eligible for winning the competition or prize money. 
+ +## Scoring + +### Self-reporting scores + +Submitters are expected to self-report scores on the full benchmark set before the deadline for self-reporting results. Reporting the scores involves providing all unmodified logs that the benchmarking codebase automatically generates in a separate `/results` directory within the `/submission` folder. For submissions competing in the external tuning ruleset, this includes all the logs of the tuning trials using the [hyperparameter samples provided by the working group](#sampling-held-out-workloads-and-hyperparameters). Note that while the tuning runs can be performed on non-competition hardware, they still need to show that the "winning hyperparameter" in each study was selected according to the [tuning rules](/RULES.md#external-tuning-ruleset), i.e. that it was the hyperparameter setting that reached the validation target fastest. Additionally, the logs of the "winning hyperparameter" (or each trial, in the self-tuning ruleset) in each of the five studies need to be produced on the competition hardware to allow wall-clock runtime comparisons. + +Submitters unable to self-fund scoring costs can instead self-report only on the [qualification set of workloads](/RULES.md#qualification-set), which excludes some of the most expensive workloads. Based on the performance on the qualification set, the working group will provide compute to evaluate and score the most promising submissions. Additionally, we encourage researchers to reach out to the [working group](mailto:algorithms@mlcommons.org) to find potential collaborators with the resources to run larger, more comprehensive experiments for both developing and scoring submissions. + +#### Verifying scores + +The working group will independently verify the scores of the highest-scoring submissions in each ruleset. Results that have been verified by the working group will be clearly marked on the leaderboard. 
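The selection rule for the "winning hyperparameter" described under self-reporting scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's scoring code: the record layout (`trial`, `time_to_validation_target`) and the function name are hypothetical and do not reflect the log format the codebase actually produces.

```python
def select_winning_trial(study):
    """Return the trial that reached the validation target fastest, or None.

    `study` is a list of trial records; a time of None means the trial
    never reached the validation target (hypothetical record layout).
    """
    reached = [t for t in study if t["time_to_validation_target"] is not None]
    if not reached:
        return None
    return min(reached, key=lambda t: t["time_to_validation_target"])


# Two toy studies with made-up times in seconds.
studies = [
    [
        {"trial": 1, "time_to_validation_target": 5400.0},
        {"trial": 2, "time_to_validation_target": None},
        {"trial": 3, "time_to_validation_target": 4800.0},
    ],
    [
        {"trial": 1, "time_to_validation_target": None},
        {"trial": 2, "time_to_validation_target": None},
    ],
]

for index, study in enumerate(studies):
    winner = select_winning_trial(study)
    print(index, None if winner is None else winner["trial"])
```

In the first toy study, trial 3 is selected; in the second, no trial reaches the target, so there is no winner. Only the selected trial of each study would then need to be rerun on the competition hardware.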
+ +### Sampling held-out workloads and hyperparameters + +After the submission deadline has passed and all submission code is frozen, the working group will sample a specific instance of held-out workloads from the set of randomized workloads. Additionally, every submission in the external tuning ruleset will receive its specific set of 5x20 hyperparameter values grouped by study. This set of hyperparameter values is sampled from the search space provided by the submitters. + +The sampling code for the held-out workloads and the hyperparameters is publicly available (**TODO link to both functions!**). Both sampling functions take as input a random seed, which will be provided by a trusted third party after the submission deadline. + +### Leaderboard + +The publication of the results will contain two separate leaderboards, one for the self-tuning and one for the external tuning ruleset. All valid submissions will be ranked by the benchmark score, taking into account all workloads, including the held-out ones. The leaderboard will clearly mark scores that were verified by the working group. + +## Sprit jury & challenging submissions + +The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Until the deadline for challenging submissions, anyone has the right to challenge a submission, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email can be written anonymously but it is required to link to the challenged submission and a detailed description of why the submission should be reviewed needs to be attached. + +The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. 
Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. + +In the event of a review, the spirit jury will hold a vote, which will be decided by a simple majority. + +**TODO Who is on the Jury?** + +## Awards and prize money + +An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as a "*Jury Award*". The prize for the best-performing submission will take into account the [benchmark score](RULES.md#benchmark-score-using-performance-profiles) on the full benchmark. The "*Jury Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc. + +The prize money for "*Best Performance*" is $20,000 in each ruleset. The winner of the "*Jury Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions. + +### Awards committee + +The awards committee will be responsible for awarding prize money to submissions. Members of the awards committee can suggest submissions to be considered for the awards. The committee will vote on the winning submissions; the submission with the most votes in each category wins the award and, if eligible, the prize money. + +**TODO Who is on the Awards committee?** + +## Ineligibility and conflict of interest + +To ensure a fair process and avoid conflicts of interest, some individuals and institutions are ineligible to win prize money. This includes: + +- The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their institutions (currently *Google Inc.* and the *University of Tübingen*). +- All individuals serving on the awards committee and their institutions. 
+ +A submission with at least one ineligible submitter may still win an award, but the prize money will then be awarded to the top-ranked submission that is eligible for prize money. + +Additionally, we require members of the spirit jury to abstain from being involved in a review if: + +- They are part of the reviewed submission. +- The reviewed submission contains individuals from their institution. + +The spirit jury can still take a decision if at least one member of the jury is without a conflict of interest. diff --git a/README.md b/README.md index c60efae60..cbd28ed41 100644 --- a/README.md +++ b/README.md @@ -23,17 +23,23 @@ [MLCommons Algorithmic Efficiency](https://mlcommons.org/en/groups/research-algorithms/) is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. This repository holds the [competition rules](RULES.md) and the benchmark code to run it. For a detailed description of the benchmark design, see our [paper](https://arxiv.org/abs/2306.07179). 
# Table of Contents + +- [MLCommons™ Algorithmic Efficiency](#mlcommons-algorithmic-efficiency) - [Table of Contents](#table-of-contents) -- [AlgoPerf Benchmark Workloads](#algoperf-benchmark-workloads) -- [Installation](#installation) - - [Docker](#docker) + - [Installation](#installation) + - [Virtual environment](#virtual-environment) + - [Docker](#docker) + - [Building Docker Image](#building-docker-image) + - [Running Docker Container (Interactive)](#running-docker-container-interactive) + - [Running Docker Container (End-to-end)](#running-docker-container-end-to-end) - [Getting Started](#getting-started) + - [Running a workload](#running-a-workload) - [Rules](#rules) - [Contributing](#contributing) -- [Citing AlgoPerf Benchmark](#citing-algoperf-benchmark) - +- [Note on shared data pipelines between JAX and PyTorch](#note-on-shared-data-pipelines-between-jax-and-pytorch) ## Installation + You can install this package and dependences in a [python virtual environment](#virtual-environment) or use a [Docker container](#install-in-docker) (recommended). *TL;DR to install the Jax version for GPU run:* @@ -51,10 +57,13 @@ You can install this package and dependences in a [python virtual environment](# pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html' pip3 install -e '.[full]' ``` -## Virtual environment + +## Virtual environment + Note: Python minimum requirement >= 3.8 To set up a virtual enviornment and install this repository + 1. Create new environment, e.g. via `conda` or `virtualenv` ```bash @@ -87,16 +96,18 @@ or all workloads at once via ```bash pip3 install -e '.[full]' ``` + ## Docker -We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. +We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. 
-**Prerequisites for NVIDIA GPU set up**: You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs. +**Prerequisites for NVIDIA GPU set up**: You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs. See instructions [here](https://github.com/NVIDIA/nvidia-docker). ### Building Docker Image + 1. Clone this repository ```bash @@ -104,16 +115,19 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker). ``` 2. Build Docker Image + ```bash cd `algorithmic-efficiency/docker` docker build -t . --build-args framework= ``` - The `framework` flag can be either `pytorch`, `jax` or `both`. - The `docker_image_name` is arbitrary. + The `framework` flag can be either `pytorch`, `jax` or `both`. + The `docker_image_name` is arbitrary. ### Running Docker Container (Interactive) + 1. Run detached Docker Container + ```bash docker run -t -d \ -v $HOME/data/:/data/ \ @@ -124,18 +138,24 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker). --ipc=host \ ``` - This will print out a container id. + + This will print out a container id. 2. Open a bash terminal + ```bash docker exec -it /bin/bash ``` ### Running Docker Container (End-to-end) + To run a submission end-to-end in a container see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container). # Getting Started + For instructions on developing and scoring your own algorithm in the benchmark see [Getting Started Document](./getting_started.md). + ## Running a workload + To run a submission directly by running a Docker container, see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container). 
Alternatively from a your virtual environment or interactively running Docker container `submission_runner.py` run: @@ -163,6 +183,7 @@ python3 submission_runner.py \ --submission_path=reference_algorithms/development_algorithms/mnist/mnist_pytorch/submission.py \ --tuning_search_space=reference_algorithms/development_algorithms/mnist/tuning_search_space.json ``` +
Using Pytorch DDP (Recommended) @@ -176,11 +197,13 @@ torchrun --standalone --nnodes=1 --nproc_per_node=N_GPUS ``` where `N_GPUS` is the number of available GPUs on the node. To only see output from the first process, you can run the following to redirect the output from processes 1-7 to a log file: + ```bash torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 ``` So the complete command is for example: + ``` torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 \ submission_runner.py \ @@ -191,15 +214,16 @@ submission_runner.py \ --submission_path=reference_algorithms/development_algorithms/mnist/mnist_pytorch/submission.py \ --tuning_search_space=reference_algorithms/development_algorithms/mnist/tuning_search_space.json \ ``` -
+ # Rules + The rules for the MLCommons Algorithmic Efficency benchmark can be found in the seperate [rules document](RULES.md). Suggestions, clarifications and questions can be raised via pull requests. # Contributing -If you are interested in contributing to the work of the working group, feel free to [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/), open issues. See our [CONTRIBUTING.md](CONTRIBUTING.md) for MLCommons contributing guidelines and setup and workflow instructions. +If you are interested in contributing to the work of the working group, feel free to [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/), open issues. See our [CONTRIBUTING.md](CONTRIBUTING.md) for MLCommons contributing guidelines and setup and workflow instructions. # Note on shared data pipelines between JAX and PyTorch diff --git a/RULES.md b/RULES.md index 873cc1786..7691f4d5c 100644 --- a/RULES.md +++ b/RULES.md @@ -1,32 +1,30 @@ # MLCommons™ AlgoPerf: Benchmark Rules -**Version:** 0.0.16 *(Last updated 28 April 2023)* +**Version:** 0.0.17 *(Last updated 10 August 2023)* > **TL;DR** New training algorithms and models can make neural net training faster. > We need a rigorous training time benchmark that measures time to result given a fixed hardware configuration and stimulates algorithmic progress. We propose a [Training Algorithm Track](#training-algorithm-track) and a [Model Track](#model-track) in order to help disentangle optimizer improvements and model architecture improvements. This two-track structure lets us enforce a requirement that new optimizers work well on multiple models and that new models aren't highly specific to particular training hacks. 
-- [Introduction](#introduction) -- [Training Algorithm Track](#training-algorithm-track) - - [Submissions](#submissions) - - [Specification](#specification) - - [Evaluation during training](#evaluation-during-training) - - [Valid submissions](#valid-submissions) - - [Tuning](#tuning) - - [External tuning ruleset](#external-tuning-ruleset) - - [Self-tuning ruleset](#self-tuning-ruleset) - - [Workloads](#workloads) - - [Fixed workloads](#fixed-workloads) - - [Randomized workloads](#randomized-workloads) - - [Qualification set](#qualification-set) - - [Scoring](#scoring) - - [Benchmarking hardware](#benchmarking-hardware) - - [Defining target performance](#defining-target-performance) - - [Benchmark score using performance profiles](#benchmark-score-using-performance-profiles) - - [Benchmark Procedure](#benchmark-procedure) - - [Multiple Submission](#multiple-submission) - - [Licensing](#licensing) - - [Awards and prize money](#awards-and-prize-money) -- [Model Track](#model-track) +- [MLCommons™ AlgoPerf: Benchmark Rules](#mlcommons-algoperf-benchmark-rules) + - [Introduction](#introduction) + - [Training Algorithm Track](#training-algorithm-track) + - [Submissions](#submissions) + - [Specification](#specification) + - [Evaluation during training](#evaluation-during-training) + - [Valid submissions](#valid-submissions) + - [Tuning](#tuning) + - [External tuning ruleset](#external-tuning-ruleset) + - [Self-tuning ruleset](#self-tuning-ruleset) + - [Workloads](#workloads) + - [Fixed workloads](#fixed-workloads) + - [Randomized workloads](#randomized-workloads) + - [Qualification set](#qualification-set) + - [Scoring](#scoring) + - [Benchmarking hardware](#benchmarking-hardware) + - [Defining target performance](#defining-target-performance) + - [Benchmark score using performance profiles](#benchmark-score-using-performance-profiles) + - [Benchmark Procedure](#benchmark-procedure) + - [Model Track](#model-track) ## Introduction @@ -47,6 +45,8 @@ Submissions to the 
Training Algorithm Track can be entered under two separate ru The intention is that a training algorithm submission will be broadly applicable and useful without customization to the specific [workload](#workloads) (model, dataset, loss function). We want to discourage detecting the particular workload and doing something highly specific that isn't generally useful. In order to further discourage submissions that overfit to the particular [fixed benchmark workloads](#fixed-workloads), submissions will also be evaluated on [held-out workloads](#randomized-workloads) specified after the submission deadline. +For a description of how to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark, see the [Call for submissions](CALL_FOR_SUBMISSIONS.md), which details the entire competition process. + ### Submissions A valid submission is a piece of code that defines all of the submission functions and is able to train all benchmark workloads on the [benchmarking hardware](#benchmarking-hardware) (defined in the [Scoring](#scoring) section). Both the validation set and the test set performance will be checked regularly during training (see the [Evaluation during training](#evaluation-during-training) section). Training halts when the workload-specific [target errors](#defining-target-performance) for the validation and test sets have been reached. For each workload, the training time to reach the *test* set target error is used as input to the [scoring process](#scoring) for the submission. Submissions using [external tuning](#external-tuning-ruleset) will be tuned independently for each workload using a single workload-agnostic search space for their specified hyperparameters. The tuning trials are selected based on the time to reach the *validation* target, but only their training times to reach the *test* target will be used for scoring. Submissions under either tuning ruleset may always self-tune while on the clock. 
@@ -400,7 +400,7 @@ Our scoring procedure uses the held-out workloads only to penalize submissions t #### Qualification set -The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](#awards-and-prize-money). +The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](/CALL_FOR_SUBMISSIONS.md#awards-and-prize-money). The qualification set consists of the same [fixed workloads](#fixed-workloads) as mentioned above, except for both workloads on *ImageNet*, both workloads on *LibriSpeech*, and the *fastMRI* workload. The remaining three workloads (*WMT*, *Criteo 1TB*, and *OGBG*) form the qualification set. There are no [randomized workloads](#randomized-workloads) in the qualification set. The qualification set of workloads aims to have a combined runtime of roughly 24 hours on the [benchmarking hardware](#benchmarking-hardware). @@ -483,35 +483,7 @@ For a given workload $\bar{w}$, we define the "speedup of a submission $\bar{s}$ ### Benchmark Procedure -#### Multiple Submission - -Our benchmark allows multiple submissions by the same submitter. 
However, we would like to prevent submitters from circumventing the purpose of the benchmark by, for example, submitting dozens of copies of the same submission with slightly different hyperparameters. Such a bulk submission would result in an unfair advantage on the randomized workloads and is not in the spirit of the benchmark. - -We encourage multiple submissions if they differ substantially. A spirit jury will be responsible for judging whether the submissions are substantially different. This jury will apply stricter scrutiny to submitters with a larger number of submissions. In this context, a submitter refers to an individual (not the general institution or research group they belong to). The total number of submissions by a submitter is the sum of submissions they contributed to. - -##### Requesting Additional Baselines - -Submitters can both contribute and request additional baseline algorithms. This includes existing algorithms with different search spaces or learning rate schedules. These baselines will not be eligible for winning the competition or prize money. - -#### Licensing - -Submitting to the benchmark requires the following legal considerations: - -- A signed [Contributor License Agreement (CLA) "Corporate CLA"](https://mlcommons.org/en/policies/) of MLCommons. -- *Either* membership in MLCommons *or* a signed [non-member test agreement](https://mlcommons.org/en/policies/). -- A signed trademark license agreement, either the member or the non-member version, as appropriate). These license agreements are available upon request to [support@mlcommons.org](mailto:support@mlcommons.org). - -We furthermore require all submissions to be made available open source after the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0). - -#### Awards and prize money - -An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as a "*Jury Award*". 
The prize for the best-performing submission will take into account the [benchmark score](#benchmark-score-using-performance-profiles) on the full benchmark. The "*Jury Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc.
-
-The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Jury Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions.
-
-The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their institutions (currently *Google Inc.* and the *University of Tübingen*) are ineligible to receive prize money. In addition, all individuals serving on the awards committee and their institutions are ineligible to win prize money. A submission with at least one ineligible submitter may still win an award, but the prize money will then be awarded to the top-ranked submission that is eligible for prize money.
-
-Submitters may self-report the results of their submissions as long as they follow the benchmark protocol (e.g. use the time to reach the validation target for tuning, use the hyperparameter samples provided by the working group, etc.). The working group will independently verify the self-reported submissions with the highest scores. Only verified results are eligible to win the benchmark and be awarded prize money.
+For a description of how to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark, see the [Call for submissions](CALL_FOR_SUBMISSIONS.md), which details the entire competition process.
## Model Track diff --git a/getting_started.md b/getting_started.md index d6dd7fcd3..2ccfdfbd7 100644 --- a/getting_started.md +++ b/getting_started.md @@ -1,52 +1,68 @@ # Getting Started Table of Contents: -- [Set up and installation](#workspace-set-up-and-installation) -- [Download the data](#download-the-data) -- [Develop your submission](#develop-your-submission) -- [Run your submission](#run-your-submission) - - [Docker](#run-your-submission-in-a-docker-container) -- [Score your submission](#score-your-submission) + +- [Getting Started](#getting-started) + - [Workspace set up and installation](#workspace-set-up-and-installation) + - [Download the data](#download-the-data) + - [Develop your submission](#develop-your-submission) + - [Set up your directory structure (Optional)](#set-up-your-directory-structure-optional) + - [Coding your submission](#coding-your-submission) + - [Run your submission](#run-your-submission) + - [Pytorch DDP](#pytorch-ddp) + - [Run your submission in a Docker container](#run-your-submission-in-a-docker-container) + - [Docker Tips](#docker-tips) + - [Score your submission](#score-your-submission) + - [Good Luck](#good-luck) ## Workspace set up and installation + To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically: + 1. Decide if you would like to develop your submission in either Pytorch or Jax. - 2. Set up your workstation or VM. We recommend to use a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware). - The specs on the benchmarking machines are: - - 8 V100 GPUs +2. Set up your workstation or VM. We recommend to use a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware). + The specs on the benchmarking machines are: + - 8 V100 GPUs - 240 GB in RAM - - 2 TB in storage (for datasets). 
+ - 2 TB in storage (for datasets). 3. Install the algorithmic package and dependencies, see [Installation](./README.md#installation). ## Download the data -The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup). +The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup). ## Develop your submission + To develop a submission you will write a python module containing your optimizer algorithm. Your optimizer must implement a set of predefined API methods for the initialization and update steps. ### Set up your directory structure (Optional) + Make a submissions subdirectory to store your submission modules e.g. `algorithmic-effiency/submissions/my_submissions`. ### Coding your submission + You can find examples of sumbission modules under `algorithmic-efficiency/baselines` and `algorithmic-efficiency/reference_algorithms`. \ A submission for the external ruleset will consist of a submission module and a tuning search space definition. + 1. Copy the template submission module `submissions/template/submission.py` into your submissions directory e.g. in `algorithmic-efficiency/my_submissions`. 2. Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to to the competition rules. 
Check out the guidelines for [allowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions), [disallowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions) and pay special attention to the [software dependencies rule](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#software-dependencies). 3. Add a tuning configuration e.g. `tuning_search_space.json` file to your submission directory. For the tuning search space you can either: 1. Define the set of feasible points by defining a value for "feasible_points" for the hyperparameters: - ``` + + ```JSON { "learning_rate": { "feasible_points": 0.999 }, } ``` + For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/target_setting_algorithms/imagenet_resnet/tuning_search_space.json). - 2. Define a range of values for quasirandom sampling by specifing a `min`, `max` and `scaling` + 2. Define a range of values for quasirandom sampling by specifing a `min`, `max` and `scaling` keys for the hyperparameter: - ``` + + ```JSON { "weight_decay": { "min": 5e-3, @@ -55,14 +71,15 @@ A submission for the external ruleset will consist of a submission module and a } } ``` - For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json). + For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json). 
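As a rough illustration of the two search-space styles described above, the sketch below draws one configuration from a dictionary in that format. It is not the benchmark's own sampler: it uses Python's pseudo-random `random` module instead of quasirandom sampling, and the function name `sample_search_space`, the `max` value, and the `"log"` scaling label are assumptions made only for this example.

```python
import json
import math
import random


def sample_search_space(space, rng):
    """Draw one hyperparameter configuration from a search-space dict.

    Hypothetical helper for illustration; not part of the benchmark codebase.
    """
    config = {}
    for name, spec in space.items():
        if "feasible_points" in spec:
            points = spec["feasible_points"]
            # Accept either a single value or a list of candidate values.
            if not isinstance(points, (list, tuple)):
                points = [points]
            config[name] = rng.choice(points)
        else:
            low, high = spec["min"], spec["max"]
            if spec.get("scaling", "linear") == "log":
                # Sample uniformly in log space, then map back.
                log_value = rng.uniform(math.log(low), math.log(high))
                config[name] = math.exp(log_value)
            else:
                config[name] = rng.uniform(low, high)
    return config


# Illustrative search space; the "max" and "scaling" values are assumed.
space = json.loads("""
{
  "learning_rate": {"feasible_points": [0.999]},
  "weight_decay": {"min": 5e-3, "max": 1.0, "scaling": "log"}
}
""")

rng = random.Random(0)
trial = sample_search_space(space, rng)
print(trial)
```

Note that the text is parsed with a strict JSON parser here, so a trailing comma after the last entry would be rejected.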
## Run your submission From your virtual environment or interactively running Docker container run your submission with `submission_runner.py`: -**JAX**: to score your submission on a workload, from the algorithmic-efficency directory run: +**JAX**: to score your submission on a workload, from the algorithmic-efficency directory run: + ```bash python3 submission_runner.py \ --framework=jax \ @@ -73,7 +90,8 @@ python3 submission_runner.py \ --tuning_search_space= ``` -**Pytorch**: to score your submission on a workload, from the algorithmic-efficency directory run: +**Pytorch**: to score your submission on a workload, from the algorithmic-efficency directory run: + ```bash python3 submission_runner.py \ --framework=pytorch \ @@ -85,13 +103,17 @@ python3 submission_runner.py \ ``` #### Pytorch DDP -We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) -when using multiple GPUs on a single node. You can initialize ddp with torchrun. + +We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) +when using multiple GPUs on a single node. You can initialize ddp with torchrun. For example, on single host with 8 GPUs simply replace `python3` in the above command by: + ```bash torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS ``` + So the complete command is: + ```bash torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \ --standalone \ @@ -109,17 +131,18 @@ torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \ ### Run your submission in a Docker container The container entrypoint script provides the following flags: + - `-d` dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if `~/data/` does not exist on the host machine. Required for running a submission. - `-f` framework: can be either 'pytorch' or 'jax'. 
If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission. -- `-s` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission. +- `-s` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission. - `-t` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission. - `-e` experiment_name: name of experiment. Required for running a submission. - `-w` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission. - `-m` max_steps: maximum number of steps to run the workload for. Optional. -- `-b` debugging_mode: can be true or false. If `-b ` (debugging_mode) is `true` the main process on the container will persist. - +- `-b` debugging_mode: can be true or false. If `-b` (debugging_mode) is `true` the main process on the container will persist. To run the docker container that will run the submission runner run: + ```bash docker run -t -d \ -v $HOME/data/:/data/ \ @@ -136,31 +159,36 @@ docker run -t -d \ -w \ -b ``` + This will print the container ID to the terminal. If debugging_mode is `true` the main process on the container will persist after finishing the submission runner. 
 #### Docker Tips
 #### To find the container IDs of running containers
+
 ```
 docker ps
 ```
 To see output of the entrypoint script
+
 ```
 docker logs
 ```
 To enter a bash session in the container
+
 ```
 docker exec -it /bin/bash
 ```
-## Score your submission
+## Score your submission
+
 To produce performance profile and performance table:
+
 ```bash
 python3 scoring/score_submission.py --experiment_path= --output_dir=
 ```
-
-## Good Luck!
+## Good Luck
From d326958d561ebe1ecc0b1e3c327712801ed4a43c Mon Sep 17 00:00:00 2001
From: Frank Schneider
Date: Fri, 11 Aug 2023 10:00:20 +0200
Subject: [PATCH 02/15] update
---
 getting_started.md => GETTING_STARTED.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename getting_started.md => GETTING_STARTED.md (100%)
diff --git a/getting_started.md b/GETTING_STARTED.md
similarity index 100%
rename from getting_started.md
rename to GETTING_STARTED.md
From 705c9a5b3e348a659f7e36e1a0835412ceadf15a Mon Sep 17 00:00:00 2001
From: Frank
Date: Thu, 17 Aug 2023 21:28:23 +0200
Subject: [PATCH 03/15] Update and rename CALL_FOR_SUBMISSIONS.md to SUBMISSION_PROCESS_RULES.md
---
 ...MISSIONS.md => SUBMISSION_PROCESS_RULES.md | 40 +++++++++----------
 1 file changed, 19 insertions(+), 21 deletions(-)
 rename CALL_FOR_SUBMISSIONS.md => SUBMISSION_PROCESS_RULES.md (81%)
diff --git a/CALL_FOR_SUBMISSIONS.md b/SUBMISSION_PROCESS_RULES.md
similarity index 81%
rename from CALL_FOR_SUBMISSIONS.md
rename to SUBMISSION_PROCESS_RULES.md
index 4f93c8996..49910e2b1 100644
--- a/CALL_FOR_SUBMISSIONS.md
+++ b/SUBMISSION_PROCESS_RULES.md
@@ -1,8 +1,8 @@
-# MLCommons™ AlgoPerf: Call for Submissions
+# MLCommons™ AlgoPerf: Submission Process Rules
-**Version:** 0.0.1 *(Last updated 10 August 2023)*
+**Version:** 0.0.1 *(Last updated 17 August 2023)*
-- [MLCommons™ AlgoPerf: Call for Submissions](#mlcommons-algoperf-call-for-submissions)
+- [MLCommons™ AlgoPerf: Submission Process Rules](#mlcommons-algoperf-submission-process-rules)
 - [Basics](#basics)
 - [Schedule](#schedule)
 - [Dates](#dates)
@@ -27,11 +27,11 @@
 ## Basics
-This is the call for submissions for the AlgoPerf: Training Algorithms Benchmark. The call describes the process of submitting a new training algorithm and details how it will be scored. This call applies to both the external tuning ruleset and the self-tuning ruleset although, for all intents and purposes, they are two separate competitions, with separate leaderboards.
+This is the submission process rules for the AlgoPerf: Training Algorithms Benchmark. It describes the process of submitting a new training algorithm and details how it will be scored. This process applies to both the external tuning ruleset and the self-tuning ruleset although, for all intents and purposes, they are two separate competitions, with separate leaderboards.
-Three additional documents complement this call for submissions:
+Three additional documents complement this document:
-- [**Benchmark rules**](RULES.md): While the call for submissions details the *logistical* aspects of submitting to the AlgoPerf: Training Algorithms Benchmark, the [rules document](RULES.md) describes the *scientific* rules of the competition. This includes, for example, how tuning is performed in each ruleset, what types of submissions are allowed, or how the benchmark score is computed.
+- [**Benchmark rules**](RULES.md): While the submission process rules detail the *logistical* aspects of submitting to the AlgoPerf: Training Algorithms Benchmark, the [rules document](RULES.md) describes the *scientific* rules of the competition. This includes, for example, how tuning is performed in each ruleset, what types of submissions are allowed, or how the benchmark score is computed.
 - [**AlgoPerf paper**](https://arxiv.org/abs/2306.07179): The paper titled ["Benchmarking Neural Network Training Algorithms"](https://arxiv.org/abs/2306.07179) motivates the need for the benchmark, explains the rules, and justifies the specific design choices of the AlgoPerf: Training Algorithms Benchmark. Additionally, it evaluates baseline submissions, constructed using various optimizers like Adam, Shampoo, or SAM, on the benchmark, demonstrating the feasibility but also the difficulty of the benchmark.
 - [**Benchmark codebase**](https://github.com/mlcommons/algorithmic-efficiency): The codebase implements the rules, provides exact specifications of the workloads, and it will ultimately be used to score submissions.
@@ -39,22 +39,22 @@ Three additional documents complement this call for submissions:
 ### Dates
-- **Publication of the call for submission: 01. September 2023 (08:00 AM UTC)**
-- Registration deadline for submissions: 01. November 2023 (08:00 AM UTC)
-- Code freeze for the benchmark codebase: 01. December 2023 (08:00 AM UTC)
-- **Submission deadline: 01. January 2024 (08:00 AM UTC)**
-- Sampling the held-out workloads and hyperparameters: 02. January 2024 (08:00 AM UTC)
-- Deadline for challenging submissions: 01. February 2024 (08:00 AM UTC)
+- **Publication of the call for submission: 08. September 2023 (08:00 AM UTC)**
+- Registration deadline for submissions: 15. November 2023 (08:00 AM UTC)
+- Version freeze for the benchmark codebase: 01. December 2023 (08:00 AM UTC)
+- **Submission deadline: 15. January 2024 (08:00 AM UTC)**
+- Sampling the held-out workloads and hyperparameters: 16. January 2024 (08:00 AM UTC)
+- Deadline for specifying the submission batch sizes for held-out workloads: 23. January 2024 (08:00 AM UTC)
 - Deadline for self-reporting results: 01. March 2024 (08:00 AM UTC)
-- **Publication of all results: 01. April 2024 (08:00 AM UTC)**
+- **[extra tentative] Publication of all results: 15. April 2024 (08:00 AM UTC)**
 The presented dates are subject to change and adjustments may be made by the [MLCommmons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/).
-### Code freeze
+### Version freeze
-The benchmark code base is subject to change after the call for proposals is published. For example, while interacting with the codebase, if submitters encounter bugs or API limitations, they have the option to issue a bug report. This might lead to modifications of the benchmark codebase even after the publication of the call for submissions.
+The benchmark code base is subject to change after the call for submissions is published. For example, while interacting with the codebase, if submitters encounter bugs or API limitations, they have the option to issue a bug report. This might lead to modifications of the benchmark codebase even after the publication of the call for submissions.
-To ensure that all submitters can develop their submissions based on the exact same code that will be utilized for scoring, we will freeze the benchmark codebase before the submission deadline. By doing so, we level the playing field for everyone involved, ensuring fairness and consistency in the assessment of submissions. The code freeze also involves fixing all package versions of the codebase dependencies, such as JAX, PyTorch, etc.
+To ensure that all submitters can develop their submissions based on the same code that will be utilized for scoring, we will freeze the package versions of the codebase dependencies before the submission deadline. By doing so, we level the playing field for everyone involved, ensuring fairness and consistency in the assessment of submissions. We will also try to minimize changes to the benchmark codebase as best as possible.
### Submission deadline @@ -105,7 +105,7 @@ Submitting to the AlgoPerf: Training Algorithms Benchmark requires the following - *Either* a membership in MLCommons *or* a signed [non-member test agreement](https://mlcommons.org/en/policies/). - A signed trademark license agreement, either the member or the non-member version, as appropriate. These license agreements are available upon request to [support@mlcommons.org](mailto:support@mlcommons.org). -We furthermore require all submissions to be made available open source after the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0). +We furthermore require all submissions to be made available open source on the submission deadline under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0). ### Multiple Submission @@ -123,7 +123,7 @@ Submitters can both contribute and request additional baseline algorithms. This Submitters are expected to self-report scores on the full benchmark set before the deadline for self-reporting results. Reporting the scores involves providing all unmodified logs that the benchmarking codebase automatically generates in a separate `/results` directory within the `/submission` folder. For submissions competing in the external tuning ruleset, this includes all the logs of the tuning trials using the [hyperparameter samples provided by the working group](#sampling-held-out-workloads-and-hyperparameters). Note, that while the tuning runs can be performed on non-competition hardware, they still need to show that the "winning hyperparameter" in each study was selected according to the [tuning rules](/RULES.md#external-tuning-ruleset), i.e. the fastest hyperparameter to reach the validation target. Additionally, the logs of the "winning hyperparameter" (or each trial, in the self-tuning ruleset) in each of the five studies need to be computed on the competition hardware, to allow wall-clock runtime comparisons. 
-Submitters unable to self-fund scoring costs can instead self-report only on the [qualification set of workloads](/RULES.md#qualification-set) that excludes some of the most expensive workloads. Based on this performance on the qualification set, the working group will provide compute to evaluate and score the most promising submissions. Additionally, we encourage researchers to reach out to the [working group](mailto:algorithms@mlcommons.org) to find potential collaborators with the resources to run larger, more comprehensive experiments for both developing and scoring submissions. +Submitters unable to self-fund scoring costs can instead self-report only on the [qualification set of workloads](/RULES.md#qualification-set) that excludes some of the most expensive workloads. Based on this performance on the qualification set, the working group will provide - as funding allows - compute to evaluate and score the most promising submissions. Additionally, we encourage researchers to reach out to the [working group](mailto:algorithms@mlcommons.org) to find potential collaborators with the resources to run larger, more comprehensive experiments for both developing and scoring submissions. #### Verifying scores @@ -141,12 +141,10 @@ The publication of the results will contain two separate leaderboards, one for t ## Sprit jury & challenging submissions -The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Until the deadline for challenging submissions, anyone has the right to challenge a submission, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email can be written anonymously but it is required to link to the challenged submission and a detailed description of why the submission should be reviewed needs to be attached. 
+The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters may challenge other submissions, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. -In the event of a review, the spirit jury will hold a vote, which will be decided by a simple majority. - **TODO Who is on the Jury?** ## Awards and prize money From 2057116bc38014d961ef8aaa311cf404425ccf14 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 11:24:35 +0200 Subject: [PATCH 04/15] Fix filename --- RULES.md | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/RULES.md b/RULES.md index 7691f4d5c..7225a76b0 100644 --- a/RULES.md +++ b/RULES.md @@ -5,26 +5,25 @@ > **TL;DR** New training algorithms and models can make neural net training faster. > We need a rigorous training time benchmark that measures time to result given a fixed hardware configuration and stimulates algorithmic progress. We propose a [Training Algorithm Track](#training-algorithm-track) and a [Model Track](#model-track) in order to help disentangle optimizer improvements and model architecture improvements. 
This two-track structure lets us enforce a requirement that new optimizers work well on multiple models and that new models aren't highly specific to particular training hacks. -- [MLCommons™ AlgoPerf: Benchmark Rules](#mlcommons-algoperf-benchmark-rules) - - [Introduction](#introduction) - - [Training Algorithm Track](#training-algorithm-track) - - [Submissions](#submissions) - - [Specification](#specification) - - [Evaluation during training](#evaluation-during-training) - - [Valid submissions](#valid-submissions) - - [Tuning](#tuning) - - [External tuning ruleset](#external-tuning-ruleset) - - [Self-tuning ruleset](#self-tuning-ruleset) - - [Workloads](#workloads) - - [Fixed workloads](#fixed-workloads) - - [Randomized workloads](#randomized-workloads) - - [Qualification set](#qualification-set) - - [Scoring](#scoring) - - [Benchmarking hardware](#benchmarking-hardware) - - [Defining target performance](#defining-target-performance) - - [Benchmark score using performance profiles](#benchmark-score-using-performance-profiles) - - [Benchmark Procedure](#benchmark-procedure) - - [Model Track](#model-track) +- [Introduction](#introduction) +- [Training Algorithm Track](#training-algorithm-track) + - [Submissions](#submissions) + - [Specification](#specification) + - [Evaluation during training](#evaluation-during-training) + - [Valid submissions](#valid-submissions) + - [Tuning](#tuning) + - [External tuning ruleset](#external-tuning-ruleset) + - [Self-tuning ruleset](#self-tuning-ruleset) + - [Workloads](#workloads) + - [Fixed workloads](#fixed-workloads) + - [Randomized workloads](#randomized-workloads) + - [Qualification set](#qualification-set) + - [Scoring](#scoring) + - [Benchmarking hardware](#benchmarking-hardware) + - [Defining target performance](#defining-target-performance) + - [Benchmark score using performance profiles](#benchmark-score-using-performance-profiles) + - [Benchmark Procedure](#benchmark-procedure) +- [Model Track](#model-track) ## 
Introduction @@ -400,7 +399,7 @@ Our scoring procedure uses the held-out workloads only to penalize submissions t #### Qualification set -The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](/CALL_FOR_SUBMISSIONS.md#awards-and-prize-money). +The qualification set is designed for submitters that may not have the compute resources to self-report on the full set of [fixed](#fixed-workloads) and [held-out workloads](#randomized-workloads). They may instead self-report numbers on this smaller qualification set. The best-performing submissions may then qualify for compute sponsorship offering a free evaluation on the full benchmark set and therefore the possibility to win [awards and prize money](/SUBMISSION_PROCESS_RULES.md#awards-and-prize-money). The qualification set consists of the same [fixed workloads](#fixed-workloads) as mentioned above, except for both workloads on *ImageNet*, both workloads on *LibriSpeech*, and the *fastMRI* workload. The remaining three workloads (*WMT*, *Criteo 1TB*, and *OGBG*) form the qualification set. There are no [randomized workloads](#randomized-workloads) in the qualification set. The qualification set of workloads aims to have a combined runtime of roughly 24 hours on the [benchmarking hardware](#benchmarking-hardware). 
From 2ff243116a703a28c9db3c79fbf4ce55346442d3 Mon Sep 17 00:00:00 2001
From: Frank Schneider
Date: Tue, 3 Oct 2023 11:33:22 +0200
Subject: [PATCH 05/15] Multiple submissions & additional baselines
---
 SUBMISSION_PROCESS_RULES.md | 48 ++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 27 deletions(-)
diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md
index 49910e2b1..308f11e05 100644
--- a/SUBMISSION_PROCESS_RULES.md
+++ b/SUBMISSION_PROCESS_RULES.md
@@ -2,28 +2,26 @@
 **Version:** 0.0.1 *(Last updated 17 August 2023)*
-- [MLCommons™ AlgoPerf: Submission Process Rules](#mlcommons-algoperf-submission-process-rules)
- - [Basics](#basics)
- - [Schedule](#schedule)
- - [Dates](#dates)
- - [Code freeze](#code-freeze)
- - [Submission deadline](#submission-deadline)
- - [Submission](#submission)
- - [Register a submission](#register-a-submission)
- - [How to submit](#how-to-submit)
- - [Submission repository](#submission-repository)
- - [Licensing](#licensing)
- - [Multiple Submission](#multiple-submission)
- - [Requesting Additional Baselines](#requesting-additional-baselines)
- - [Scoring](#scoring)
- - [Self-reporting scores](#self-reporting-scores)
- - [Verifying scores](#verifying-scores)
- - [Sampling held-out workloads and hyperparameters](#sampling-held-out-workloads-and-hyperparameters)
- - [Leaderboard](#leaderboard)
- - [Sprit jury \& challenging submissions](#sprit-jury--challenging-submissions)
- - [Awards and prize money](#awards-and-prize-money)
- - [Awards committee](#awards-committee)
- - [Ineligibility and conflict of interest](#ineligibility-and-conflict-of-interest)
+- [Basics](#basics)
+- [Schedule](#schedule)
+ - [Dates](#dates)
+ - [Version freeze](#version-freeze)
+ - [Submission deadline](#submission-deadline)
+- [Submission](#submission)
+ - [Register a submission](#register-a-submission)
+ - [How to submit](#how-to-submit)
+ - [Submission repository](#submission-repository)
+ - [Licensing](#licensing)
+ - [Multiple Submission](#multiple-submission)
+- [Scoring](#scoring)
+ - [Self-reporting scores](#self-reporting-scores)
+ - [Verifying scores](#verifying-scores)
+ - [Sampling held-out workloads and hyperparameters](#sampling-held-out-workloads-and-hyperparameters)
+ - [Leaderboard](#leaderboard)
+- [Sprit jury \& challenging submissions](#sprit-jury--challenging-submissions)
+- [Awards and prize money](#awards-and-prize-money)
+ - [Awards committee](#awards-committee)
+- [Ineligibility and conflict of interest](#ineligibility-and-conflict-of-interest)
 ## Basics
@@ -111,11 +109,7 @@ We furthermore require all submissions to be made available open source on the s
 Our benchmark allows multiple submissions by the same submitter(s). However, we would like to prevent submitters from circumventing the purpose of the benchmark by, for example, submitting dozens of copies of the same submission with slightly different hyperparameters. Such a bulk submission would result in an unfair advantage on the randomized workloads and is not in the spirit of the benchmark.
-We encourage multiple submissions if they differ substantially. The spirit jury will be responsible for judging whether the submissions are substantially different. This jury will apply stricter scrutiny to submitters with a larger number of submissions. In this context, a submitter refers to an individual (not the general institution or research group they belong to). The total number of submissions by a submitter is the sum of submissions they contributed to.
-
-### Requesting Additional Baselines
-
-Submitters can both contribute and request additional baseline algorithms. This includes existing algorithms with different search spaces or learning rate schedules. These baselines will not be eligible for winning the competition or prize money.
+Submitters may submit algorithms marked as *baselines*. These might include existing algorithms with different search spaces or learning rate schedules. These baseline algorithms are not eligible for winning the competition or prize money but they are also not required to be "substantially different" from other submissions by the same submitters.
 ## Scoring
From 3298b4362cf34423eb7428a0d4144a778407a146 Mon Sep 17 00:00:00 2001
From: Frank Schneider
Date: Tue, 3 Oct 2023 11:48:15 +0200
Subject: [PATCH 06/15] winning hyperparameter configuration
---
 SUBMISSION_PROCESS_RULES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md
index 308f11e05..c176017d5 100644
--- a/SUBMISSION_PROCESS_RULES.md
+++ b/SUBMISSION_PROCESS_RULES.md
@@ -115,7 +115,7 @@ Submitters may submit algorithms marked as *baselines*. These might include exis
 ### Self-reporting scores
-Submitters are expected to self-report scores on the full benchmark set before the deadline for self-reporting results. Reporting the scores involves providing all unmodified logs that the benchmarking codebase automatically generates in a separate `/results` directory within the `/submission` folder. For submissions competing in the external tuning ruleset, this includes all the logs of the tuning trials using the [hyperparameter samples provided by the working group](#sampling-held-out-workloads-and-hyperparameters). Note, that while the tuning runs can be performed on non-competition hardware, they still need to show that the "winning hyperparameter" in each study was selected according to the [tuning rules](/RULES.md#external-tuning-ruleset), i.e. the fastest hyperparameter to reach the validation target. Additionally, the logs of the "winning hyperparameter" (or each trial, in the self-tuning ruleset) in each of the five studies need to be computed on the competition hardware, to allow wall-clock runtime comparisons.
+Submitters are expected to self-report scores on the full benchmark set before the deadline for self-reporting results.
Reporting the scores involves providing all unmodified logs that the benchmarking codebase automatically generates in a separate `/results` directory within the `/submission` folder. For submissions competing in the external tuning ruleset, this includes all the logs of the tuning trials using the [hyperparameter samples provided by the working group](#sampling-held-out-workloads-and-hyperparameters). Note, that while the tuning runs can be performed on non-competition hardware, they still need to show that the "winning hyperparameter configuration" in each study was selected according to the [tuning rules](/RULES.md#external-tuning-ruleset), i.e. the fastest hyperparameter to reach the validation target. Additionally, the logs of the "winning hyperparameter configuration" (or each trial, in the self-tuning ruleset) in each of the five studies need to be computed on the competition hardware, to allow wall-clock runtime comparisons. Submitters unable to self-fund scoring costs can instead self-report only on the [qualification set of workloads](/RULES.md#qualification-set) that excludes some of the most expensive workloads. Based on this performance on the qualification set, the working group will provide - as funding allows - compute to evaluate and score the most promising submissions. Additionally, we encourage researchers to reach out to the [working group](mailto:algorithms@mlcommons.org) to find potential collaborators with the resources to run larger, more comprehensive experiments for both developing and scoring submissions. 
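As a reviewer's aid, the selection rule quoted above ("the fastest hyperparameter to reach the validation target") can be sketched in a few lines. This is only an illustration under stated assumptions: the `Trial` record, its field names, and the `winning_trial` helper are hypothetical and are not part of the benchmark codebase or its log format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One tuning trial in a study (hypothetical record, not the real log schema)."""
    hparam_id: str
    # Seconds until the validation target was reached; None if it was never reached.
    time_to_target: Optional[float]


def winning_trial(study: Sequence[Trial]) -> Optional[Trial]:
    """Return the trial that reached the validation target fastest, or None."""
    reached = [t for t in study if t.time_to_target is not None]
    if not reached:
        return None  # no trial in this study hit the target
    return min(reached, key=lambda t: t.time_to_target)


# Toy study with made-up timings, for illustration only.
study = [
    Trial("hparams_0", 5400.0),
    Trial("hparams_1", None),
    Trial("hparams_2", 4100.0),
]
best = winning_trial(study)
```

Under these assumptions, the helper would be applied once per study, and only the selected trial of each study would then need logs produced on the competition hardware.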
From f9b50481678b6d884b62ed3864d52e3bd690246e Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 11:53:23 +0200 Subject: [PATCH 07/15] Specify challenging submissions --- SUBMISSION_PROCESS_RULES.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index c176017d5..fce116050 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -135,7 +135,7 @@ The publication of the results will contain two separate leaderboards, one for t ## Sprit jury & challenging submissions -The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters may challenge other submissions, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. +The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. 
From 0195e79ba8eae5440afb161cbda42aed3a4dfe46 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 12:16:50 +0200 Subject: [PATCH 08/15] Prize money and challenge deadline --- SUBMISSION_PROCESS_RULES.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index fce116050..713046d04 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -56,7 +56,7 @@ To ensure that all submitters can develop their submissions based on the same co ### Submission deadline -With the submission deadline, all submissions need to be available as a *public* repository with the appropriate license (see the [Licensing section](#licensing)). No changes to the submission code are allowed after the submission deadline (with the notable exception of specifying the batch size for the - at that point unknown - held-out workloads). Once the submission deadline has passed, the working group will publish a list of all submitted algorithms, along with their associated repositories. Until the deadline for challenging submissions, anyone has the right to challenge a submission, i.e. request a review by the spirit jury to determine whether a submission violates the rules of the competition, see the [Spirit jury section](#sprit-jury--challenging-submissions). +With the submission deadline, all submissions need to be available as a *public* repository with the appropriate license (see the [Licensing section](#licensing)). No changes to the submission code are allowed after the submission deadline (with the notable exception of specifying the batch size for the - at that point unknown - held-out workloads). Once the submission deadline has passed, the working group will publish a list of all submitted algorithms, along with their associated repositories. Anyone has the right to challenge a submission, i.e. 
request a review by the spirit jury to determine whether a submission violates the rules of the competition, see the [Spirit jury section](#sprit-jury--challenging-submissions). Directly after the submission deadline, all randomized aspects of the competition are fixed. This includes sampling the held-out workloads from the set of randomized workloads, as well as, sampling the hyperparameters for each submission in the external tuning ruleset (for more details see the [Sampling held-out workloads and hyperparameters section](#sampling-held-out-workloads-and-hyperparameters)). After that, submitters can now ascertain the appropriate batch size of their submission on each held-out workload and self-report scores on either the qualification set or the full benchmarking set of workloads including both fixed and held-out workloads (see the [Self-reporting scores section](#self-reporting-scores)). @@ -135,7 +135,7 @@ The publication of the results will contain two separate leaderboards, one for t ## Sprit jury & challenging submissions -The spirit jury will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. +The spirit jury, consisting of selected active members of the working group, will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. 
To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. This request must be made reasonably in advance of the publication deadline to allow the Spirit Jury sufficient time to conduct a thorough review. The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. @@ -147,9 +147,11 @@ An awards committee will award a prize for the "*Best Performance*" in each rule The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Jury Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions. +If a submission is ineligible to win prize money it can still win an award. The prize money will then go to the highest-ranking eligible submission. + ### Awards committee -The awards committee will be responsible for awarding prize money to submissions. Members of the awards committee can suggest submissions to be considered for the awards. The committee will vote on the winning submissions, the submission with the most votes in each respective category wins the awards, and if eligible, the prize money. +The awards committee will be responsible for awarding prize money to submissions. The committee will try to reach a consensus on how to award prize money and settle disagreements by majority vote, if necessary. 
**TODO Who is on the Awards committee?** From e2043eaafbf761b830b7c791d9ef41c532f3b9d9 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 12:19:09 +0200 Subject: [PATCH 09/15] Publication -> Announcement of results --- SUBMISSION_PROCESS_RULES.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index 713046d04..7b019e8a3 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -44,7 +44,7 @@ Three additional documents complement this document: - Sampling the held-out workloads and hyperparameters: 16. January 2024 (08:00 AM UTC) - Deadline for specifying the submission batch sizes for held-out workloads: 23. January 2024 (08:00 AM UTC) - Deadline for self-reporting results: 01. March 2024 (08:00 AM UTC) -- **[extra tentative] Publication of all results: 15. April 2024 (08:00 AM UTC)** +- **[extra tentative] Announcement of all results: 15. April 2024 (08:00 AM UTC)** The presented dates are subject to change and adjustments may be made by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/). @@ -131,11 +131,11 @@ The sampling code for the held-out workloads and the hyperparameters is publicly ### Leaderboard -The publication of the results will contain two separate leaderboards, one for the self-tuning and one for the external tuning ruleset. All valid submissions will be ranked by the benchmark score, taking into account all workloads, including the held-out ones. The leaderboard will clearly mark scores that were verified by the working group. +The announcement of the results will contain two separate leaderboards, one for the self-tuning and one for the external tuning ruleset. All valid submissions will be ranked by the benchmark score, taking into account all workloads, including the held-out ones. The leaderboard will clearly mark scores that were verified by the working group.
## Sprit jury & challenging submissions -The spirit jury, consisting of selected active members of the working group, will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. This request must be made reasonably in advance of the publication deadline to allow the Spirit Jury sufficient time to conduct a thorough review. +The spirit jury, consisting of selected active members of the working group, will be responsible for deciding whether a submission violates the "spirit of the rules". Submitters with specific concerns about a particular submission can request a review by the spirit jury to determine whether a submission violates the rules of the competition. To challenge a submission, please write an email to with the subject "[Challenge] *submission_name*". The email needs to link to the challenged submission and include a detailed description of why the submission should be reviewed. This request must be made reasonably in advance of the results announcement deadline to allow the Spirit Jury sufficient time to conduct a thorough review. The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. 
From 0f106a35fcb6b2982d87a16af336c9e5cd87b215 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 12:22:22 +0200 Subject: [PATCH 10/15] Remove todo for spirit jury --- SUBMISSION_PROCESS_RULES.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index 7b019e8a3..ff65bf3c8 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -139,8 +139,6 @@ The spirit jury, consisting of selected active members of the working group, wil The spirit jury may then hear the justifications of the submitters, inspect the code, and also ask the submitters to explain how the submission was produced, for example, by disclosing their intermediate experiments. Example cases that might be reviewed by the spirit jury are cases of multiple similar submissions by the same submitter or extensive workload-specific tuning. -**TODO Who is on the Jury?** - ## Awards and prize money An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as a "*Jury Award*". The prize for the best-performing submission will take into account the [benchmark score](RULES.md#benchmark-score-using-performance-profiles) on the full benchmark. The "*Jury Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc. 
From e3f445d39f33328c2acf1f40f51d895fd1ccf3a6 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 13:50:15 +0200 Subject: [PATCH 11/15] Update dates --- SUBMISSION_PROCESS_RULES.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index ff65bf3c8..2d7a891bd 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -1,6 +1,6 @@ # MLCommons™ AlgoPerf: Submission Process Rules -**Version:** 0.0.1 *(Last updated 17 August 2023)* +**Version:** 0.0.2 *(Last updated 03 Oktober 2023)* - [Basics](#basics) - [Schedule](#schedule) @@ -37,14 +37,14 @@ Three additional documents complement this document: ### Dates -- **Publication of the call for submission: 08. September 2023 (08:00 AM UTC)** -- Registration deadline for submissions: 15. November 2023 (08:00 AM UTC) -- Version freeze for the benchmark codebase: 01. December 2023 (08:00 AM UTC) -- **Submission deadline: 15. January 2024 (08:00 AM UTC)** -- Sampling the held-out workloads and hyperparameters: 16. January 2024 (08:00 AM UTC) -- Deadline for specifying the submission batch sizes for held-out workloads: 23. January 2024 (08:00 AM UTC) -- Deadline for self-reporting results: 01. March 2024 (08:00 AM UTC) -- **[extra tentative] Announcement of all results: 15. April 2024 (08:00 AM UTC)** +- **Publication of the call for submission: 17. Oktober 2023 (08:00 AM UTC)** +- Registration deadline for submissions: 15. December 2023 (08:00 AM UTC) +- Version freeze for the benchmark codebase: 17. January 2024 (08:00 AM UTC) +- **Submission deadline: 15. February 2024 (08:00 AM UTC)** +- Sampling the held-out workloads and hyperparameters: 16. February 2024 (08:00 AM UTC) +- Deadline for specifying the submission batch sizes for held-out workloads: 28. February 2024 (08:00 AM UTC) +- Deadline for self-reporting results: 10. 
April 2024 (08:00 AM UTC) +- **[extra tentative] Announcement of all results: 22. May 2024 (08:00 AM UTC)** The presented dates are subject to change and adjustments may be made by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/). From f0e280a3f0797838545b1a78250c67fa46c27565 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 3 Oct 2023 18:54:03 +0200 Subject: [PATCH 12/15] Add link to Google Form --- SUBMISSION_PROCESS_RULES.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index 2d7a891bd..0049ac3a1 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -70,7 +70,7 @@ In the following, we describe the logistical steps required to submit a training All submitters need to register an intent to submit before the submission registration deadline. This registration is mandatory, i.e. required for all submissions, but not binding, i.e. you don't have to submit a registered submission. This registration is necessary, to estimate the number of submissions and provide support for potential submitters. -To register a submission, please write an email to with the subject "[Registration] *submission_name*" and the following information: +To register a submission, please fill out this [online form](https://forms.gle/iY1bUhwSjj1JZ4fa9) with the following information - Name of the submission (e.g. name of the algorithm, or any other arbitrary identifier). - Ruleset under which the submission will be scored.
From e7a907cf21770eabf8bc4520983b9c9e2c6c5995 Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 10 Oct 2023 16:18:10 +0200 Subject: [PATCH 13/15] Rename Jury Award --- SUBMISSION_PROCESS_RULES.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index 0049ac3a1..1fc5e7061 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -1,6 +1,6 @@ # MLCommons™ AlgoPerf: Submission Process Rules -**Version:** 0.0.2 *(Last updated 03 Oktober 2023)* +**Version:** 0.0.3 *(Last updated 10 October 2023)* - [Basics](#basics) - [Schedule](#schedule) @@ -37,7 +37,7 @@ Three additional documents complement this document: ### Dates -- **Publication of the call for submission: 17. Oktober 2023 (08:00 AM UTC)** +- **Publication of the call for submission: 17. October 2023 (08:00 AM UTC)** - Registration deadline for submissions: 15. December 2023 (08:00 AM UTC) - Version freeze for the benchmark codebase: 17. January 2024 (08:00 AM UTC) - **Submission deadline: 15. February 2024 (08:00 AM UTC)** @@ -141,9 +141,9 @@ The spirit jury may then hear the justifications of the submitters, inspect the ## Awards and prize money -An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as a "*Jury Award*". The prize for the best-performing submission will take into account the [benchmark score](RULES.md#benchmark-score-using-performance-profiles) on the full benchmark. The "*Jury Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc. +An awards committee will award a prize for the "*Best Performance*" in each ruleset as well as an "*Innovative Submission Award*". The prize for the best-performing submission will take into account the [benchmark score](RULES.md#benchmark-score-using-performance-profiles) on the full benchmark.
The "*Innovative Submission Award*" will favor more out-of-the-box ideas that show great potential, even though the method may not be of practical value with the current landscape of models, software, etc. -The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Jury Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions. +The prize money for "*Best Performance*" in a ruleset is $20,000 each. The winner of the "*Innovative Submission Award*" will be awarded $10,000. We reserve the right to split the prize money and distribute it among multiple submissions. If a submission is ineligible to win prize money it can still win an award. The prize money will then go to the highest-ranking eligible submission. From 7976442ccf7ad7cc0f74d8d6e906cf70e89f02fb Mon Sep 17 00:00:00 2001 From: Frank Schneider Date: Tue, 10 Oct 2023 16:19:30 +0200 Subject: [PATCH 14/15] specify ineligible entities and associated institutions --- SUBMISSION_PROCESS_RULES.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index 1fc5e7061..a07f664b1 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -157,10 +157,10 @@ The awards committee will be responsible for awarding prize money to submissions To ensure a fair process and avoid conflicts of interest, some individuals and institutions are ineligible to win prize money. This includes: -- The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their institutions (currently *Google Inc.* and the *University of Tübingen*) -- All individuals serving on the awards committee and their institutions. 
+- The chairs of the MLCommons Algorithms Working Group (presently *George Dahl* and *Frank Schneider*) and their associated institutions (currently *Google Inc.* and the *University of Tübingen*) +- All individuals serving on the awards committee and their associated institutions. -A submission with at least one ineligible submitter may still win an award, but the prize money will then be awarded to the top-ranked submission that is eligible for prize money. +A submission with at least one participating ineligible entity may still win an award, but the prize money will then be given to the top-ranked submission that does not contain ineligible entities. Additionally, we require members of the spirit jury to abstain from being involved in a review if: From 1bb385439ba4cc424430b82b29ce91552ddb47df Mon Sep 17 00:00:00 2001 From: Frank Date: Wed, 18 Oct 2023 13:19:43 +0200 Subject: [PATCH 15/15] rephrase "register submission" to "intent to submit" --- SUBMISSION_PROCESS_RULES.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/SUBMISSION_PROCESS_RULES.md b/SUBMISSION_PROCESS_RULES.md index a07f664b1..51aeff043 100644 --- a/SUBMISSION_PROCESS_RULES.md +++ b/SUBMISSION_PROCESS_RULES.md @@ -38,7 +38,7 @@ Three additional documents complement this document: ### Dates - **Publication of the call for submission: 17. October 2023 (08:00 AM UTC)** -- Registration deadline for submissions: 15. December 2023 (08:00 AM UTC) +- Registration deadline to express non-binding intent to submit: 15. December 2023 (08:00 AM UTC) - Version freeze for the benchmark codebase: 17. January 2024 (08:00 AM UTC) - **Submission deadline: 15. February 2024 (08:00 AM UTC)** - Sampling the held-out workloads and hyperparameters: 16. 
February 2024 (08:00 AM UTC) @@ -66,19 +66,18 @@ For a guide on the technical steps and details on how to write a submission, ple In the following, we describe the logistical steps required to submit a training algorithm to the AlgoPerf: Training Algorithms Benchmark. -### Register a submission +### Register an intent to submit All submitters need to register an intent to submit before the submission registration deadline. This registration is mandatory, i.e. required for all submissions, but not binding, i.e. you don't have to submit a registered submission. This registration is necessary, to estimate the number of submissions and provide support for potential submitters. -To register a submission, please fill out this [online form](https://forms.gle/iY1bUhwSjj1JZ4fa9) with the following information +To register an intent to submit, please fill out this [online form](https://forms.gle/iY1bUhwSjj1JZ4fa9) with the following information: - Name of the submission (e.g. name of the algorithm, or any other arbitrary identifier). - Ruleset under which the submission will be scored. -- Name of all submitters associated with this submission. -- Email of all submitters associated with this submission. -- Affiliations of all submitters associated with this submission. +- Name, email, and affiliations of all submitters associated with this submission. +- Interest in compute support. -In return, the submission will be issued a unique **submission ID** that will be used throughout the submission process. +The submission will be issued a unique **submission ID** that will be used throughout the submission process. ### How to submit