diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b122372a9..33a14f83c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -61,7 +61,7 @@ Other resources: ## Pre-built Images on Google Cloud Container Registry -If you'd like to maintain or use images stored on our Google Cloud Container Registry read this section. +If you want to maintain or use images stored on our Google Cloud Container Registry, read this section. You will have to use an authentication helper to set up permissions to access the repository: ``` ARTIFACT_REGISTRY_URL=us-central1-docker.pkg.dev @@ -82,6 +82,9 @@ Currently maintained images on the repository are: - `algoperf_pytorch_dev` - `algoperf_both_dev` +To reference the pulled image you will have to use the full `image_path`, e.g. +`us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_jax_main`. + ### Trigger rebuild and push of maintained images To build and push all images (`pytorch`, `jax`, `both`) on maintained branches (`dev`, `main`). ``` @@ -97,10 +100,10 @@ You can also use the above script to build images from a different branch. ``` ## GCP Data and Experiment Integration -The Docker entrypoint script can communicate with +The Docker entrypoint script can transfer data to and from our GCP buckets on our internal GCP project. If you are an approved contributor you can get access to these resources to automatically download the datasets and upload experiment results. -You can use these features by setting the `-i` flag (for internal collaborator) to 'true' for the Docker entrypoint script. +You can use these features by setting the `--internal_contributor` flag to 'true' for the Docker entrypoint script. ### Downloading Data from GCP To run a docker container that will only download data (if not found on host) @@ -111,14 +114,14 @@ docker run -t -d \ -v $HOME/experiment_runs/logs:/logs \ --gpus all \ --ipc=host \ - \ --d \ --f \ --b \ --i true + \ +--dataset \ +--framework \ +--keep_container_alive \ +--internal_contributor true ``` -If debugging_mode is `true` the main process on the container will persist after finishing the data download. -This run command is useful if you manually want to run a sumbission or look around. +If `keep_container_alive` is `true` the main process on the container will persist after finishing the data download. +This run command is useful if you are developing or debugging. ### Saving Experiments to GCP If you set the internal collaborator mode to true @@ -132,15 +135,15 @@ docker run -t -d \ -v $HOME/experiment_runs/logs:/logs \ --gpus all \ --ipc=host \ - \ --d \ --f \ --s \ --t \ --e \ --w \ --b --i true \ + \ +--dataset \ +--framework \ +--submission_path \ +--tuning_search_space \ +--experiment_name \ +--workload \ +--keep_container_alive \ +--internal_contributor true ``` ## Getting Information from a Container @@ -171,8 +174,8 @@ docker run -t -d \ -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \ --gpus all \ --ipc=host \ - \ --b + \ +--keep_container_alive true ``` # Submitting PRs diff --git a/README.md b/README.md index c60efae60..54e274a6c 100644 --- a/README.md +++ b/README.md @@ -23,9 +23,8 @@ [MLCommons Algorithmic Efficiency](https://mlcommons.org/en/groups/research-algorithms/) is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. This repository holds the [competition rules](RULES.md) and the benchmark code to run it.
For a detailed description of the benchmark design, see our [paper](https://arxiv.org/abs/2306.07179). # Table of Contents -- [Table of Contents](#table-of-contents) -- [AlgoPerf Benchmark Workloads](#algoperf-benchmark-workloads) - [Installation](#installation) + - [Python Virtual Environment](#python-virtual-environment) - [Docker](#docker) - [Getting Started](#getting-started) - [Rules](#rules) @@ -51,7 +50,7 @@ You can install this package and dependences in a [python virtual environment](# pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html' pip3 install -e '.[full]' ``` -## Virtual environment +## Python virtual environment Note: Python minimum requirement >= 3.8 To set up a virtual enviornment and install this repository @@ -74,7 +73,7 @@ To set up a virtual enviornment and install this repository
-Additional Details +Per workload installations You can also install the requirements for individual workloads, e.g. via @@ -105,15 +104,16 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker). 2. Build Docker Image ```bash - cd `algorithmic-efficiency/docker` - docker build -t . --build-args framework= + cd algorithmic-efficiency/docker + docker build -t . --build-arg framework= ``` - The `framework` flag can be either `pytorch`, `jax` or `both`. + The `framework` flag can be either `pytorch`, `jax` or `both`. Specifying the framework will install the framework specific dependencies. The `docker_image_name` is arbitrary. ### Running Docker Container (Interactive) -1. Run detached Docker Container +To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the `bash` program. This may be useful if you are in the process of developing a submission. +1. Run detached Docker Container. The container_id will be printed if the container is run successfully. ```bash docker run -t -d \ -v $HOME/data/:/data/ \ @@ -123,22 +123,22 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker). --gpus all \ --ipc=host \ + -keep_container_alive true ``` - This will print out a container id. 2. Open a bash terminal ```bash docker exec -it /bin/bash ``` ### Running Docker Container (End-to-end) -To run a submission end-to-end in a container see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container). +To run a submission end-to-end in a containerized environment see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container). # Getting Started For instructions on developing and scoring your own algorithm in the benchmark see [Getting Started Document](./getting_started.md). ## Running a workload To run a submission directly by running a Docker container, see [Getting Started Document](./getting_started.md#run-your-submission-in-a-docker-container). 
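The **JAX** command referenced just below sits outside this hunk. For orientation, the direct invocation looks roughly like the following sketch; the flag names and example paths are assumptions based on the entrypoint flags documented in getting_started.md, not values taken from this diff:

```bash
# Hedged sketch of a direct submission_runner.py call (JAX).
# Flag names and the paths below are assumptions / placeholders.
python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --data_dir=$HOME/data \
    --experiment_dir=$HOME/experiment_runs \
    --experiment_name=my_first_experiment \
    --submission_path=path/to/submission.py \
    --tuning_search_space=path/to/tuning_search_space.json
```
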
-Alternatively from a your virtual environment or interactively running Docker container `submission_runner.py` run: +From your virtual environment or interactively running Docker container run: **JAX** diff --git a/algorithmic_efficiency/workloads/librispeech_conformer/workload.py b/algorithmic_efficiency/workloads/librispeech_conformer/workload.py index 985f4b0eb..dc7fb912b 100644 --- a/algorithmic_efficiency/workloads/librispeech_conformer/workload.py +++ b/algorithmic_efficiency/workloads/librispeech_conformer/workload.py @@ -19,14 +19,14 @@ def has_reached_validation_target(self, eval_result: Dict[str, @property def validation_target_value(self) -> float: - return 0.078477 + return 0.084952 def has_reached_test_target(self, eval_result: Dict[str, float]) -> bool: return eval_result['test/wer'] < self.test_target_value @property def test_target_value(self) -> float: - return 0.046973 + return 0.053000 @property def loss_type(self) -> spec.LossType: @@ -67,13 +67,13 @@ def train_stddev(self): @property def max_allowed_runtime_sec(self) -> int: - return 101_780 # ~28 hours + return 61_068 # ~17 hours @property def eval_period_time_sec(self) -> int: - return 40 * 60 # 40m + return 24 * 60 @property def step_hint(self) -> int: """Max num steps the baseline algo was given to reach the target.""" - return 133_333 + return 80_000 diff --git a/algorithmic_efficiency/workloads/librispeech_deepspeech/workload.py b/algorithmic_efficiency/workloads/librispeech_deepspeech/workload.py index 7a836cf94..f9fd30b0d 100644 --- a/algorithmic_efficiency/workloads/librispeech_deepspeech/workload.py +++ b/algorithmic_efficiency/workloads/librispeech_deepspeech/workload.py @@ -5,17 +5,17 @@ class BaseDeepspeechLibrispeechWorkload(workload.BaseLibrispeechWorkload): @property def validation_target_value(self) -> float: - return 0.1162 + return 0.118232 @property def test_target_value(self) -> float: - return 0.068093 + return 0.073397 @property def step_hint(self) -> int: """Max num steps the baseline algo was given to reach the target.""" - return 80_000 + return 48_000 @property def max_allowed_runtime_sec(self) -> int: - return 92_509 # ~26 hours + return 55_506 # ~15.4 hours diff --git a/datasets/dataset_setup.py b/datasets/dataset_setup.py index bc4502a24..0227e728e 100644 --- a/datasets/dataset_setup.py +++ b/datasets/dataset_setup.py @@ -76,7 +76,6 @@ from absl import flags from absl import logging import requests -import tensorflow as tf import tensorflow_datasets as tfds from torchvision.datasets import CIFAR10 import tqdm @@ -84,9 +83,9 @@ IMAGENET_TRAIN_TAR_FILENAME = 'ILSVRC2012_img_train.tar' IMAGENET_VAL_TAR_FILENAME = 'ILSVRC2012_img_val.tar' -FASTMRI_TRAIN_TAR_FILENAME = 'knee_singlecoil_train.tar' -FASTMRI_VAL_TAR_FILENAME = 'knee_singlecoil_val.tar' -FASTMRI_TEST_TAR_FILENAME = 'knee_singlecoil_test.tar' +FASTMRI_TRAIN_TAR_FILENAME = 'knee_singlecoil_train.tar.xz' +FASTMRI_VAL_TAR_FILENAME = 'knee_singlecoil_val.tar.xz' +FASTMRI_TEST_TAR_FILENAME = 'knee_singlecoil_test.tar.xz' from algorithmic_efficiency.workloads.wmt import tokenizer from algorithmic_efficiency.workloads.wmt.input_pipeline import \ @@ -132,11 +131,11 @@ flags.DEFINE_string( 'data_dir', - None, + '~/data', 'The path to the folder where datasets should be downloaded.') flags.DEFINE_string( 'temp_dir', - '/tmp', + '/tmp/mlcommons', 'A local path to a folder where temp files can be downloaded.') flags.DEFINE_string( 'imagenet_train_url', @@ -162,6 +161,12 @@ 'Only necessary if you want this script to `wget` the FastMRI 
validation ' 'split. If not, you can supply the path to --data_dir in ' 'submission_runner.py.') +flags.DEFINE_string( + 'fastmri_knee_singlecoil_test_url', + None, + 'Only necessary if you want this script to `wget` the FastMRI test ' + 'split. If not, you can supply the path to --data_dir in ' + 'submission_runner.py.') flags.DEFINE_integer( 'num_decompression_threads', @@ -169,9 +174,11 @@ 'The number of threads to use in parallel when decompressing.') flags.DEFINE_string('framework', None, 'Can be either jax or pytorch.') -flags.DEFINE_boolean('train_tokenizer', True, 'Train Librispeech tokenizer.') + FLAGS = flags.FLAGS +os.environ["CUDA_VISIBLE_DEVICES"] = "-1" + def _maybe_mkdir(d): if not os.path.exists(d): @@ -193,9 +200,15 @@ def _maybe_prompt_for_deletion(paths, interactive_deletion): logging.info('Skipping deletion.') -def _download_url(url, data_dir): +def _download_url(url, data_dir, name=None): + data_dir = os.path.expanduser(data_dir) - file_path = os.path.join(data_dir, url.split('/')[-1]) + if not name: + file_path = os.path.join(data_dir, url.split('/')[-1]) + else: + file_path = os.path.join(data_dir, name) + print(f"about to download to {file_path}") + response = requests.get(url, stream=True, timeout=600) total_size_in_bytes = int(response.headers.get('Content-length', 0)) total_size_in_mib = total_size_in_bytes / (2**20) @@ -282,61 +295,85 @@ def download_cifar(data_dir, framework): raise ValueError('Invalid value for framework: {}'.format(framework)) +def extract_filename_from_url(url, start_str='knee', end_str='.xz'): + """The URL filenames are sometimes wrapped in a urldefense/AWS access id string. + Querying the Content-Disposition header via requests fails (it is not provided), + so the filename is located by searching within the URL itself. + """ + failure = -1 + start = url.find(start_str) + end = url.find(end_str) + if failure in (start, end): + raise ValueError( + f"Unable to locate filename wrapped in {start_str}--{end_str} in {url}") + end += len(end_str) # make it inclusive + return url[start:end] + + def download_fastmri(data_dir, fastmri_train_url, fastmri_val_url, fastmri_test_url): data_dir = os.path.join(data_dir, 'fastmri') - # Download fastmri train dataset + knee_train_filename = extract_filename_from_url(fastmri_train_url) logging.info( 'Downloading fastmri train dataset from {}'.format(fastmri_train_url)) - _download_url(url=fastmri_train_url, data_dir=data_dir).download() + _download_url( + url=fastmri_train_url, data_dir=data_dir, name=knee_train_filename) # Download fastmri val dataset + knee_val_filename = extract_filename_from_url(fastmri_val_url) logging.info( 'Downloading fastmri val dataset from {}'.format(fastmri_val_url)) - _download_url(url=fastmri_val_url, data_dir=data_dir).download() + _download_url(url=fastmri_val_url, data_dir=data_dir, name=knee_val_filename) # Download fastmri test dataset + knee_test_filename = extract_filename_from_url(fastmri_test_url) + logging.info( 'Downloading fastmri test dataset from {}'.format(fastmri_test_url)) - _download_url(url=fastmri_test_url, data_dir=data_dir).download() + _download_url( + url=fastmri_test_url, data_dir=data_dir, name=knee_test_filename) + return data_dir def extract(source, dest): if not os.path.exists(dest): os.path.makedirs(dest) - + print(f"extracting {source} to {dest}") tar = tarfile.open(source) + print("opened tar") + tar.extractall(dest) tar.close() -def setup_fastmri(data_dir): - train_tar_file_path = os.path.join(data_dir, FASTMRI_TRAIN_TAR_FILENAME) - val_tar_file_path = 
os.path.join(data_dir, FASTMRI_VAL_TAR_FILENAME) - test_tar_file_path = os.path.join(data_dir, FASTMRI_TEST_TAR_FILENAME) +def setup_fastmri(data_dir, src_data_dir): + + train_tar_file_path = os.path.join(src_data_dir, FASTMRI_TRAIN_TAR_FILENAME) + val_tar_file_path = os.path.join(src_data_dir, FASTMRI_VAL_TAR_FILENAME) + test_tar_file_path = os.path.join(src_data_dir, FASTMRI_TEST_TAR_FILENAME) # Make train, val and test subdirectories fastmri_data_dir = os.path.join(data_dir, 'fastmri') train_data_dir = os.path.join(fastmri_data_dir, 'train') - os.makedirs(train_data_dir) + os.makedirs(train_data_dir, exist_ok=True) val_data_dir = os.path.join(fastmri_data_dir, 'val') - os.makedirsval_data_dir() + os.makedirs(val_data_dir, exist_ok=True) test_data_dir = os.path.join(fastmri_data_dir, 'test') - os.makedirs(test_data_dir) + os.makedirs(test_data_dir, exist_ok=True) # Unzip tar file into subdirectories - logging.info('Unzipping {} to {}'.format(train_tar_file_path, - fastmri_data_dir)) + logging.info('Unzipping {} to {}'.format(train_tar_file_path, train_data_dir)) extract(train_tar_file_path, train_data_dir) - logging.info('Unzipping {} to {}'.format(val_tar_file_path, fastmri_data_dir)) + logging.info('Unzipping {} to {}'.format(val_tar_file_path, val_data_dir)) extract(val_tar_file_path, val_data_dir) - logging.info('Unzipping {} to {}'.format(val_tar_file_path, fastmri_data_dir)) + logging.info('Unzipping {} to {}'.format(test_tar_file_path, test_data_dir)) extract(test_tar_file_path, test_data_dir) - logging.info('Set up imagenet dataset for jax framework complete') + logging.info('Set up fastMRI dataset complete') + print(f"extraction completed! ") def download_imagenet(data_dir, imagenet_train_url, imagenet_val_url): @@ -458,17 +495,26 @@ def download_imagenet_v2(data_dir): data_dir=data_dir).download_and_prepare() -def download_librispeech(dataset_dir, tmp_dir, train_tokenizer): +def download_librispeech(dataset_dir, tmp_dir): # After extraction the result is a folder named Librispeech containing audio # files in .flac format along with transcripts containing name of audio file # and corresponding transcription. 
- tmp_librispeech_dir = os.path.join(tmp_dir, 'LibriSpeech') + tmp_librispeech_dir = os.path.join(dataset_dir, 'librispeech') + extracted_data_dir = os.path.join(tmp_librispeech_dir, 'LibriSpeech') + final_data_dir = os.path.join(dataset_dir, 'librispeech_processed') + _maybe_mkdir(tmp_librispeech_dir) for split in ['dev', 'test']: for version in ['clean', 'other']: - wget_cmd = f'wget http://www.openslr.org/resources/12/{split}-{version}.tar.gz -O - | tar xz' # pylint: disable=line-too-long - subprocess.Popen(wget_cmd, shell=True, cwd=tmp_dir).communicate() + wget_cmd = ( + f'wget --directory-prefix={tmp_librispeech_dir} ' + f'http://www.openslr.org/resources/12/{split}-{version}.tar.gz') + subprocess.Popen(wget_cmd, shell=True).communicate() + tar_path = os.path.join(tmp_librispeech_dir, f'{split}-{version}.tar.gz') + subprocess.Popen( + f'tar xzvf {tar_path} --directory {tmp_librispeech_dir}', + shell=True).communicate() tars = [ 'raw-metadata.tar.gz', @@ -477,19 +523,23 @@ def download_librispeech(dataset_dir, tmp_dir, train_tokenizer): 'train-other-500.tar.gz', ] for tar_filename in tars: - wget_cmd = f'wget http://www.openslr.org/resources/12/{tar_filename} -O - | tar xz ' # pylint: disable=line-too-long - subprocess.Popen(wget_cmd, shell=True, cwd=tmp_dir).communicate() + wget_cmd = (f'wget --directory-prefix={tmp_librispeech_dir} ' + f'http://www.openslr.org/resources/12/{tar_filename}') + subprocess.Popen(wget_cmd, shell=True).communicate() + tar_path = os.path.join(tmp_librispeech_dir, tar_filename) + subprocess.Popen( + f'tar xzvf {tar_path} --directory {tmp_librispeech_dir}', + shell=True).communicate() + + tokenizer_vocab_path = os.path.join(extracted_data_dir, 'spm_model.vocab') - if train_tokenizer: - tokenizer_vocab_path = librispeech_tokenizer.run( - train=True, data_dir=tmp_librispeech_dir) + if not os.path.exists(tokenizer_vocab_path): + librispeech_tokenizer.run(train=True, data_dir=extracted_data_dir) - # Preprocess data. - librispeech_dir = os.path.join(dataset_dir, 'librispeech') - librispeech_preprocess.run( - input_dir=tmp_librispeech_dir, - output_dir=librispeech_dir, - tokenizer_vocab_path=tokenizer_vocab_path) + librispeech_preprocess.run( + input_dir=extracted_data_dir, + output_dir=final_data_dir, + tokenizer_vocab_path=tokenizer_vocab_path) def download_mnist(data_dir): @@ -541,21 +591,26 @@ def main(_): download_mnist(data_dir) if FLAGS.all or FLAGS.fastmri: + print(f"starting fastMRI download...\n") logging.info('Downloading FastMRI...') knee_singlecoil_train_url = FLAGS.fastmri_knee_singlecoil_train_url knee_singlecoil_val_url = FLAGS.fastmri_knee_singlecoil_val_url knee_singlecoil_test_url = FLAGS.fastmri_knee_singlecoil_test_url - if (knee_singlecoil_train_url is None or knee_singlecoil_val_url is None or - knee_singlecoil_val_url is None): + if None in (knee_singlecoil_train_url, + knee_singlecoil_val_url, + knee_singlecoil_test_url): raise ValueError( - 'Must provide both --fastmri_knee_singlecoil_{train,val}_url to ' - 'download the FastMRI dataset. Sign up for the URLs at ' + f'Must provide three --fastmri_knee_singlecoil_[train,val,test]_url to ' + 'download the FastMRI dataset.\nSign up for the URLs at ' 'https://fastmri.med.nyu.edu/.') - download_fastmri(data_dir, - tmp_dir, - knee_singlecoil_train_url, - knee_singlecoil_val_url, - knee_singlecoil_test_url) + + updated_data_dir = download_fastmri(data_dir, + knee_singlecoil_train_url, + knee_singlecoil_val_url, + knee_singlecoil_test_url) + + print(f"fastMRI download completed. 
Extracting...") + setup_fastmri(data_dir, updated_data_dir) if FLAGS.all or FLAGS.imagenet: flags.mark_flag_as_required('imagenet_train_url') @@ -577,7 +632,7 @@ def main(_): if FLAGS.all or FLAGS.librispeech: logging.info('Downloading Librispeech...') - download_librispeech(data_dir, tmp_dir, train_tokenizer=True) + download_librispeech(data_dir, tmp_dir) if FLAGS.all or FLAGS.cifar: logging.info('Downloading CIFAR...') diff --git a/datasets/librispeech_preprocess.py b/datasets/librispeech_preprocess.py index 2ce8d79ca..0968f2a00 100644 --- a/datasets/librispeech_preprocess.py +++ b/datasets/librispeech_preprocess.py @@ -9,7 +9,6 @@ import threading import time -from absl import flags from absl import logging import numpy as np import pandas as pd @@ -23,15 +22,6 @@ exists = tf.io.gfile.exists rename = tf.io.gfile.rename -flags.DEFINE_string('raw_input_dir', - '', - 'Path to the raw training data directory.') -flags.DEFINE_string('output_dir', '', 'Dir to write the processed data to.') -flags.DEFINE_string('tokenizer_vocab_path', - '', - 'Path to sentence piece tokenizer vocab file.') -FLAGS = flags.FLAGS - TRANSCRIPTION_MAX_LENGTH = 256 AUDIO_MAX_LENGTH = 320000 @@ -178,11 +168,3 @@ def run(input_dir, output_dir, tokenizer_vocab_path): 'expected count: {} vs expected {}'.format( num_entries, librispeech_example_counts[subset])) example_ids.to_csv(os.path.join(output_dir, f'{subset}.csv')) - - -def main(): - run(FLAGS.input_dir, FLAGS.output_dir, FLAGS.tokenizer_vocab_path) - - -if __name__ == '__main__': - main() diff --git a/datasets/librispeech_tokenizer.py b/datasets/librispeech_tokenizer.py index 71aa719c2..e701d59d4 100644 --- a/datasets/librispeech_tokenizer.py +++ b/datasets/librispeech_tokenizer.py @@ -8,7 +8,6 @@ import tempfile from typing import Dict -from absl import flags from absl import logging import sentencepiece as spm import tensorflow as tf @@ -21,13 +20,6 @@ Features = Dict[str, tf.Tensor] -flags.DEFINE_string('input_dir', '', 'Path to training data directory.') -flags.DEFINE_boolean( - 'train', - False, - 'Whether to train a new tokenizer or load existing one to test.') -FLAGS = flags.FLAGS - def dump_chars_for_training(data_folder, splits, maxchars: int = int(1e7)): char_count = 0 @@ -118,13 +110,15 @@ def load_tokenizer(model_filepath): def run(train, data_dir): logging.info('Data dir: %s', data_dir) + vocab_path = os.path.join(data_dir, 'spm_model.vocab') + logging.info('vocab_path = ', vocab_path) if train: logging.info('Training...') splits = ['train-clean-100'] - return train_tokenizer(data_dir, splits) + train_tokenizer(data_dir, splits, model_path=vocab_path) else: - tokenizer = load_tokenizer(os.path.join(data_dir, 'spm_model.vocab')) + tokenizer = load_tokenizer(vocab_path) test_input = 'OPEN SOURCE ROCKS' tokens = tokenizer.tokenize(test_input) detokenized = tokenizer.detokenize(tokens).numpy().decode('utf-8') @@ -135,11 +129,3 @@ def run(train, data_dir): if detokenized == test_input: logging.info('Tokenizer working correctly!') - - -def main(): - run(FLAGS.train, FLAGS.data_dir) - - -if __name__ == '__main__': - main() diff --git a/docker/Dockerfile b/docker/Dockerfile index d2d946851..d178d6bf1 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -6,12 +6,13 @@ # To build Docker image FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 +ARG DEBIAN_FRONTEND=noninteractive # Installing machine packages RUN echo "Setting up machine" RUN apt-get update RUN apt-get install -y curl tar -RUN apt-get install -y git python3 pip wget +RUN apt-get install -y 
git python3 pip wget ffmpeg RUN apt-get install libtcmalloc-minimal4 RUN export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 diff --git a/docker/README.md b/docker/README.md deleted file mode 100644 index 7fac6df77..000000000 --- a/docker/README.md +++ /dev/null @@ -1,155 +0,0 @@ -## Docker Instructions - -### General - -#### Prerequisites -You may have to install the NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs. - -If you are working with a GCP VM with Container Optimized OS setup, you will have to mount the NVIDIA drivers and devices on -`docker run` command (see below). - -#### Building Image - -From `algorithmic-efficiency/docker/` run: -``` -docker build -t . -``` - -#### Container Entry Point Flags -You can run a container that will download data to the host VM (if not already downloaded), run a submission or both. If you only want to download data you can run the container with just the `-d` and `-f` flags (`-f` is only required if `-d` is 'imagenet'). If you want to run a submission the `-d`, `-f`, `-s`, `-t`, `-e`, `-w` flags are all required to locate the data and run the submission script. - -The container entrypoint script provides the following flags: -- `-d` dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if `~/data/` does not exist on the host machine. Required for running a submission. -- `-f` framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission. -- `-s` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission. -- `-t` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission. -- `-e` experiment_name: name of experiment. Required for running a submission. -- `-w` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission. -- `-m` max_steps: maximum number of steps to run the workload for. Optional. -- `-b` debugging_mode: can be true or false. If `-b ` (debugging_mode) is `true` the main process on the container will persist. - - -#### Starting container w end-to-end submission runner -To run the docker container that will download data (if not found host) and run a submisison run: -``` -docker run -t -d \ --v $HOME_DIR/data/:/data/ \ --v $HOME_DIR/experiment_runs/:/experiment_runs \ --v $HOME_DIR/experiment_runs/logs:/logs \ ---gpus all \ ---ipc=host \ - \ --d \ --f \ --s \ --t \ --e \ --w \ --b \ -``` -This will print the container ID to the terminal. -If debugging_mode is `true` the main process on the container will persist after finishing the submission runner. - - -#### Starting a container with automated data download -To run a docker container that will only download data (if not found on host): -``` -docker run -t -d \ --v $HOME_DIR/data/:/data/ \ --v $HOME_DIR/experiment_runs/:/experiment_runs \ --v $HOME_DIR/experiment_runs/logs:/logs \ ---gpus all \ ---ipc=host \ - \ --d \ --f \ --b \ -``` -If debugging_mode is `true` the main process on the container will persist after finishing the data download. 
-This run command is useful if you manually want to run a sumbission or look around. - -#### Interacting with the container -To find the container IDs of running containers run: -``` -docker ps -``` - -To see the status of the data download or submission runner run: -``` -docker logs -``` - -To enter a bash session in the container run: -``` -docker exec -it /bin/bash -``` - -## GCP Integration -If you want to run containers on GCP VMs or store and retrieve Docker images from the Google Cloud Container Registry, please read ahead. - -### Google Cloud Container Registry -If you'd like to maintain or use images stored on our Google Cloud Container Registry read this section. -You will have to use an authentication helper to set up permissions to access the repository: -``` -ARTIFACT_REGISTRY_URL=us-central1-docker.pkg.dev -gcloud auth configure-docker $ARTIFACT_REGISTRY_URL -``` - -To push built image to artifact registry on GCP do this : -``` -PROJECT=training-algorithms-external -REPO=mlcommons-docker-repo - -docker tag base_image:latest us-central1-docker.pkg.dev/$PROJECT/$REPO/base_image:latest -docker push us-central1-docker.pkg.dev/$PROJECT/$REPO/base_image:latest -``` - -To pull the latest image to GCP run: -``` -PROJECT=training-algorithms-external -REPO=mlcommons-docker-repo -docker pull us-central1-docker.pkg.dev/$PROJECT/$REPO/base_image:latest -``` - -### Setting up a Linux VM -If you'd like to use a Linux VM, you will have to install the correct GPU drivers and the NVIDIA Docker toolkit. -We recommmend to use the Deep Learning on Linux image. Further instructions are based on that. - -#### Installing GPU Drivers -You can use the `scripts/cloud-startup.sh` as a startup script for the VM. This will automate the installation of the -NVIDIA GPU Drivers and NVIDIA Docker toolkit. - -#### Authentication for Google Cloud Container Registry -To access the Google Cloud Container Registry, you will have to authenticate to the repository whenever you use Docker. -Use the gcloud credential helper as documented [here](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#cred-helper). - -### Setting up a Container Optimized OS VMs on GCP -You may want use a [Container Optimized OS](https://cloud.google.com/container-optimized-os/docs) to run submissions. -However, the Container Optimized OS does not support CUDA 11.7. If you go down this route, -please adjust the base image in the Dockerfile to CUDA 11.6. -We don't guarantee compatibility of the `algorithmic_efficiency` package with CUDA 11.6 though. - -#### Installing GPU Drivers -To install NVIDIA GPU drivers on container optimized OS you can use the `cos` installer. -Follow instructions [here](https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus) - -#### Authentication for Google Cloud Container Registry -To access the Google Cloud Container Registry, you will have to authenticate to the repository whenever you use Docker. -Use a standalone credential helper as documented [here](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#cred-helper). - -#### cloud-init script -You can automate installation GPU Drivers and authentication for Cloud Container Registry with a cloud-init script, by passing -the content of the script as `user-data` in the VMs metadata. - - -## Other Tips and tricks - -How to avoid sudo for docker ? - -``` -sudo groupadd docker -sudo usermod -aG docker $USER -newgrp docker -``` - -Recommendation : Use a GCP CPU VM to build mlcommons docker image. 
Do not use cloudshell to build mlcommons docker images as the cloudshell provisioned machine runs out of storage diff --git a/docker/build_docker_images.sh b/docker/build_docker_images.sh index 4a5ae08dc..f3c891c6f 100644 --- a/docker/build_docker_images.sh +++ b/docker/build_docker_images.sh @@ -1,6 +1,6 @@ # Bash script to build and push dev docker images to artifact repo # Usage: -# bash build_docker_images.sh -b +# bash build_docker_images.sh -b while getopts b: flag do diff --git a/docker/scripts/startup.sh b/docker/scripts/startup.sh index c76340397..cdd2c649c 100644 --- a/docker/scripts/startup.sh +++ b/docker/scripts/startup.sh @@ -7,26 +7,107 @@ # our algorithmic-efficiency repo. To do so # set the -i flag to true. +function usage() { + cat <` does not exist on the host machine. Required for running a submission. -- `-f` framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `-d imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission. -- `-s` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission. -- `-t` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission. -- `-e` experiment_name: name of experiment. Required for running a submission. -- `-w` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission. -- `-m` max_steps: maximum number of steps to run the workload for. Optional. -- `-b` debugging_mode: can be true or false. If `-b ` (debugging_mode) is `true` the main process on the container will persist. +- `--dataset` dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if `~/data/` does not exist on the host machine. Required for running a submission. +- `--framework` framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for `--dataset imagenet` since we have two versions of data for imagenet. This flag is also required for running a submission. +- `--submission_path` submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission. +- `--tuning_search_space` tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission. +- `--experiment_name` experiment_name: name of experiment. Required for running a submission. +- `--workload` workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission. +- `--max_global_steps` max_global_steps: maximum number of steps to run the workload for. Optional. +- `--keep_container_alive`: can be true or false. If `true` the container will not be killed automatically. This is useful for developing or debugging. 
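Taken together, a minimal data-download / development run of the container using only the long-form flags above might look like the sketch below (the full submission example follows next). This is a sketch only: `<docker_image_name>` and the host mount paths are placeholders, and the flag values are illustrative rather than taken from this diff.

```bash
# Hedged sketch: start a container that only downloads data (if missing) and
# stays alive for interactive use. <docker_image_name> is whatever tag you
# built or pulled; adjust the mounted host paths to your setup.
docker run -t -d \
  -v $HOME/data/:/data/ \
  -v $HOME/experiment_runs/:/experiment_runs \
  -v $HOME/experiment_runs/logs:/logs \
  --gpus all \
  --ipc=host \
  <docker_image_name> \
  --dataset ogbg \
  --framework jax \
  --keep_container_alive true
```
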
To run the docker container that will run the submission runner run: @@ -128,16 +128,15 @@ docker run -t -d \ --gpus all \ --ipc=host \ \ --d \ --f \ --s \ --t \ --e \ --w \ --b +--dataset \ +--framework \ +--submission_path \ +--tuning_search_space \ +--experiment_name \ +--workload \ +--keep_container_alive ``` This will print the container ID to the terminal. -If debugging_mode is `true` the main process on the container will persist after finishing the submission runner. #### Docker Tips #### @@ -162,5 +161,7 @@ To produce performance profile and performance table: python3 scoring/score_submission.py --experiment_path= --output_dir= ``` +We provide the scores and performance profiles for the baseline algorithms in the "Baseline Results" section in [Benchmarking Neural Network Training Algorithms](https://arxiv.org/abs/2306.07179). + ## Good Luck!