From 4472cd5f13ba41de5061bb99e00b4c6933d0d017 Mon Sep 17 00:00:00 2001
From: Priya Kasimbeg
Date: Tue, 28 Nov 2023 18:13:44 +0000
Subject: [PATCH] add entry to changelog. remove conformer pytorch warning

---
 CHANGELOG.md     | 2 +-
 DOCUMENTATION.md | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f8c3db0e6..4ff1cc068 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,5 @@
 # Change Log
 
-## [0.1.0] - 2023-11-21
+## algoperf-benchmark-0.1.0 (2023-11-28)
 
 First release of the AlgoPerf: Training algorithms benchmarking code.
diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
index ae380af34..de7a3b7f8 100644
--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -577,6 +577,3 @@ The JAX and PyTorch versions of the Criteo, FastMRI, Librispeech, OGBG, and WMT
 
 Since we use PyTorch's [`DistributedDataParallel`](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel) implementation, there is one Python process for each device. Depending on the hardware and the settings of the cluster, running a TensorFlow input pipeline in each Python process can lead to errors, since too many threads are created in each process. See [this PR thread](https://github.com/mlcommons/algorithmic-efficiency/pull/85) for more details. While this issue might not affect all setups, we currently implement a different strategy: we only run the TensorFlow input pipeline in one Python process (with `rank == 0`), and [broadcast](https://pytorch.org/docs/stable/distributed.html#torch.distributed.broadcast) the batches to all other devices. This introduces an additional communication overhead for each batch. See the [implementation for the WMT workload](https://github.com/mlcommons/algorithmic-efficiency/blob/main/algorithmic_efficiency/workloads/wmt/wmt_pytorch/workload.py#L215-L288) as an example.
 
-### Pytorch Conformer CUDA OOM
-
-The Conformer PyTorch workload may run out of memory in the current state. Please set the `submission_runner.py` flag `reduce_pytorch_max_split_size` to `True` as a temporary workaround if you encounter this issue. This will set `max_split_size_mb:256`. Note that this will adversely impact the performance of the submission on this workload. See [tracking issue](https://github.com/mlcommons/algorithmic-efficiency/issues/497).
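
For context, the DOCUMENTATION.md paragraph kept by this patch describes the `rank == 0` input-pipeline strategy: only one process reads from the TensorFlow pipeline and every batch is broadcast to the other `DistributedDataParallel` processes. The following is a minimal sketch of that broadcast pattern, not the repository's actual implementation (which lives in the linked WMT workload file); the `get_batch` helper, its arguments, and the assumption that batch shape and dtype are known on every rank are illustrative only.

```python
# Sketch of a rank-0 input pipeline with torch.distributed.broadcast.
# Assumes the process group is already initialized and that all ranks
# agree on the batch shape and dtype ahead of time (an assumption made
# for illustration, not taken from the patch).
import torch
import torch.distributed as dist


def get_batch(tf_iterator, batch_shape, device, rank):
  """Returns the next batch on every rank, reading TF data only on rank 0."""
  if rank == 0:
    # Only rank 0 touches the TensorFlow input pipeline.
    batch_np = next(tf_iterator)  # e.g. a NumPy array from tf.data
    batch = torch.as_tensor(batch_np, dtype=torch.float32, device=device)
  else:
    # Other ranks allocate an empty buffer with the agreed-upon shape/dtype.
    batch = torch.empty(batch_shape, dtype=torch.float32, device=device)
  # In-place broadcast from rank 0 so all processes see the same batch.
  dist.broadcast(batch, src=0)
  return batch
```

The broadcast is the per-batch communication overhead mentioned in the documentation: it trades extra device-to-device traffic for running a single TensorFlow pipeline instead of one thread-heavy pipeline per process.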