
Commit

Minor doc updates
PiperOrigin-RevId: 567923786
gauravmishra authored and t5-copybara committed Sep 29, 2023
1 parent 870d069 commit 82283f7
Showing 5 changed files with 56 additions and 40 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -70,9 +70,9 @@ Check `gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/` for the output artifacts, which can
be read by TensorBoard.

## GPU Usage
UPDATE!: Nvidia has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements here: [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x)
Note: NVIDIA has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements. Please visit the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository for more details and usage instructions.

T5X can be run easily on GPUs either in single-node configurations or multi-node configurations with a SLURM+pyxis cluster. Further instructions at [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and associated `gin` configurations also contain additional GPU optimizations for better throughput.
T5X can be run easily on GPUs, either in single-node configurations or in multi-node configurations with a SLURM+pyxis cluster. Further instructions are available at [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and the associated `gin` configurations also contain additional GPU optimizations for better throughput. More examples and instructions can be found in the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository, maintained by NVIDIA, which adds H100 FP8 support and broad GPU performance improvements.
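
For orientation, a single-node run typically reduces to invoking the standard `t5x/train.py` entry point with a gin config and a handful of overrides. The sketch below is illustrative only: the gin file and directories are placeholders, and the scripts in `t5x/contrib/gpu/scripts_gpu` wrap an equivalent invocation with GPU-tuned gin configurations.

```sh
# Illustrative sketch only: a single-node T5X training launch on GPU.
# The gin file and directories below are placeholders; see the scripts in
# t5x/contrib/gpu/scripts_gpu for the exact GPU-tuned configurations.
T5X_DIR="..."        # directory where the T5X repository is cloned
TFDS_DATA_DIR="..."  # TFDS data directory
MODEL_DIR="..."      # where checkpoints and logs will be written

python3 ${T5X_DIR}/t5x/train.py \
  --gin_file="path/to/pretrain_or_finetune_config.gin" \
  --gin.MODEL_DIR=\"${MODEL_DIR}\" \
  --tfds_data_dir=${TFDS_DATA_DIR}
```

Multi-node SLURM+pyxis runs follow the same pattern, with the launch wrapped by the cluster's job scheduler; the linked GPU documentation contains the exact scripts.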


## Installation
@@ -85,7 +85,7 @@ the TPU VM instance unless otherwise stated.
to set up a Google Cloud Platform (GCP) account and enable the Cloud TPU
API.

**Note:** T5X also works with GPU, please follow instructions in [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md) if you'd like to use GPU version.
**Note:** T5X also works with GPUs; please follow the instructions in [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md) if you'd like to use the GPU version.

2. Create a
[Cloud TPU VM instance](https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms)
59 changes: 47 additions & 12 deletions docs/models.md
@@ -17,18 +17,23 @@ checkpoint locations.

Publicly Available Models:

Model | Use Case
--------------------------------------- | ---------------------------------------------------
[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
[LongT5](#longt5-checkpoints) | TBD
[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
Model | Use Case
---------------------------------------------------- | --------
[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
[LongT5](#longt5-checkpoints) | Recommended checkpoints to fine-tune for long input sequence tasks
[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
[UL2](#ul2-checkpoints)                               | Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131)
[BigScience](#bigscience-checkpoints) | Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832)
[FLIP](#flip-checkpoints) | Language-Image models trained with an alternative to CLIP, presented in the [FLIP paper](https://arxiv.org/abs/2212.00794)
[RankGen](#rankgen-checkpoints)                       | 1.2B parameter English encoder model from the [RankGen paper](https://arxiv.org/abs/2205.09726) that scores model generations given a prefix, for use during decoding
[Dipper](#dipper-checkpoints) | 11B parameter paraphrase generation model from the [Dipper paper](https://arxiv.org/abs/2303.13408)


### Public Research Models
@@ -280,5 +285,35 @@ Flan-T5 XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main
Flan-T5 XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000)


#### UL2 Checkpoints

Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the
UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131). Checkpoints
are released at
https://github.com/google-research/google-research/tree/master/ul2#checkpoints.

#### BigScience Checkpoints

Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832),
released at
https://github.com/bigscience-workshop/architecture-objective/tree/main#checkpoints.

#### FLIP Checkpoints

Language-Image models trained with an alternative to CLIP, presented in the
[FLIP paper](https://arxiv.org/abs/2212.00794). Checkpoints are released at
https://github.com/facebookresearch/flip#results-and-pre-trained-flip-models.

#### RankGen Checkpoints

A 1.2B parameter English encoder model from the
[RankGen paper](https://arxiv.org/abs/2205.09726) that scores model generations
given a prefix, for use during decoding.
Checkpoints are released at
https://github.com/google-research/google-research/tree/master/rankgen.

#### Dipper Checkpoints

11B parameter paraphrase generation model from the
[Dipper paper](https://arxiv.org/abs/2303.13408). Checkpoints are released at
https://github.com/google-research/google-research/tree/master/dipper.

4 changes: 3 additions & 1 deletion docs/usage/eval.md
@@ -41,7 +41,9 @@ SeqIO Task will be used:
[`t5x/configs/models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).

If you would like to fine-tune your model before evaluation, please follow the
[fine-tuning](finetune.md) tutorial, and continue to Step 2.
[fine-tuning](finetune.md) tutorial, and continue to Step 2. A list of all
available pre-trained models (with model checkpoints and Gin config files) can
be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md) documentation.
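
As a rough preview of the remaining steps, evaluation is launched through `t5x/eval.py` with a model gin file, an evaluation run config, and a few overrides. The sketch below is illustrative only; the gin files, checkpoint path, output directory, and task name are placeholders (the run config path `t5x/configs/runs/eval.gin` is an assumption), and the exact values are covered in the steps that follow.

```sh
# Illustrative sketch only: evaluating an existing checkpoint.
# All paths and the task name below are placeholders; t5x/configs/runs/eval.gin
# is assumed to be the standard evaluation run config.
CHECKPOINT_PATH="..."   # e.g. a checkpoint produced by the fine-tuning tutorial
EVAL_OUTPUT_DIR="..."   # where metrics and inferences will be written

python3 ${T5X_DIR}/t5x/eval.py \
  --gin_file="path/to/t5_1_1_small.gin" \
  --gin_file="t5x/configs/runs/eval.gin" \
  --gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\" \
  --gin.EVAL_OUTPUT_DIR=\"${EVAL_OUTPUT_DIR}\" \
  --gin.MIXTURE_OR_TASK_NAME=\"your_seqio_task_name\"
```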

## Step 2: Choose a SeqIO Task/Mixture

5 changes: 0 additions & 5 deletions docs/usage/partitioning.md
@@ -413,11 +413,6 @@ partitioning.PjitPartitioner.logical_axis_rules = [
]
```

## Recommended reading

[Basic model and data partitioning for inference in P5X](https://docs.google.com/document/d/1bU8IuufbgkY0Wg8okyrEPnu3S5rfYGqVPMbUmeorFVo/edit)
by brandonthorpe@, luyaoxu@

<!-- Reference links -->

<!---TODO(b/214235006): Use symbol reference instead of line number+rcl.-->
22 changes: 3 additions & 19 deletions docs/usage/pretrain.md
@@ -25,25 +25,9 @@ span corruption pretraining objective is also showcased.

To train a model, you need a Gin config file that defines the model params. For
your convenience, Gin configs for common models have been made available for use
in T5X. Following is a list of these models and their Gin locations.

Model | Gin File Location
------------------------------------- | -----------------
T5 Small | [t5_1_0/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/small.gin)
T5 Base | [t5_1_0/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/base.gin)
T5 Large | [t5_1_0/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/large.gin)
T5 3B | [t5_1_0/3B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/3B.gin)
T5 11B | [t5_1_0/11B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/11B.gin)
T5 1.1 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin)
T5 1.1 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin)
T5 1.1 Large | [t5_1_1/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin)
T5 1.1 XL | [t5_1_1/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin)
T5 1.1 XXL | [t5_1_1/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin)
MT5 Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin)
MT5 Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin)
MT5 Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin)
MT5 XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin)
MT5 XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin)
in T5X. A list of all the available pre-trained models (with model checkpoints
and Gin config files) can be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md)
documentation.
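
To make the role of these gin files concrete, a pretraining run passes the chosen model gin file together with a pretraining run config to `t5x/train.py`. The sketch below is illustrative only; the directories and override values are placeholders, and the run config path `t5x/configs/runs/pretrain.gin` and the `c4_v220_span_corruption` task are assumptions. The remainder of this tutorial walks through the concrete settings for the example run.

```sh
# Illustrative sketch only: how a model gin file combines with a pretraining
# run config. Directories and override values are placeholders; the run config
# path and task name are assumptions rather than values from this tutorial.
MODEL_DIR="..."      # where checkpoints and TensorBoard summaries are written
TFDS_DATA_DIR="..."  # TFDS data directory

python3 ${T5X_DIR}/t5x/train.py \
  --gin_file="t5x/examples/t5/t5_1_1/small.gin" \
  --gin_file="t5x/configs/runs/pretrain.gin" \
  --gin.MODEL_DIR=\"${MODEL_DIR}\" \
  --gin.MIXTURE_OR_TASK_NAME=\"c4_v220_span_corruption\" \
  --gin.TASK_FEATURE_LENGTHS="{'inputs': 512, 'targets': 114}" \
  --gin.TRAIN_STEPS=10000 \
  --tfds_data_dir=${TFDS_DATA_DIR}
```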

For the example run, you will use the T5 1.1 Small model. The Gin file for this
model is located at
