diff --git a/README.md b/README.md
index 25f2a6dc2..916b684b8 100644
--- a/README.md
+++ b/README.md
@@ -70,9 +70,9 @@ Check `gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/` for the output artifacts, which can
 be read by TensorBoard.

 ## GPU Usage
-UPDATE!: Nvidia has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements here: [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x)
+Note: NVIDIA has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements. Please visit the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository for more details and usage instructions.

-T5X can be run easily on GPUs either in single-node configurations or multi-node configurations with a SLURM+pyxis cluster. Further instructions at [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and associated `gin` configurations also contain additional GPU optimizations for better throughput.
+T5X can be run easily on GPUs either in single-node configurations or multi-node configurations with a SLURM+pyxis cluster. Further instructions at [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and associated `gin` configurations also contain additional GPU optimizations for better throughput. More examples and instructions can be found in the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository, maintained by NVIDIA, which adds H100 FP8 support and broad GPU performance improvements.

 ## Installation
@@ -85,7 +85,7 @@ the TPU VM instance unless otherwise stated.
    to set up a Google Cloud Platform (GCP) account and enable the Cloud TPU API.

-   **Note:** T5X also works with GPU, please follow instructions in [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md) if you'd like to use GPU version.
+   **Note:** T5X also works on GPUs; please follow the instructions in [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md) if you'd like to use the GPU version.

 2. Create a [Cloud TPU VM instance](https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms)
diff --git a/docs/models.md b/docs/models.md
index f92ebf8a3..09800df6b 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -17,18 +17,23 @@ checkpoint locations.

 Publicly Available Models:

-Model | Use Case
---------------------------------------- | ---------------------------------------------------
-[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
-[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
-[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
-[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
-[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
-[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
-[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
-[LongT5](#longt5-checkpoints) | TBD
-[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
-[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
+Model | Use Case
+---------------------------------------------------- | --------
+[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
+[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
+[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
+[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
+[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
+[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
+[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
+[LongT5](#longt5-checkpoints) | Recommended checkpoints to fine-tune for long input sequence tasks.
+[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
+[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted.
+[UL2](#ul2-checkpoints) | Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131).
+[BigScience](#bigscience-checkpoints) | Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832).
+[FLIP](#flip-checkpoints) | Language-Image models trained with an alternative to CLIP, presented in the [FLIP paper](https://arxiv.org/abs/2212.00794).
+[RankGen](#rankgen-checkpoints) | 1.2B parameter encoder model for English that scores model generations given a prefix for decoding, from the [RankGen paper](https://arxiv.org/abs/2205.09726).
+[Dipper](#dipper-checkpoints) | 11B parameter paraphrase generation model from the [Dipper paper](https://arxiv.org/abs/2303.13408).

 ### Public Research Models
@@ -280,5 +285,35 @@ Flan-T5 XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main
 Flan-T5 XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000)

+#### UL2 Checkpoints
+Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the
+UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131). Checkpoints
+are released at
+https://github.com/google-research/google-research/tree/master/ul2#checkpoints.
+
+#### BigScience Checkpoints
+
+Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832),
+released at
+https://github.com/bigscience-workshop/architecture-objective/tree/main#checkpoints.
+
+#### FLIP Checkpoints
+
+Language-Image models trained with an alternative to CLIP, presented in the
+[FLIP paper](https://arxiv.org/abs/2212.00794). Checkpoints are released at
+https://github.com/facebookresearch/flip#results-and-pre-trained-flip-models.
+
+#### RankGen Checkpoints
+
+1.2B parameter encoder model for English that scores model generations given a
+prefix for decoding, from the [RankGen paper](https://arxiv.org/abs/2205.09726).
+Checkpoints are released at
+https://github.com/google-research/google-research/tree/master/rankgen.
+
+#### Dipper Checkpoints
+
+11B parameter paraphrase generation model from the
+[Dipper paper](https://arxiv.org/abs/2303.13408). Checkpoints are released at
+https://github.com/google-research/google-research/tree/master/dipper.
diff --git a/docs/usage/eval.md b/docs/usage/eval.md
index 29d86ed3b..e29a42754 100644
--- a/docs/usage/eval.md
+++ b/docs/usage/eval.md
@@ -41,7 +41,9 @@ SeqIO Task will be used:
 [`t5x/configs/models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).

 If you would like to fine-tune your model before evaluation, please follow the
-[fine-tuning](finetune.md) tutorial, and continue to Step 2.
+[fine-tuning](finetune.md) tutorial, and continue to Step 2. A list of all
+available pre-trained models (with model checkpoints and Gin config files) can
+be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md) documentation.
 ## Step 2: Choose a SeqIO Task/Mixture
diff --git a/docs/usage/partitioning.md b/docs/usage/partitioning.md
index 02032e1f5..2a62c76d1 100644
--- a/docs/usage/partitioning.md
+++ b/docs/usage/partitioning.md
@@ -413,11 +413,6 @@ partitioning.PjitPartitioner.logical_axis_rules = [
 ]
 ```

-## Recommended reading
-
-[Basic model and data partitioning for inference in P5X](https://docs.google.com/document/d/1bU8IuufbgkY0Wg8okyrEPnu3S5rfYGqVPMbUmeorFVo/edit)
-by brandonthorpe@, luyaoxu@
-
diff --git a/docs/usage/pretrain.md b/docs/usage/pretrain.md
index 7d895e7ac..c302974b6 100644
--- a/docs/usage/pretrain.md
+++ b/docs/usage/pretrain.md
@@ -25,25 +25,9 @@ span corruption pretraining objective is also showcased.

 To train a model, you need a Gin config file that defines the model params. For
 your convenience, Gin configs for common models have been made available for use
-in T5X. Following is a list of these models and their Gin locations.
-
-Model | Gin File Location
------------------------------------- | -----------------
-T5 Small | [t5_1_0/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/small.gin)
-T5 Base | [t5_1_0/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/base.gin)
-T5 Large | [t5_1_0/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/large.gin)
-T5 3B | [t5_1_0/3B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/3B.gin)
-T5 11B | [t5_1_0/11B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/11B.gin)
-T5 1.1 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin)
-T5 1.1 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin)
-T5 1.1 Large | [t5_1_1/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin)
-T5 1.1 XL | [t5_1_1/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin)
-T5 1.1 XXL | [t5_1_1/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin)
-MT5 Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin)
-MT5 Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin)
-MT5 Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin)
-MT5 XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin)
-MT5 XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin)
+in T5X. A list of all the available pre-trained models (with model checkpoints
+and Gin config files) can be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md)
+documentation.

 For the example run, you will use the T5 1.1 Small model. The Gin file for this model is located at
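
For context on how the Gin files referenced in the `docs/usage/pretrain.md` hunk are consumed, the sketch below shows the general shape of a T5X pretraining launch: a model Gin file is combined with a run config and a few overrides passed to `t5x/train.py`. This is a rough sketch, not part of this change; the bucket paths, step count, run config, and SeqIO task name are illustrative placeholders assumed here rather than values taken from the diff.

```sh
# Sketch only: launch T5X pretraining with the T5 1.1 Small model Gin file.
# MODEL_DIR, TFDS_DATA_DIR, TRAIN_STEPS, and the task name are placeholders.
T5X_DIR="${HOME}/t5x"                             # local clone of the T5X repo
MODEL_DIR="gs://my-bucket/t5_1_1_small_pretrain"  # where checkpoints and logs go
TFDS_DATA_DIR="gs://my-bucket/tfds"               # TFDS data location

python3 "${T5X_DIR}/t5x/train.py" \
  --gin_file="t5x/examples/t5/t5_1_1/small.gin" \
  --gin_file="t5x/configs/runs/pretrain.gin" \
  --gin.MODEL_DIR="'${MODEL_DIR}'" \
  --gin.TRAIN_STEPS=10000 \
  --gin.MIXTURE_OR_TASK_NAME="'c4_v220_span_corruption'" \
  --gin.TASK_FEATURE_LENGTHS="{'inputs': 512, 'targets': 114}" \
  --tfds_data_dir="${TFDS_DATA_DIR}" \
  --alsologtostderr
```

The fine-tuning and evaluation tutorials touched by this diff follow the same pattern, substituting their own run configs and entry points; the exact invocations are spelled out in the respective `docs/usage` pages.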