
Commit

Minor doc updates
PiperOrigin-RevId: 567923786
gauravmishra authored and t5-copybara committed Sep 29, 2023
1 parent 870d069 commit 82283f7
Showing 5 changed files with 56 additions and 40 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -70,9 +70,9 @@ Check `gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/` for the output artifacts, which can
be read by TensorBoard.

## GPU Usage
UPDATE!: Nvidia has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements here: [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x)
Note: NVIDIA has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements. Please visit the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository for more details and usage instructions.

T5X can be run easily on GPUs either in single-node configurations or multi-node configurations with a SLURM+pyxis cluster. Further instructions at [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and associated `gin` configurations also contain additional GPU optimizations for better throughput.
T5X can be run easily on GPUs, either in single-node configurations or in multi-node configurations with a SLURM+pyxis cluster. Further instructions are available at [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and the associated `gin` configurations also contain additional GPU optimizations for better throughput. More examples and instructions can be found in the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository, maintained by NVIDIA, which adds H100 FP8 support and broad GPU performance improvements.
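
For orientation, a single-node run typically reduces to invoking the standard `t5x/train.py` entry point with a gin config and a handful of overrides. The sketch below is illustrative only: the gin file and directories are placeholders, and the scripts in `t5x/contrib/gpu/scripts_gpu` wrap an equivalent invocation with GPU-tuned gin configurations.

```sh
# Illustrative sketch only: a single-node T5X training launch on GPU.
# The gin file and directories below are placeholders; see the scripts in
# t5x/contrib/gpu/scripts_gpu for the exact GPU-tuned configurations.
T5X_DIR="..."        # directory where the T5X repository is cloned
TFDS_DATA_DIR="..."  # TFDS data directory
MODEL_DIR="..."      # where checkpoints and logs will be written

python3 ${T5X_DIR}/t5x/train.py \
  --gin_file="path/to/pretrain_or_finetune_config.gin" \
  --gin.MODEL_DIR=\"${MODEL_DIR}\" \
  --tfds_data_dir=${TFDS_DATA_DIR}
```

Multi-node SLURM+pyxis runs follow the same pattern, with the launch wrapped by the cluster's job scheduler; the linked GPU documentation contains the exact scripts.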


## Installation
@@ -85,7 +85,7 @@ the TPU VM instance unless otherwise stated.
to set up a Google Cloud Platform (GCP) account and enable the Cloud TPU
API.

**Note:** T5X also works with GPU, please follow instructions in [t5x/contrib/gpu/scripts_gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/scripts_gpu/README.md) if you'd like to use GPU version.
**Note:** T5X also works with GPUs; please follow the instructions in [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md) if you'd like to use the GPU version.

2. Create a
[Cloud TPU VM instance](https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms)
59 changes: 47 additions & 12 deletions docs/models.md
@@ -17,18 +17,23 @@ checkpoint locations.

Publicly Available Models:

Model | Use Case
--------------------------------------- | ---------------------------------------------------
[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints)| Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
[LongT5](#longt5-checkpoints) | TBD
[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
Model | Use Case
---------------------------------------------------- | --------
[T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
[T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
[mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
[mT5 LM-Adapted](#mt5-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
[umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
[ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
[LongT5](#longt5-checkpoints) | Recommended checkpoints to fine-tune for long input sequence tasks
[MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
[Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
[UL2](#ul2-checkpoints)                               | Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131)
[BigScience](#bigscience-checkpoints) | Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832)
[FLIP](#flip-checkpoints) | Language-Image models trained with an alternative to CLIP, presented in the [FLIP paper](https://arxiv.org/abs/2212.00794)
[RankGen](#rankgen-checkpoints)                       | 1.2B parameter English encoder model from the [RankGen paper](https://arxiv.org/abs/2205.09726) that scores model generations given a prefix, for use during decoding
[Dipper](#dipper-checkpoints) | 11B parameter paraphrase generation model from the [Dipper paper](https://arxiv.org/abs/2303.13408)


### Public Research Models
@@ -280,5 +285,35 @@ Flan-T5 XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main
Flan-T5 XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000)


#### UL2 Checkpoints

Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the
UL2 objective from the [UL2 paper](https://arxiv.org/abs/2205.05131). Checkpoints
are released at
https://github.com/google-research/google-research/tree/master/ul2#checkpoints.

#### BigScience Checkpoints

Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832),
released at
https://github.com/bigscience-workshop/architecture-objective/tree/main#checkpoints.

#### FLIP Checkpoints

Language-Image models trained with an alternative to CLIP, presented in the
[FLIP paper](https://arxiv.org/abs/2212.00794). Checkpoints are released at
https://github.com/facebookresearch/flip#results-and-pre-trained-flip-models.

#### RankGen Checkpoints

A 1.2B parameter English encoder model from the
[RankGen paper](https://arxiv.org/abs/2205.09726) that scores model generations
given a prefix, for use during decoding.
Checkpoints are released at
https://github.com/google-research/google-research/tree/master/rankgen.

#### Dipper Checkpoints

11B parameter paraphrase generation model from the
[Dipper paper](https://arxiv.org/abs/2303.13408). Checkpoints are released at
https://github.com/google-research/google-research/tree/master/dipper.

4 changes: 3 additions & 1 deletion docs/usage/eval.md
@@ -41,7 +41,9 @@ SeqIO Task will be used:
[`t5x/configs/models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).

If you would like to fine-tune your model before evaluation, please follow the
[fine-tuning](finetune.md) tutorial, and continue to Step 2.
[fine-tuning](finetune.md) tutorial, and continue to Step 2. A list of all
available pre-trained models (with model checkpoints and Gin config files) can
be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md) documentation.
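
As a rough preview of the remaining steps, evaluation is launched through `t5x/eval.py` with a model gin file, an evaluation run config, and a few overrides. The sketch below is illustrative only; the gin files, checkpoint path, output directory, and task name are placeholders (the run config path `t5x/configs/runs/eval.gin` is an assumption), and the exact values are covered in the steps that follow.

```sh
# Illustrative sketch only: evaluating an existing checkpoint.
# All paths and the task name below are placeholders; t5x/configs/runs/eval.gin
# is assumed to be the standard evaluation run config.
CHECKPOINT_PATH="..."   # e.g. a checkpoint produced by the fine-tuning tutorial
EVAL_OUTPUT_DIR="..."   # where metrics and inferences will be written

python3 ${T5X_DIR}/t5x/eval.py \
  --gin_file="path/to/t5_1_1_small.gin" \
  --gin_file="t5x/configs/runs/eval.gin" \
  --gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\" \
  --gin.EVAL_OUTPUT_DIR=\"${EVAL_OUTPUT_DIR}\" \
  --gin.MIXTURE_OR_TASK_NAME=\"your_seqio_task_name\"
```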

## Step 2: Choose a SeqIO Task/Mixture

5 changes: 0 additions & 5 deletions docs/usage/partitioning.md
@@ -413,11 +413,6 @@ partitioning.PjitPartitioner.logical_axis_rules = [
]
```

## Recommended reading

[Basic model and data partitioning for inference in P5X](https://docs.google.com/document/d/1bU8IuufbgkY0Wg8okyrEPnu3S5rfYGqVPMbUmeorFVo/edit)
by brandonthorpe@, luyaoxu@

<!-- Reference links -->

<!---TODO(b/214235006): Use symbol reference instead of line number+rcl.-->
22 changes: 3 additions & 19 deletions docs/usage/pretrain.md
@@ -25,25 +25,9 @@ span corruption pretraining objective is also showcased.

To train a model, you need a Gin config file that defines the model params. For
your convenience, Gin configs for common models have been made available for use
in T5X. Following is a list of these models and their Gin locations.

Model | Gin File Location
------------------------------------- | -----------------
T5 Small | [t5_1_0/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/small.gin)
T5 Base | [t5_1_0/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/base.gin)
T5 Large | [t5_1_0/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/large.gin)
T5 3B | [t5_1_0/3B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/3B.gin)
T5 11B | [t5_1_0/11B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/11B.gin)
T5 1.1 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin)
T5 1.1 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin)
T5 1.1 Large | [t5_1_1/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin)
T5 1.1 XL | [t5_1_1/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin)
T5 1.1 XXL | [t5_1_1/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin)
MT5 Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin)
MT5 Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin)
MT5 Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin)
MT5 XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin)
MT5 XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin)
in T5X. A list of all the available pre-trained models (with model checkpoints
and Gin config files) can be found in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md)
documentation.
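
To make the role of these gin files concrete, a pretraining run passes the chosen model gin file together with a pretraining run config to `t5x/train.py`. The sketch below is illustrative only; the directories and override values are placeholders, and the run config path `t5x/configs/runs/pretrain.gin` and the `c4_v220_span_corruption` task are assumptions. The remainder of this tutorial walks through the concrete settings for the example run.

```sh
# Illustrative sketch only: how a model gin file combines with a pretraining
# run config. Directories and override values are placeholders; the run config
# path and task name are assumptions rather than values from this tutorial.
MODEL_DIR="..."      # where checkpoints and TensorBoard summaries are written
TFDS_DATA_DIR="..."  # TFDS data directory

python3 ${T5X_DIR}/t5x/train.py \
  --gin_file="t5x/examples/t5/t5_1_1/small.gin" \
  --gin_file="t5x/configs/runs/pretrain.gin" \
  --gin.MODEL_DIR=\"${MODEL_DIR}\" \
  --gin.MIXTURE_OR_TASK_NAME=\"c4_v220_span_corruption\" \
  --gin.TASK_FEATURE_LENGTHS="{'inputs': 512, 'targets': 114}" \
  --gin.TRAIN_STEPS=10000 \
  --tfds_data_dir=${TFDS_DATA_DIR}
```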

For the example run, you will use the T5 1.1 Small model. The Gin file for this
model is located at
