diff --git a/README.md b/README.md
index 8a408cd07f..96f31f3190 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
+# TRL - Transformer Reinforcement Learning
+
 Full stack library to post-train large language models.
@@ -23,7 +25,7 @@ ## What is it?
-TRL is a library that post-trains LLMs and diffusion models using methods such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).
+TRL is a library that post-trains LLMs and diffusion models using methods such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO). The library is built on top of [🤗 Transformers](https://github.com/huggingface/transformers) and is compatible with any model architecture available there.
@@ -32,11 +34,14 @@ The library is built on top of [🤗 Transformers](https://github.com/huggingfac
 - **`Efficient and scalable`**:
   - [🤗 Accelerate](https://github.com/huggingface/accelerate) is the backbone of TRL that models training to scale from a single GPU to a large-scale multi-node cluster with methods such as DDP and DeepSpeed.
-  - [`PEFT`](https://github.com/huggingface/peft) is fully integrated and allows to train even the largest models on modest hardware with quantization and methods such as LoRA or QLoRA.
+  - [`PEFT`](https://github.com/huggingface/peft) is fully integrated and allows users to train even the largest models on modest hardware with quantization and methods such as LoRA or QLoRA.
   - [Unsloth](https://github.com/unslothai/unsloth) is also integrated and allows to significantly speed up training with dedicated kernels.
+- **`CLI`**: With the [CLI](https://huggingface.co/docs/trl/clis) you can fine-tune and chat with LLMs without writing any code, using a single command and a flexible config system.
 - **`Trainers`**: The trainer classes are an abstraction to apply many fine-tuning methods with ease such as the [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer), [`PPOTrainer`](https://huggingface.co/docs/trl/ppov2_trainer), and [`ORPOTrainer`](https://huggingface.co/docs/trl/orpo_trainer).
+- **`AutoModels`**: The [`AutoModelForCausalLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForCausalLMWithValueHead) & [`AutoModelForSeq2SeqLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForSeq2SeqLMWithValueHead) classes add a value head to the model so that it can be trained with RL algorithms such as PPO.
+- **`Examples`**: Fine-tune Llama for chat applications, apply full RLHF with adapters, and more, following the [examples](https://github.com/huggingface/trl/tree/main/examples).
 ## Installation
@@ -95,7 +100,7 @@ For more flexibility and control over training, TRL provides dedicated trainer c
 ### `SFTTrainer`
-Here is a basic example on how to use the `SFTTrainer`:
+Here is a basic example of how to use the `SFTTrainer`:
 ```python
 from trl import SFTConfig, SFTTrainer
@@ -114,7 +119,7 @@ trainer.train()
 ### `RewardTrainer`
-Here is a basic example on how to use the `RewardTrainer`:
+Here is a basic example of how to use the `RewardTrainer`:
 ```python
 from trl import RewardConfig, RewardTrainer
@@ -215,3 +220,7 @@ make dev
 howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```
+
+## License
+
+This repository's source code is available under the [Apache-2.0 License](LICENSE).
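The hunks above only show the opening lines of the README's code examples (the diff context stops at the imports). For reviewers who want to try the `SFTTrainer` snippet, here is a minimal, self-contained sketch; the dataset (`trl-lib/Capybara`) and model (`Qwen/Qwen2.5-0.5B`) are illustrative assumptions and are not part of this diff.

```python
# Minimal SFTTrainer sketch; dataset and model names are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any text or conversational dataset from the Hub works; Capybara is just an example.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(output_dir="Qwen2.5-0.5B-SFT")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # model name or path; a small model keeps the demo cheap
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```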
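A matching sketch for the `RewardTrainer` hunk follows. The sequence-classification backbone, the preference dataset, and the keyword used to pass the tokenizer (`processing_class` in recent TRL releases, `tokenizer` in older ones) are assumptions that depend on the installed versions.

```python
# Minimal RewardTrainer sketch; model, dataset, and version-dependent kwargs are assumptions.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Reward models are sequence classifiers with a single scalar output.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# A preference dataset with "chosen"/"rejected" pairs.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="Qwen2.5-0.5B-Reward", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older TRL versions use `tokenizer=` instead
    train_dataset=dataset,
)
trainer.train()
```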
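The new `AutoModels` bullet describes the value-head wrappers only in words; the small sketch below shows what the wrapper returns, with `gpt2` as an arbitrary base model chosen for illustration.

```python
# Sketch of the value-head wrapper described in the `AutoModels` bullet; gpt2 is an arbitrary choice.
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("TRL adds a scalar value head on top of the LM:", return_tensors="pt")
# The forward pass returns the LM logits plus a per-token value estimate,
# which PPO-style trainers use as the critic.
logits, _, values = model(**inputs)
print(logits.shape, values.shape)  # (batch, seq_len, vocab_size) and (batch, seq_len)
```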