Update README.md (#2180)
Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Edward Beeching <[email protected]>
4 people authored Oct 11, 2024
1 parent cd1aa6b commit f436c3e
Showing 1 changed file with 14 additions and 5 deletions.
19 changes: 14 additions & 5 deletions README.md
@@ -1,8 +1,10 @@

<div style="text-align: center">
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl_banner_dark.png">
</div>

# TRL - Transformer Reinforcement Learning
<hr> <br>

<h3 align="center">
<p>Full-stack library to post-train large language models.</p>
@@ -23,7 +25,7 @@

## What is it?

TRL is a library that post-trains LLMs and diffusion models using methods such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).

The library is built on top of [🤗 Transformers](https://github.com/huggingface/transformers) and is compatible with any model architecture available there.
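
Because TRL sits on top of Transformers, a model and tokenizer loaded with the usual `transformers` auto classes can be passed straight to a TRL trainer. As a rough illustration, here is a minimal `DPOTrainer` sketch; the checkpoint and dataset names are placeholders, and the exact keyword names can vary slightly between TRL releases:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder checkpoint; any causal LM from the Hub should work.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Placeholder preference dataset with "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="Qwen2.5-0.5B-DPO")
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```

When no reference model is passed, `DPOTrainer` creates one internally from the policy model.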

@@ -32,11 +34,14 @@ The library is built on top of [🤗 Transformers](https://github.com/huggingfac

- **`Efficient and scalable`**:
  - [🤗 Accelerate](https://github.com/huggingface/accelerate) is the backbone of TRL, letting you scale model training from a single GPU to a large multi-node cluster with methods such as DDP and DeepSpeed.
  - [`PEFT`](https://github.com/huggingface/peft) is fully integrated and allows users to train even the largest models on modest hardware with quantization and methods such as LoRA or QLoRA.
  - [Unsloth](https://github.com/unslothai/unsloth) is also integrated and lets you significantly speed up training with dedicated kernels.

- **`CLI`**: With the [CLI](https://huggingface.co/docs/trl/clis) you can fine-tune and chat with LLMs using a single command and a flexible config system, without writing any code.
- **`Trainers`**: The trainer classes are an abstraction that lets you apply many fine-tuning methods with ease, such as the [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer), [`PPOTrainer`](https://huggingface.co/docs/trl/ppov2_trainer), and [`ORPOTrainer`](https://huggingface.co/docs/trl/orpo_trainer).

- **`AutoModels`**: The [`AutoModelForCausalLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForCausalLMWithValueHead) & [`AutoModelForSeq2SeqLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForSeq2SeqLMWithValueHead) classes add a value head to the model, which makes it possible to train them with RL algorithms such as PPO (see the sketch after this list).

- **`Examples`**: Fine-tune Llama for chat applications, apply full RLHF using adapters, and more, by following the [examples](https://github.com/huggingface/trl/tree/main/examples).
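
To make the value-head wrappers above concrete, here is a minimal sketch; `gpt2` is only a small placeholder checkpoint:

```python
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

# The wrapper adds a small scalar head on top of the causal LM.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("TRL adds a value head so that", return_tensors="pt")
# The forward pass returns the LM logits, the (optional) LM loss,
# and one value estimate per input token.
lm_logits, loss, values = model(**inputs)
print(values.shape)  # (batch_size, sequence_length)
```

PPO-style trainers use these per-token value estimates as the critic when computing advantages.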

## Installation
@@ -95,7 +100,7 @@ For more flexibility and control over training, TRL provides dedicated trainer c

### `SFTTrainer`

Here is a basic example of how to use the `SFTTrainer`:

```python
from trl import SFTConfig, SFTTrainer
```
@@ -114,7 +119,7 @@ trainer.train()
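
The collapsed diff above hides most of this snippet, leaving only the import and the final `trainer.train()` call. A minimal, self-contained sketch of such a run could look like the following; the dataset and model names are placeholders rather than a quote of the README:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; any text or conversational dataset that SFTTrainer
# understands will do.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(output_dir="Qwen2.5-0.5B-SFT")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # a model name or an already-loaded model both work
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```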

### `RewardTrainer`

Here is a basic example of how to use the `RewardTrainer`:

```python
from trl import RewardConfig, RewardTrainer
```
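
As above, only the import survives the collapsed diff. A minimal sketch of a reward-model run, assuming a preference dataset with "chosen" and "rejected" columns (the checkpoint and dataset names are placeholders):

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

# A reward model is a sequence classifier with a single output score.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", num_labels=1
)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder preference dataset.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="Qwen2.5-0.5B-Reward", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```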
@@ -215,3 +220,7 @@ make dev
howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## License

This repository's source code is available under the [Apache-2.0 License](LICENSE).
