Updated README.md with CLI examples and additional usage instructions (#2199)

* Updated README.md with CLI examples and additional usage instructions

Added Command Line Interface (CLI) examples for SFT, DPO, and Chat features.
Improved the "How to Use" section by providing code examples for SFTTrainer and RewardTrainer.
Included installation instructions for both Python Package and source-based installation.
Refined highlights to better showcase efficiency and scalability features.
Updated the repository clone instructions for working with examples.
Added new links to CLI documentation and contribution guide for better navigation.

* Update README.md

* Update README.md

Co-authored-by: lewtun <[email protected]>

* Update README.md

Co-authored-by: lewtun <[email protected]>

* Update README.md

* update badges

---------

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
huggingface · Oct 11, 2024 · 6004e03
1 parent f436c3e
Showing 1 changed file with 17 additions and 28 deletions.
diff --git a/README.md b/README.md
@@ -1,54 +1,43 @@
 # TRL - Transformer Reinforcement Learning
 
 <div style="text-align: center">
-<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl_banner_dark.png">
+<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl_banner_dark.png" alt="TRL Banner">
 </div>
 
 <hr> <br>
 
 <h3 align="center">
-    <p>Full stack library to post-train large language models.</p>
+    <p>A comprehensive library to post-train foundation models</p>
 </h3>
 
 <p align="center">
-    <a href="https://github.com/huggingface/trl/blob/main/LICENSE">
-        <img alt="License" src="https://img.shields.io/github/license/huggingface/trl.svg?color=blue">
-    </a>
-    <a href="https://huggingface.co/docs/trl/index">
-        <img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/trl/index.svg?down_color=red&down_message=offline&up_message=online">
-    </a>
-    <a href="https://github.com/huggingface/trl/releases">
-        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/trl.svg">
-    </a>
+    <a href="https://github.com/huggingface/trl/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/huggingface/trl.svg?color=blue"></a>
+    <a href="https://huggingface.co/docs/trl/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/trl/index.svg?down_color=red&down_message=offline&up_color=blue&up_message=online"></a>
+    <a href="https://github.com/huggingface/trl/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/trl.svg"></a>
 </p>
 
+## Overview
 
-## What is it?
-
-TRL is a library that post-trains LLMs and diffusion models using methods such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).
-
-The library is built on top of [🤗 Transformers](https://github.com/huggingface/transformers) and is compatible with any model architecture available there.
-
+TRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO). Built on top of the [🤗 Transformers](https://github.com/huggingface/transformers) ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled up across various hardware setups.
 
 ## Highlights
 
-- **`Efficient and scalable`**: 
-    - [🤗 Accelerate](https://github.com/huggingface/accelerate) is the backbone of TRL that models training to scale from a single GPU to a large-scale multi-node cluster with methods such as DDP and DeepSpeed.
-    - [`PEFT`](https://github.com/huggingface/peft) is fully integrated and allows users to train even the largest models on modest hardware with quantization and methods such as LoRA or QLoRA.
-    - [Unsloth](https://github.com/unslothai/unsloth) is also integrated and allows to significantly speed up training with dedicated kernels.
-
-- **`CLI`**: With the [CLI](https://huggingface.co/docs/trl/clis) you can fine-tune and chat with LLMs without writing any code using a single command and a flexible config system.
-- **`Trainers`**: The trainer classes are an abstraction to apply many fine-tuning methods with ease such as the [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer), [`PPOTrainer`](https://huggingface.co/docs/trl/ppov2_trainer), and [`ORPOTrainer`](https://huggingface.co/docs/trl/orpo_trainer).
+- **Efficient and scalable**:
+    - Leverages [🤗 Accelerate](https://github.com/huggingface/accelerate) to scale from a single GPU to multi-node clusters using methods like DDP and DeepSpeed.
+    - Full integration with [`PEFT`](https://github.com/huggingface/peft) enables training large models on modest hardware via quantization and LoRA/QLoRA.
+    - Integrates [Unsloth](https://github.com/unslothai/unsloth) to accelerate training with optimized kernels.
+
+- **Command Line Interface (CLI)**: A simple interface lets you fine-tune and interact with models without needing to write code.
 
-- **`AutoModels`**: The [`AutoModelForCausalLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForCausalLMWithValueHead) & [`AutoModelForSeq2SeqLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForSeq2SeqLMWithValueHead) classes add an additional value head to the model which allows to train them with RL algorithms such as PPO.
+- **Trainers**: Various fine-tuning methods are easily accessible via trainers like [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer), [`ORPOTrainer`](https://huggingface.co/docs/trl/orpo_trainer) and more.
 
-- **`Examples`**: Fine-tune Llama for chat applications or apply full RLHF using adapters etc, following the [examples](https://github.com/huggingface/trl/tree/main/examples).
+- **AutoModels**: Use pre-defined model classes like [`AutoModelForCausalLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForCausalLMWithValueHead) to simplify reinforcement learning (RL) with LLMs.
 
 ## Installation
 
-### Python package
+### Python Package
 
-Install the library with `pip`:
+Install the library using `pip`:
 
 ```bash
 pip install trl