Update README.md (#2180)
Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Edward Beeching <[email protected]>
4 people authored Oct 11, 2024
1 parent cd1aa6b commit f436c3e
Showing 1 changed file with 14 additions and 5 deletions.
19 changes: 14 additions & 5 deletions README.md
@@ -1,8 +1,10 @@

<div style="text-align: center">
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl_banner_dark.png">
</div>

# TRL - Transformer Reinforcement Learning
<hr> <br>

<h3 align="center">
<p>Full-stack library to post-train large language models.</p>
@@ -23,7 +25,7 @@

## What is it?

TRL is a library that post-trains LLMs and diffusion models using methods such as Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO).

The library is built on top of [🤗 Transformers](https://github.com/huggingface/transformers) and is compatible with any model architecture available there.
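
Because TRL sits on top of Transformers, a model and tokenizer loaded with the usual `transformers` auto classes can be passed straight to a TRL trainer. As a rough illustration, here is a minimal `DPOTrainer` sketch; the checkpoint and dataset names are placeholders, and the exact keyword names can vary slightly between TRL releases:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder checkpoint; any causal LM from the Hub should work.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Placeholder preference dataset with "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="Qwen2.5-0.5B-DPO")
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```

When no reference model is passed, `DPOTrainer` creates one internally from the policy model.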

@@ -32,11 +34,14 @@ The library is built on top of [🤗 Transformers](https://github.com/huggingfac

- **`Efficient and scalable`**:
  - [🤗 Accelerate](https://github.com/huggingface/accelerate) is the backbone of TRL, letting you scale model training from a single GPU to a large multi-node cluster with methods such as DDP and DeepSpeed.
  - [`PEFT`](https://github.com/huggingface/peft) is fully integrated and allows users to train even the largest models on modest hardware with quantization and methods such as LoRA or QLoRA.
  - [Unsloth](https://github.com/unslothai/unsloth) is also integrated and lets you significantly speed up training with dedicated kernels.

- **`CLI`**: With the [CLI](https://huggingface.co/docs/trl/clis) you can fine-tune and chat with LLMs using a single command and a flexible config system, without writing any code.
- **`Trainers`**: The trainer classes are an abstraction that lets you apply many fine-tuning methods with ease, such as the [`SFTTrainer`](https://huggingface.co/docs/trl/sft_trainer), [`DPOTrainer`](https://huggingface.co/docs/trl/dpo_trainer), [`RewardTrainer`](https://huggingface.co/docs/trl/reward_trainer), [`PPOTrainer`](https://huggingface.co/docs/trl/ppov2_trainer), and [`ORPOTrainer`](https://huggingface.co/docs/trl/orpo_trainer).

- **`AutoModels`**: The [`AutoModelForCausalLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForCausalLMWithValueHead) & [`AutoModelForSeq2SeqLMWithValueHead`](https://huggingface.co/docs/trl/models#trl.AutoModelForSeq2SeqLMWithValueHead) classes add a value head to the model, which makes it possible to train them with RL algorithms such as PPO (see the sketch after this list).

- **`Examples`**: Fine-tune Llama for chat applications, apply full RLHF using adapters, and more, by following the [examples](https://github.com/huggingface/trl/tree/main/examples).
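
To make the value-head wrappers above concrete, here is a minimal sketch; `gpt2` is only a small placeholder checkpoint:

```python
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

# The wrapper adds a small scalar head on top of the causal LM.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("TRL adds a value head so that", return_tensors="pt")
# The forward pass returns the LM logits, the (optional) LM loss,
# and one value estimate per input token.
lm_logits, loss, values = model(**inputs)
print(values.shape)  # (batch_size, sequence_length)
```

PPO-style trainers use these per-token value estimates as the critic when computing advantages.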

## Installation
@@ -95,7 +100,7 @@ For more flexibility and control over training, TRL provides dedicated trainer c

### `SFTTrainer`

Here is a basic example of how to use the `SFTTrainer`:

```python
from trl import SFTConfig, SFTTrainer
```
@@ -114,7 +119,7 @@ trainer.train()
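
The collapsed diff above hides most of this snippet, leaving only the import and the final `trainer.train()` call. A minimal, self-contained sketch of such a run could look like the following; the dataset and model names are placeholders rather than a quote of the README:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; any text or conversational dataset that SFTTrainer
# understands will do.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(output_dir="Qwen2.5-0.5B-SFT")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # a model name or an already-loaded model both work
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```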

### `RewardTrainer`

Here is a basic example of how to use the `RewardTrainer`:

```python
from trl import RewardConfig, RewardTrainer
```
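
As above, only the import survives the collapsed diff. A minimal sketch of a reward-model run, assuming a preference dataset with "chosen" and "rejected" columns (the checkpoint and dataset names are placeholders):

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

# A reward model is a sequence classifier with a single output score.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", num_labels=1
)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder preference dataset.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="Qwen2.5-0.5B-Reward", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```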
@@ -215,3 +220,7 @@ make dev
howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## License

This repository's source code is available under the [Apache-2.0 License](LICENSE).
