Update examples and doc
qgallouedec committed Nov 22, 2024
1 parent f2c7794 commit a592521
Showing 3 changed files with 6 additions and 6 deletions.
4 changes: 2 additions & 2 deletions docs/source/detoxifying_a_lm.mdx
@@ -105,8 +105,8 @@ and the optimizer will take care of computing the gradients in `bfloat16` precis
</div>

```python
-ref_policy = create_reference_model(model, num_shared_layers=6)
-trainer = PPOTrainer(..., ref_policy=ref_policy)
+ref_model = create_reference_model(model, num_shared_layers=6)
+trainer = PPOTrainer(..., ref_model=ref_model)
```

In the example above, this means that the model has its first 6 layers frozen (since these layers are shared between the active model and the reference model).
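
For context on the renamed `ref_model` argument, here is a minimal sketch (not part of this commit) of what sharing layers implies; the `"gpt2"` checkpoint and the parameter checks below are illustrative assumptions:

```python
# Illustrative sketch, assuming TRL's `create_reference_model`; not part of
# this commit.
from trl import AutoModelForCausalLMWithValueHead, create_reference_model

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model, num_shared_layers=6)

# Shared layers are frozen in the active model, since the reference model
# must keep the starting-point weights fixed.
frozen = [name for name, param in model.named_parameters() if not param.requires_grad]
print(f"{len(frozen)} frozen parameter tensors in the active model")

# The shared parameters reuse the same underlying storage in both models,
# while the remaining layers are independent copies free to diverge.
shared = sum(
    p.data_ptr() == q.data_ptr()
    for p, q in zip(model.parameters(), ref_model.parameters())
)
print(f"{shared} parameter tensors shared with the reference model")
```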
4 changes: 2 additions & 2 deletions examples/scripts/ppo/ppo.py
@@ -154,8 +154,8 @@ def tokenize(element):
trainer = PPOTrainer(
    config=training_args,
    processing_class=tokenizer,
-   policy=policy,
-   ref_policy=ref_policy,
+   model=policy,
+   ref_model=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
4 changes: 2 additions & 2 deletions examples/scripts/ppo/ppo_tldr.py
@@ -165,8 +165,8 @@ def tokenize(element):
trainer = PPOTrainer(
    config=training_args,
    processing_class=tokenizer,
-   policy=policy,
-   ref_policy=ref_policy,
+   model=policy,
+   ref_model=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
