🕊️ Migration PPOv2 -> PPO #2174

Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
While working on this, I noticed that most of our documentation and notebooks are based on PPO and may be outdated, relying on older datasets, models, references, etc. I believe updating all of our examples and guides falls outside the scope of this PR. I recommend the following:
@@ -1,4 +1,4 @@
# Copyright 2022 The HuggingFace Team. All rights reserved.
This one is just the renaming of `ppov2_config`; the diff view makes no sense.
@@ -11,1271 +11,53 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
This one is just `tests/test_ppov2_trainer.py` renamed; the diff view makes no sense.
LGTM apart from a comment about the examples and one change.
Thanks for the migration @qgallouedec ! Overall LGTM with some nits about the docs since we can take this opportunity to nuke a bunch of outdated stuff
horizon: float = 10000.0
gamma: float = 1.0
lam: float = 0.95
exp_name: str = os.path.basename(__file__)[: -len(".py")]
Perhaps it's a bug in the GitHub diff, but this looks repeated from above.
Co-authored-by: Edward Beeching <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: lewtun <[email protected]>
Would it be possible to keep the "old" PPO code around until the new trainer achieves feature parity (in particular peft and arbitrary reward support) with the old one?
I would also like the old PPO code kept around. I was previously using the old `ppo_trainer.step()` function to provide my own reward values for each query/response pair, but there doesn't seem to be an equivalent in PPOv2, unless I'm missing some other way of doing this?
You should use `trl==0.11`.
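For context, here is a minimal sketch of the older reward-injection workflow being discussed, assuming `trl==0.11` (the last release that ships the old `PPOTrainer.step()` API); the checkpoint, prompts, and reward values below are placeholders:

```python
# pip install trl==0.11
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholder policy model with a value head, as required by the old PPOTrainer.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=2, mini_batch_size=1),
    model=model,
    tokenizer=tokenizer,
)

queries = [
    tokenizer("The movie was", return_tensors="pt").input_ids.squeeze(0),
    tokenizer("The food tasted", return_tensors="pt").input_ids.squeeze(0),
]
responses = [
    ppo_trainer.generate(
        q, return_prompt=False, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id
    ).squeeze(0)
    for q in queries
]

# Arbitrary, user-computed rewards: one scalar tensor per query/response pair.
rewards = [torch.tensor(1.0), torch.tensor(-0.5)]

stats = ppo_trainer.step(queries, responses, rewards)
```

The relevant part is the third argument to `step()`: the rewards are supplied by the caller, which is the "arbitrary reward support" the comments above are asking about.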
What does this PR do?
Follows #2016
Fixes # (issue)
It basically:
- removes the old `PPO` implementation (including trainer, config, tests, examples, but not all, see my comment)
- renames `PPOv2` to `PPO` (in trainer, config, examples, tests, doc, etc.)
- keeps deprecated `PPOv2` aliases for `PPO` (see the import sketch after this list)
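For illustration, a minimal before/after import sketch of what the rename means for downstream code; this is an assumption based on the naming in the diff (the `v2` suffix dropping from the trainer and config classes), not something spelled out explicitly in the PR description:

```python
# Before this PR (trl <= 0.11): the refactored trainer is exposed under the "v2" names.
from trl import PPOv2Config, PPOv2Trainer

# After this PR: the same trainer and config are exposed under the plain PPO names.
from trl import PPOConfig, PPOTrainer
```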
I've done my best, but it may break some code. To put things in perspective, PPO isn't as popular as it used to be: at the time of writing, the last PPO model created on the Hub was two weeks ago.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.