
Fix RLOO checkpointing #2114

Merged (12 commits into huggingface:main, Oct 7, 2024)
Conversation

bartoszzuk
Contributor

This PR fixes RLOO checkpointing (in the same way as the recent fix for PPOv2 in PR #2080).

This is needed after changes to the _save_checkpoint method introduced in transformers v4.45.0.dev. Specifically, we get KeyError: 'TrainerControl' when saving the trainer state (here is the exact line causing the issue). By passing stateful_callbacks to OnlineTrainerState explicitly, the TrainerControl object is stored and can be properly accessed in _save_checkpoint.
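For context, here is a small illustration (an editor's sketch, not the PR's actual diff) of the mechanism the fix relies on, using the base TrainerState and TrainerControl classes from transformers that TRL's OnlineTrainerState builds on: passing the control object through stateful_callbacks registers its exported state under its class name, which is the key that _save_checkpoint later looks up.

  from transformers import TrainerControl, TrainerState

  control = TrainerControl()

  # Passing the control object explicitly registers its exported state under
  # its class name, which is what _save_checkpoint expects to find.
  state = TrainerState(stateful_callbacks=[control])
  print(list(state.stateful_callbacks))  # ['TrainerControl']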

@qgallouedec
Member

Nice, thanks @bartoszzuk. Without your fix, does it cause any error when running RLOO?

@qgallouedec
Member

By the way, make sure to run make precommit to make the CI happy.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bartoszzuk
Contributor Author

@qgallouedec Yes, when using transformers v4.45.0.dev I'm getting:

...
File "/usr/local/lib/python3.10/dist-packages/trl/trainer/rloo_trainer.py", line 449, in train
  self._save_checkpoint(model, trial=None, metrics=metrics)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3016, in _save_checkpoint
  if isinstance(self.state.stateful_callbacks[cb_name], list):
KeyError: 'TrainerControl'

This happens because self.state.stateful_callbacks is an empty dict.
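(For reference, the failing access can be reproduced in isolation with recent transformers releases; a minimal sketch, not TRL code, just the same dict lookup as in the traceback:)

  from transformers import TrainerState

  state = TrainerState()        # stateful_callbacks defaults to an empty dict
  cb_name = "TrainerControl"
  # Same access pattern as the transformers/trainer.py line in the traceback:
  isinstance(state.stateful_callbacks[cb_name], list)  # KeyError: 'TrainerControl'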

Sorry, totally forgot about make precommit; will fix it ASAP.

@qgallouedec
Member

Thanks. I'm not sure I understand why this failure mode doesn't break our CI.

@sahandrez

sahandrez commented Sep 25, 2024

I am not sure if this is related, but I have observed some strange behaviour in RLOO checkpointing. For example, I set it to checkpoint every 500 steps and it follows that for some time, but after a while it starts generating checkpoints every 2 steps. Is this intended behaviour?

Member

@lewtun lewtun left a comment

Hi @bartoszzuk, thanks for the fix! Would you mind writing a regression test in test_rloo_trainer.py that fails on main but passes on your branch? That would help ensure future code changes don't accidentally reintroduce the bug.

@bartoszzuk
Contributor Author

Hey @lewtun, sorry for the late response. I added a simple regression test for RLOO checkpointing. Hopefully it somewhat follows the conventions found in other tests (let me know if any improvements are required). The test should:

  • Succeed without the fix for transformers<4.45.0
  • Fail without the fix for transformers>=4.45.0
  • Succeed with the fix for transformers>=4.45.0

I also changed the test function to ensure that the tokenizer matches the SFT and reward models.
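(As an aside, a narrower, self-contained sketch of the property such a test guards; this is not the PR's actual test, which exercises the full RLOOTrainer, and the function name here is made up for illustration:)

  import os
  import tempfile

  from transformers import TrainerControl, TrainerState

  def test_trainer_control_survives_state_round_trip():
      # Before the fix, stateful_callbacks was empty and this key was missing.
      state = TrainerState(stateful_callbacks=[TrainerControl()])
      assert "TrainerControl" in state.stateful_callbacks

      # The checkpointed trainer_state.json should keep the entry after reload.
      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "trainer_state.json")
          state.save_to_json(path)
          reloaded = TrainerState.load_from_json(path)
          assert "TrainerControl" in reloaded.stateful_callbacks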


Member

@qgallouedec qgallouedec left a comment

LGTM, thanks a lot @bartoszzuk, I'll merge as soon as the CI passes

Member

@lewtun lewtun left a comment

Thanks a lot for the fix and regression test @bartoszzuk! I think the CI is failing for some unrelated reason, so I've rerun it to be sure

@qgallouedec qgallouedec linked an issue Oct 7, 2024 that may be closed by this pull request
@qgallouedec
Member

Yep, not related

@qgallouedec qgallouedec merged commit 82ad390 into huggingface:main Oct 7, 2024
8 of 9 checks passed
qgallouedec added a commit that referenced this pull request Oct 7, 2024
* Fix RLOO checkpointing for transformers>=4.45.0

* Add missing import

* Fix pre-commit issues

* Added test for RLOO checkpointing

* Ensure that tokenizer matches SFT and Reward model

* Pre-commit formatting

* processing class

---------

Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>

Successfully merging this pull request may close these issues.

RLOO generating checkpoints every 2 steps