Hi!

I'm following the piece-by-piece runs, and everything works fine up through the pretrain_reward_predictor step. However, the step where I use the resulting reward predictor to train the policy doesn't seem to be working properly.

This is the command I ran:

python3 run.py train_policy_with_preferences BreakoutNoFrameskip-v4 --load_reward_predictor_ckpt_dir runs/breakout-initial_predictor_3fca07c/reward_predictor_checkpoints --n_envs 16 --million_timesteps 0.1

When I run it, it brings up the preference collection window again. How should I invoke training so that it learns using the newly created reward predictor?
Hmm, it's been such a long time since I wrote this code that I can't remember how it all fits together, and I don't think I'll have time to dig into it again any time soon. Sorry not to be of more help!