Hi!

I'm following the piece-by-piece runs, and everything works fine up through the pretrain_reward_predictor step. However, the step where I use the resulting reward predictor to train the policy doesn't seem to be working properly.

This is the command I ran:

python3 run.py train_policy_with_preferences BreakoutNoFrameskip-v4 --load_reward_predictor_ckpt_dir runs/breakout-initial_predictor_3fca07c/reward_predictor_checkpoints --n_envs 16 --million_timesteps 0.1

When I run it, it brings up the preference collection window again. How should I invoke training so that it learns using the newly created reward predictor?
Hmm, it's been such a long time since I wrote this code that I can't remember how it all fits together, and I don't think I'll have time to dig into it again any time soon. Sorry not to be of more help!