
Using Reward Predictor #16

Open
eunjuyummy opened this issue Mar 7, 2024 · 1 comment
Hi!
I'm following the piece-by-piece runs, and everything works fine up through pretrain_reward_predictor.
But the part where I use the resulting reward predictor to train the policy doesn't seem to work properly.

This is the command I ran:
python3 run.py train_policy_with_preferences BreakoutNoFrameskip-v4 --load_reward_predictor_ckpt_dir runs/breakout-initial_predictor_3fca07c/reward_predictor_checkpoints --n_envs 16 --million_timesteps 0.1
and when I run it, it brings up the preference collection window again instead of training from the loaded predictor.

How should I invoke it so that training uses the newly created reward predictor?

@eunjuyummy eunjuyummy changed the title Using Predictor Using Reward Predictor Mar 7, 2024
@mrahtz (Owner) commented Mar 8, 2024

Hmm, it's been such a long time since I wrote this code that I can't remember how it all fits together now, and I don't think I'll have time any time soon to dig into it again. Sorry not to be of more help!
