
[Bug]: DQN resets exploration rate from saved model #1629

Closed
AMR-aa1405465 opened this issue Jul 26, 2023 · 3 comments
Labels: duplicate (This issue or pull request already exists), question (Further information is requested)

Comments


AMR-aa1405465 commented Jul 26, 2023

🐛 Bug

Hello everyone, I appreciate your work. I have a somewhat embarrassing problem =|

I recently ran into an issue while trying to train, save, and reload a DQN model in a Gymnasium environment (a single, non-vectorized env). During the initial training phase on CartPole-v1, I reached a reward of 200 after 100k timesteps.

However, when I reload the model for further training, I expect it to start with a reward of around 200 and to retain the training state from the initial run (e.g., the exploration rate). Unfortunately, that is not what happens.

In my case, learning seems to start from scratch: I only see small rewards, and the exploration rate goes back to 1 instead of the 0.05 it had reached.

Unlike DQN, PPO handled this save/reload workflow fine when I tried it before.

To fix this issue, I tried the following, without effect:

  1. Saving/reloading the replay buffer.
  2. Manually setting the exploration rate on the loaded model to 0.05 (see the sketch after this list).
  3. Setting the env on the loaded model (i.e., model.set_env()), both with a DummyVecEnv and with a plain env.
  4. Using model.set_parameters() instead of model.load().
  5. Going through the existing issues about knowledge transfer and applying their fixes where applicable (e.g., #29 "Does using model.save() and then using model.load() resume training exactly from the point where it was left?", hill-a/stable-baselines#30 "Training the same model after loading", and #70 "transferrable models").
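
Roughly, attempts 2 and 4 looked like the sketch below (a reconstruction for illustration; the exact values and script may have differed):

from stable_baselines3 import DQN

# Attempt 2: force the exploration rate on the loaded model
model = DQN.load("dqn_cartpole")
model.exploration_rate = 0.05  # had no lasting effect

# Attempt 4: build a fresh model and load only the saved weights into it
model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
model.set_parameters("dqn_cartpole")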

To Reproduce

import gymnasium as gym

from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv

env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200000, log_interval=4)
model.save("dqn_cartpole")
model.save_replay_buffer("dqn_cartpole-buffer")

del model  # remove to demonstrate saving and loading
# Up to this point, the logged episode rewards are around 200

model = DQN.load("dqn_cartpole")
k = Monitor(gym.make("CartPole-v1"))
model.set_env(DummyVecEnv([lambda: k]))
model.load_replay_buffer("dqn_cartpole-buffer")
model.learn(total_timesteps=100000, log_interval=4)
# Here, the exploration rate resets to 1 and the episode rewards drop

Relevant log output / Error message

No response

System Info

  • OS: Linux-5.19.0-45-generic-x86_64-with-glibc2.35 #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20
  • Python: 3.11.3
  • Stable-Baselines3: 2.0.0
  • PyTorch: 2.0.1
  • GPU Enabled: True
  • Numpy: 1.24.3
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1
  • OpenAI Gym: 0.26.2

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
AMR-aa1405465 added the "bug (Something isn't working)" label on Jul 26, 2023
AMR-aa1405465 changed the title from "[Bug]: bug title" to "[Bug]: DQN not transferring the knowledge from saved model" on Jul 26, 2023
araffin added the "question (Further information is requested)" label and removed the "bug (Something isn't working)" label on Jul 27, 2023
araffin (Member) commented Jul 27, 2023

Hello,
you are probably missing the reset_num_timesteps=False parameter (see doc).

Also related, on why changing the exploration rate alone doesn't work (you need to change the schedule): #735 (comment)

Also related: #529
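
A minimal sketch of the resumed run with that flag, adapted from the reproduction script above (the commented-out exploration_schedule override at the end follows the linked #735 comment and is an assumption about the internals, not something required for the fix):

import gymnasium as gym

from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv

# Reload the trained model and its replay buffer
model = DQN.load("dqn_cartpole")
model.set_env(DummyVecEnv([lambda: Monitor(gym.make("CartPole-v1"))]))
model.load_replay_buffer("dqn_cartpole-buffer")

# reset_num_timesteps=False keeps the internal step counter, so the
# exploration schedule stays at its final value (0.05) instead of restarting at 1
model.learn(total_timesteps=100000, log_interval=4, reset_num_timesteps=False)

# Alternative from the linked comment: override the schedule itself so the
# agent always explores at 5%, regardless of the step counters
# model.exploration_schedule = lambda _: 0.05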

araffin (Member) commented Jul 27, 2023

Looks like a duplicate of #597 (comment)
Also related (for calling learn() multiple times): #957

araffin changed the title from "[Bug]: DQN not transferring the knowledge from saved model" to "[Bug]: DQN resets exploration rate from saved model" on Jul 27, 2023
AMR-aa1405465 (Author) commented
Yup, reset_num_timesteps=False did the trick =D
Thanks for the help mate
