🐛 Bug

Hello everyone, I appreciate your work. I have a slightly embarrassing problem =|
I have recently encountered an issue while trying to train, save, and reload my DQN model in a Gymnasium environment (a single environment). The problem lies in transferring knowledge from the saved model to the loaded one. During the initial training phase on CartPole-v1, I successfully reached a reward of 200 after 100k timesteps.
However, when I reload the model for further training, I expect it to start with a reward of around 200 and to retain the hyperparameters set during the initial training (e.g., the exploration rate). Unfortunately, this is not what happens.
In my case, learning seems to start from scratch: I only see small rewards, and the exploration rate goes back to 1 instead of the 0.05 it had reached before.
I have tried this workflow before with PPO, and unlike DQN it worked fine.
To fix this issue, I tried the following, without effect:
Saving/reloading the replay buffer.
Manually setting the exploration rate on the loaded model to 0.05.
Setting the env (i.e., model.set_env()) with both a DummyVecEnv and a plain env.
Using model.set_parameters() instead of model.load() (a rough sketch of the last two attempts follows below).
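Roughly, the exploration-rate override and the set_parameters() attempt looked like the sketch below (simplified; I am assuming exploration_rate is the attribute DQN uses for the current epsilon, and "dqn_cartpole" is the zip saved in the reproduction script further down):

```python
import gymnasium as gym

from stable_baselines3 import DQN

env = gym.make("CartPole-v1")

# Attempt: load normally, then force epsilon back to its final value by hand.
# (exploration_rate is, as far as I can tell, the current epsilon used by DQN.)
model = DQN.load("dqn_cartpole", env=env)
model.exploration_rate = 0.05

# Attempt: build a fresh model and only copy the trained weights via set_parameters()
# instead of going through DQN.load().
model2 = DQN("MlpPolicy", env, verbose=1)
model2.set_parameters("dqn_cartpole")

# In both cases, further training behaves as if it started from scratch.
model.learn(total_timesteps=100000, log_interval=4)
```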
To Reproduce

```python
import gymnasium as gym

from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv

env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200000, log_interval=4)
model.save("dqn_cartpole")
model.save_replay_buffer("dqn_cartpole-buffer")

del model  # remove to demonstrate saving and loading

# Here, I observe the last trend of rewards achieved by the agent
model = DQN.load("dqn_cartpole")
k = Monitor(gym.make("CartPole-v1"))
model.set_env(DummyVecEnv([lambda: k]))
model.load_replay_buffer("dqn_cartpole-buffer")
model.learn(total_timesteps=100000, log_interval=4)
# Here, please observe the rewards and the exploration rate of the agent:
# epsilon resets to 1 and the rewards drop
```
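A small way to see the reset directly (sketch; it assumes model.exploration_rate holds the current epsilon and that DQN logs it as rollout/exploration_rate, which is my reading of the implementation):

```python
# Print epsilon right after loading, then watch rollout/exploration_rate in the
# training logs once learn() starts.
model = DQN.load("dqn_cartpole")
print("epsilon right after load:", model.exploration_rate)  # expected to be ~0.05

model.set_env(DummyVecEnv([lambda: Monitor(gym.make("CartPole-v1"))]))
model.load_replay_buffer("dqn_cartpole-buffer")
# The logged rollout/exploration_rate is back near 1 at the start of this call,
# instead of staying at 0.05.
model.learn(total_timesteps=100000, log_interval=4)
```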
Relevant log output / Error message
No response
System Info
OS: Linux-5.19.0-45-generic-x86_64-with-glibc2.35 # 46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20
Python: 3.11.3
Stable-Baselines3: 2.0.0
PyTorch: 2.0.1
GPU Enabled: True
Numpy: 1.24.3
Cloudpickle: 2.2.1
Gymnasium: 0.28.1
OpenAI Gym: 0.26.2
Checklist
My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo