🐛 Bug

Hello everyone, I appreciate your work. I have a slightly embarrassing problem =|
I have recently encountered an issue while trying to train, save, and reload my DQN model in a Gymnasium environment (a single environment). The problem lies in transferring knowledge from the saved model to the loaded one. During the initial training phase on CartPole-v1, I successfully reached a reward of 200 after 100k timesteps.
However, when I reload the model for further training, I expect it to start with a reward of around 200 and to retain the hyperparameters set during the initial training (e.g., the exploration rate). Unfortunately, this is not what happens.
In my case, learning seems to start from scratch: I only see small rewards, and the exploration rate goes back to 1 instead of the 0.05 it had reached before.
I have tried this workflow before with PPO, and unlike DQN it worked fine.
To fix this issue, I tried the following, without effect:
Saving/reloading the replay buffer.
Manually setting the exploration rate on the loaded model to 0.05.
Setting the env (i.e., model.set_env()) with both a DummyVecEnv and a plain env.
Using model.set_parameters() instead of model.load() (a rough sketch of the last two attempts follows below).
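Roughly, the exploration-rate override and the set_parameters() attempt looked like the sketch below (simplified; I am assuming exploration_rate is the attribute DQN uses for the current epsilon, and "dqn_cartpole" is the zip saved in the reproduction script further down):

```python
import gymnasium as gym

from stable_baselines3 import DQN

env = gym.make("CartPole-v1")

# Attempt: load normally, then force epsilon back to its final value by hand.
# (exploration_rate is, as far as I can tell, the current epsilon used by DQN.)
model = DQN.load("dqn_cartpole", env=env)
model.exploration_rate = 0.05

# Attempt: build a fresh model and only copy the trained weights via set_parameters()
# instead of going through DQN.load().
model2 = DQN("MlpPolicy", env, verbose=1)
model2.set_parameters("dqn_cartpole")

# In both cases, further training behaves as if it started from scratch.
model.learn(total_timesteps=100000, log_interval=4)
```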
To Reproduce

```python
import gymnasium as gym

from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv

env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200000, log_interval=4)
model.save("dqn_cartpole")
model.save_replay_buffer("dqn_cartpole-buffer")

del model  # remove to demonstrate saving and loading

# Here, I observe the last trend of rewards achieved by the agent
model = DQN.load("dqn_cartpole")
k = Monitor(gym.make("CartPole-v1"))
model.set_env(DummyVecEnv([lambda: k]))
model.load_replay_buffer("dqn_cartpole-buffer")
model.learn(total_timesteps=100000, log_interval=4)
# Here, please observe the rewards and the exploration rate of the agent:
# epsilon resets to 1 and the rewards drop
```
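A small way to see the reset directly (sketch; it assumes model.exploration_rate holds the current epsilon and that DQN logs it as rollout/exploration_rate, which is my reading of the implementation):

```python
# Print epsilon right after loading, then watch rollout/exploration_rate in the
# training logs once learn() starts.
model = DQN.load("dqn_cartpole")
print("epsilon right after load:", model.exploration_rate)  # expected to be ~0.05

model.set_env(DummyVecEnv([lambda: Monitor(gym.make("CartPole-v1"))]))
model.load_replay_buffer("dqn_cartpole-buffer")
# The logged rollout/exploration_rate is back near 1 at the start of this call,
# instead of staying at 0.05.
model.learn(total_timesteps=100000, log_interval=4)
```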
Relevant log output / Error message
No response
System Info
OS: Linux-5.19.0-45-generic-x86_64-with-glibc2.35 # 46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20
Python: 3.11.3
Stable-Baselines3: 2.0.0
PyTorch: 2.0.1
GPU Enabled: True
Numpy: 1.24.3
Cloudpickle: 2.2.1
Gymnasium: 0.28.1
OpenAI Gym: 0.26.2
Checklist
My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo