[Bug]: Proposal to Adjust Default n_steps for PPO for Better Outcomes with Small Environments #1846
Comments
Sorry about the format of "To Reproduce"; I will rewrite it here:

import bug_lib as BL
import subprocess
import os
import sys
sys.path.insert(0, './training_scripts/')
from stable_baselines3 import DQN, PPO, A2C, SAC
from stable_baselines3.common.callbacks import BaseCallback
import training_scripts.Env
def get_PPO_Model(env, model_path=os.path.join('RLTesting', 'logs', 'ppo.zip')):
    if os.path.isfile(model_path):
        print("loading existing model")
        model = PPO.load(model_path, env=env)
    else:
        print("creating new model")
        model = PPO('MlpPolicy', env)
        # new_logger = configure(folder="logs", format_strings=["stdout", "log", "csv", "tensorboard"])
        # model.set_logger(new_logger)
    return model
'''
note here: max_steps=200, which is less than the default n_steps of 2048; maybe a smaller default than 2048 would be better
'''
def train_PPO_model(model, max_steps=200, model_path=os.path.join('RLTesting', 'logs', 'ppo.zip')):
    vec_env = model.get_env()
    vec_env.reset()
    vec_env.render()
    model.learn(max_steps)
    action_state_list = vec_env.envs[0].get_state_action_pairs()
    model.save(model_path)
    vec_env.close()
    return action_state_list
if __name__ == '__main__':
    env = training_scripts.Env.EnvWrapper()
    env.reset()
    model = get_PPO_Model(env=env)
    result = train_PPO_model(model=model)
Hello,
from our readme: "Note: Despite its simplicity of use, Stable Baselines3 (SB3) assumes you have some knowledge about Reinforcement Learning (RL). You should not utilize this library without some practice. To that extent, we provide good resources in the documentation to get started with RL."
The current default PPO values work well in many different scenarios, and compared to DQN, changing the default would probably cause more harm than good. PS: thanks for the code, but it is neither minimal nor working... (see the link in the checklist for the explanation)
Hello: I still think it's a similar problem to merge request #1785; you cannot assume that everyone using Stable-Baselines3 is familiar with every line of your documentation. I believe that unreasonable parameter defaults can lead to users wasting their time, so perhaps adjusting to a smaller value, or providing a prompt such as 'Your model is actually still warming up and not yet training', would be more user-friendly. Ignore the demo code; any call to model.learn(max_steps) with max_steps < 2048 will cause that problem. Regards
To be precise, check #1760
I trained the PPO model for 300 rounds (each round has 1-80 steps). The picture below uses n_steps=16; the X axis is the round index and the Y axis is accuracy. The following picture uses the default parameter, n_steps=2048, and it shows that the agent didn't learn anything. I think a user-friendly hint like 'your model is only warming up' or 'your model is not trained' would help; or you could just set n_steps to a smaller number, as in #1760.
I found that this problem does have something to do with my environment, because I make the agent stop learning when it falls into the ice lake; so if I set n_steps to 2048, it will never learn in my environment. But if I set it to 16, the result benefits from many more update iterations.
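For anyone landing here, a minimal sketch of the workaround being discussed, i.e. passing a smaller n_steps explicitly instead of relying on the 2048 default. The FrozenLake-v1 id, is_slippery=False, and the n_steps/batch_size values are illustrative assumptions, not taken from the code in this issue, and it assumes an SB3 2.x / Gymnasium setup:

import gymnasium as gym

from stable_baselines3 import PPO

env = gym.make("FrozenLake-v1", is_slippery=False)

# PPO only performs a gradient update after collecting a full rollout of
# n_steps transitions per environment, so with the default n_steps=2048 a
# call like model.learn(200) never updates the policy. Passing a small
# n_steps avoids that (batch_size is lowered to match the rollout size).
model = PPO("MlpPolicy", env, n_steps=16, batch_size=16)
model.learn(total_timesteps=200)  # 200 > 16, so updates actually occur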
🐛 Bug
Hello Stable-Baselines3 Team,
I recently encountered an issue while training an agent on the FrozenLake environment using PPO's default settings. The agent showed no signs of learning, which prompted me to delve into the stable_baselines3/ppo/ppo.py source code. I discovered that the default n_steps is set to 2048. This high value means that in a single-environment setup, the PPO model requires over 2048 steps' worth of data before starting to train.

As a beginner, I find Stable-Baselines3 incredibly user-friendly. However, I generally only adjusted the batch_size and learning_rate, not being aware of the impact of n_steps. This oversight led to some unsuccessful training attempts with small sample sizes using the default PPO parameters. My breakthrough came when I reduced n_steps to 16, which significantly improved the training results.

This experience led me to look at similar parameters in other algorithms, like DQN. I noticed that in merge request #1785, the learning_starts parameter for dqn.py was initially 50000 and was later reduced to 100 after it was recognized as too high compared to other algorithms. This change aligns with the improved training outcomes I observed using the default settings of the newer DQN model on my small sample data.

Considering these observations, I suggest revising the default n_steps value in ppo.py from 2048 to a smaller number. I believe this would be more beginner-friendly and may enhance the usability of Stable-Baselines3, particularly for those working with smaller environments.

Thank you for considering this adjustment.
Best regards,
Shiyu
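To make the comparison with #1785 concrete, a rough sketch of the two gating parameters side by side; the CartPole-v1 id and the specific values are illustrative assumptions, not defaults being proposed here:

from stable_baselines3 import DQN, PPO

# DQN: no gradient updates until `learning_starts` environment steps have
# been collected (this default was lowered in #1785).
dqn = DQN("MlpPolicy", "CartPole-v1", learning_starts=100)

# PPO: no gradient update until one full rollout of `n_steps` transitions
# (per environment) has been collected; 2048 is the current default.
ppo = PPO("MlpPolicy", "CartPole-v1", n_steps=16, batch_size=16)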
To Reproduce
import bug_lib as BL
import subprocess
import os
import sys
sys.path.insert(0, './training_scripts/')
from stable_baselines3 import DQN, PPO, A2C, SAC
from stable_baselines3.common.callbacks import BaseCallback
import training_scripts.Env
def get_PPO_Model(env, model_path=os.path.join('RLTesting', 'logs', 'ppo.zip')):
    if os.path.isfile(model_path):
        print("loading existing model")
        model = PPO.load(model_path, env=env)
    else:
        print("creating new model")
        model = PPO('MlpPolicy', env)
        # new_logger = configure(folder="logs", format_strings=["stdout", "log", "csv", "tensorboard"])
        # model.set_logger(new_logger)
    return model
'''
note here: max_steps=200, which is less than the default n_steps of 2048; maybe a smaller default than 2048 would be better
'''
def train_PPO_model(model, max_steps=200, model_path=os.path.join('RLTesting', 'logs', 'ppo.zip')):
    vec_env = model.get_env()
    vec_env.reset()
    vec_env.render()
    model.learn(max_steps)
    action_state_list = vec_env.envs[0].get_state_action_pairs()
    model.save(model_path)
    vec_env.close()
    return action_state_list
if __name__ == '__main__':
    env = training_scripts.Env.EnvWrapper()
    env.reset()
    model = get_PPO_Model(env=env)
    result = train_PPO_model(model=model)
Relevant log output / Error message
max_steps=200, so it is less than the default value of 2048; the model is untrained. Maybe making 2048 smaller would be better.
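A hedged sketch of the "your model is only warming up" hint suggested in the comments, written as a user-side callback rather than a change to SB3 itself. WarmupWarningCallback is a hypothetical name, not part of the SB3 API, and the check relies on the internal _total_timesteps attribute set by BaseAlgorithm._setup_learn():

import warnings

from stable_baselines3.common.callbacks import BaseCallback

class WarmupWarningCallback(BaseCallback):
    """Warn when the training budget is smaller than one PPO rollout."""

    def _on_training_start(self) -> None:
        # Only on-policy algorithms (PPO, A2C) expose n_steps.
        n_steps = getattr(self.model, "n_steps", None)
        if n_steps is not None and self.model._total_timesteps < n_steps * self.model.n_envs:
            warnings.warn(
                "total_timesteps is smaller than n_steps * n_envs: "
                "the model will only collect data and never perform a gradient update."
            )

    def _on_step(self) -> bool:
        # Required by BaseCallback; returning True keeps training running.
        return True

Used as model.learn(200, callback=WarmupWarningCallback()), this would have flagged the repro script above, where max_steps=200 is below the default rollout size of 2048.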
System Info
No response
Checklist