SB3 v1.5.0: Bug fixes, early stopping callback
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum Gym version to 0.21.0.
New Features:
- Added StopTrainingOnNoModelImprovement to callback collection (@caburu)
- Made the length of keys and values in HumanOutputFormat configurable, depending on desired maximum width of output.
- Allow PPO to turn off advantage normalization (see PR #763) (@vwxyzjn)
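
The idea behind the new early stopping callback can be sketched in plain Python. This is a minimal illustration of patience-based stopping, not the library's implementation: the class NoImprovementStopper and its method should_continue are hypothetical names, and the exact counting rules in SB3 may differ.

```python
from dataclasses import dataclass


@dataclass
class NoImprovementStopper:
    """Hypothetical helper: stop after too many evaluations without a new best."""

    max_no_improvement_evals: int  # patience, counted in evaluations
    min_evals: int = 0  # warm-up evaluations that never trigger a stop
    best_mean_reward: float = float("-inf")
    n_evals: int = 0
    no_improvement_evals: int = 0

    def should_continue(self, mean_reward: float) -> bool:
        """Record one evaluation result; return False when training should stop."""
        self.n_evals += 1
        improved = mean_reward > self.best_mean_reward
        if improved:
            self.best_mean_reward = mean_reward
        # During warm-up, track the best reward but never stop.
        if self.n_evals <= self.min_evals:
            return True
        if improved:
            # A new best resets the patience counter.
            self.no_improvement_evals = 0
            return True
        self.no_improvement_evals += 1
        return self.no_improvement_evals <= self.max_no_improvement_evals
```

In SB3 itself the callback is meant to be used together with EvalCallback, which supplies the evaluation results; the sketch above only shows the counting logic on a stream of mean rewards.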
SB3-Contrib
- coming soon: Cross Entropy Method, see Stable-Baselines-Team/stable-baselines3-contrib#62
Bug Fixes:
- Fixed a bug in VecMonitor. The monitor did not consider the info_keywords during stepping (@ScheiklP)
- Fixed a bug in HumanOutputFormat. Distinct keys truncated to the same prefix would overwrite each other's values, resulting in only one being output. This now raises an error (this should only affect a small fraction of use cases with very long keys).
- Routed all nn.Module calls through implicit rather than explicit forward, as per PyTorch guidelines (@manuel-delverme)
- Fixed a bug in VecNormalize where an error occurs when norm_obs is set to False for environments with dictionary observations (@buoyancy99)
- Set default env argument to None in HerReplayBuffer.sample (@qgallouedec)
- Fixed batch_size typing in DQN (@qgallouedec)
- Fixed sample normalization in DictReplayBuffer (@qgallouedec)
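
The HumanOutputFormat fix can be illustrated with a small standalone sketch. It assumes keys longer than a maximum length are cut and suffixed with "...", and that two distinct keys ending up with the same truncated form should raise an error instead of silently overwriting each other. The function truncate_keys and its default max_length are illustrative assumptions, not the library's code.

```python
from typing import Dict


def truncate_keys(values: Dict[str, float], max_length: int = 36) -> Dict[str, float]:
    """Truncate long keys to max_length characters, raising on collisions."""
    if max_length < 4:
        raise ValueError("max_length must leave room for the '...' suffix")
    truncated: Dict[str, float] = {}
    for key, value in values.items():
        # Keep short keys as-is; cut long ones and mark the cut with '...'.
        short = key if len(key) <= max_length else key[: max_length - 3] + "..."
        if short in truncated:
            # Two distinct keys map to the same output key: fail loudly
            # rather than letting one value overwrite the other.
            raise ValueError(
                f"Key '{key}' truncated to '{short}' collides with another key"
            )
        truncated[short] = value
    return truncated
```

Raising an error here trades convenience for correctness: a silent overwrite would hide one of the logged values, which is hard to notice in a training log.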
Others:
- Fixed pytest warnings
- Removed parameter remove_time_limit_termination in off policy algorithms since it was dead code (@Gregwar)
Documentation:
- Added doc on Hugging Face integration (@simoninithomas)
- Added furuta pendulum project to project list (@Armandpl)
- Fixed indentation (2 spaces to 4 spaces) in custom env documentation example (@Gautam-J)
- Update MlpExtractor docstring (@gianlucadecola)
- Added explanation of the logger output
- Updated "Directly Accessing The Summary Writer" in tensorboard integration (@xy9485)
Full Changelog: v1.4.0...v1.5.0