
SB3 v1.3.0: Bug fixes and improvements for the user

Released by @araffin on 23 Oct 2021

WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021).
We highly recommend upgrading to Python >= 3.7.

SB3-Contrib changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/releases/tag/v1.3.0

Breaking Changes:

  • sde_net_arch argument in policies is deprecated and will be removed in a future version.

  • _get_latent (ActorCriticPolicy) was removed

  • All logging keys now use underscores instead of spaces (@timokau). Concretely this changes (see the example after this list):

    • time/total timesteps to time/total_timesteps for off-policy algorithms and the eval callback (on-policy algorithms such as PPO and A2C already used the underscored version),
    • rollout/exploration rate to rollout/exploration_rate and
    • rollout/success rate to rollout/success_rate.
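
If you parse these logs programmatically (e.g., from TensorBoard or the CSV logger), update the key names accordingly. A minimal sketch of a run that emits the renamed keys, assuming a CartPole-v1 setup:

```python
from stable_baselines3 import DQN

# With verbose=1, the console log now reports "time/total_timesteps"
# and "rollout/exploration_rate" (previously "time/total timesteps"
# and "rollout/exploration rate").
model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000, log_interval=4)
```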

New Features:

  • Added methods get_distribution and predict_values to ActorCriticPolicy for A2C/PPO/TRPO (@cyprienc); see the first example after this list
  • Added methods forward_actor and forward_critic for MlpExtractor
  • Added sb3.get_system_info() helper function to gather version information relevant to SB3 (e.g., Python and PyTorch version)
  • Saved models now store information about the system the agent was trained on, and load functions accept a print_system_info parameter to help debug loading issues (see the second example after this list).
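
A minimal sketch of the new policy and extractor methods, assuming a freshly created PPO model on CartPole-v1 (the tensor conversion is done by hand here for illustration):

```python
import torch as th
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
obs = model.env.reset()
obs_tensor = th.as_tensor(obs, device=model.device)

with th.no_grad():
    # New ActorCriticPolicy methods
    dist = model.policy.get_distribution(obs_tensor)
    actions = dist.get_actions(deterministic=True)
    values = model.policy.predict_values(obs_tensor)

    # New MlpExtractor methods operate on extracted features
    features = model.policy.extract_features(obs_tensor)
    latent_pi = model.policy.mlp_extractor.forward_actor(features)
    latent_vf = model.policy.mlp_extractor.forward_critic(features)
```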
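
And a sketch of the new system-information helpers ("saved_model" is a hypothetical path):

```python
import stable_baselines3 as sb3
from stable_baselines3 import PPO

# Print (and return) info about the current machine:
# OS, Python, PyTorch, GPU and SB3 versions
env_info, env_info_str = sb3.get_system_info()

# When loading, also print the system info stored with the model
# to spot version mismatches ("saved_model" is hypothetical):
model = PPO.load("saved_model", print_system_info=True)
```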

Bug Fixes:

  • Fixed dtype of observations for SimpleMultiObsEnv
  • Allowed VecNormalize to wrap discrete-observation environments to normalize rewards when observation normalization is disabled (see the first example after this list)
  • Fixed a bug where DQN would throw an error when using Discrete observation and stochastic actions
  • Fixed a bug where sub-classed observation spaces could not be used
  • Added force_reset argument to load() and set_env() so that learn(reset_num_timesteps=False) can be called with a new environment (see the second example after this list)
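
A sketch of the VecNormalize fix, assuming Taxi-v3 (an environment with a Discrete observation space):

```python
import gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Taxi-v3 observations are Discrete, so observation normalization
# stays disabled; reward normalization now works for such envs.
env = DummyVecEnv([lambda: gym.make("Taxi-v3")])
env = VecNormalize(env, norm_obs=False, norm_reward=True)
```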
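
And a sketch of force_reset, assuming a previously trained SAC model saved as "sac_pendulum" (a hypothetical path):

```python
import gym
from stable_baselines3 import SAC

new_env = gym.make("Pendulum-v0")
# force_reset=True (the default) discards the last observation from the
# previous environment, so training can safely continue on new_env
# without resetting the timestep counter:
model = SAC.load("sac_pendulum", env=new_env, force_reset=True)
model.learn(total_timesteps=10_000, reset_num_timesteps=False)
```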

Others:

  • Capped gym max version to 0.19 to avoid issues with atari-py and other breaking changes
  • Improved error message when using dict observation with the wrong policy
  • Improved error message when using EvalCallback with two envs not wrapped the same way.
  • Added additional info about supported Python versions for PyPI in setup.py

Documentation:

  • Added Rocket League Gym to the list of supported projects (@AechPro)
  • Added gym-electric-motor to project page (@wkirgsn)
  • Added policy-distillation-baselines to project page (@CUN-bjy)
  • Added ONNX export instructions (@batu)
  • Updated the Read the Docs environment (fixed a docutils issue)
  • Fixed PPO environment name (@IljaAvadiev)
  • Fixed custom env doc and added an env registration example
  • Updated algorithms from SB3 Contrib
  • Used underscores for numeric literals in examples to improve clarity