Release SB3 v1.3.0 : Bug fixes and improvements for the user · DLR-RM/stable-baselines3

WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021).
We highly recommend upgrading to Python >= 3.7.

Breaking Changes:

sde_net_arch argument in policies is deprecated and will be removed in a future version.
_get_latent (ActorCriticPolicy) was removed
All logging keys now use underscores instead of spaces (@timokau). Concretely, this changes:
- time/total timesteps to time/total_timesteps for off-policy algorithms and the eval callback (on-policy algorithms such as PPO and A2C already used the underscored version),
- rollout/exploration rate to rollout/exploration_rate, and
- rollout/success rate to rollout/success_rate.
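
Scripts that look up logged values by key (for example, when post-processing a progress file) may still use the old names. A minimal sketch of a migration helper follows. The names `RENAMED_KEYS`, `migrate_logging_key`, and `migrate_record` are hypothetical and not part of SB3; the mapping contains only the three renames listed above.

```python
# Hypothetical helper, not part of SB3: maps pre-1.3.0 logging keys
# to the underscored names listed in these release notes.
RENAMED_KEYS = {
    "time/total timesteps": "time/total_timesteps",
    "rollout/exploration rate": "rollout/exploration_rate",
    "rollout/success rate": "rollout/success_rate",
}


def migrate_logging_key(key: str) -> str:
    """Return the new name for a renamed key; other keys pass through unchanged."""
    return RENAMED_KEYS.get(key, key)


def migrate_record(record: dict) -> dict:
    """Rename the keys of one logged record (e.g. one row of a progress file)."""
    return {migrate_logging_key(k): v for k, v in record.items()}


if __name__ == "__main__":
    # Illustrative values only.
    old_row = {"time/total timesteps": 2048, "rollout/ep_rew_mean": 21.5}
    print(migrate_record(old_row))
```

Using an explicit table rather than a blanket space-to-underscore replacement keeps the helper limited to the keys that are known to have changed.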

New Features:

Added methods get_distribution and predict_values for ActorCriticPolicy for A2C/PPO/TRPO (@cyprienc)
Added methods forward_actor and forward_critic for MlpExtractor
Added sb3.get_system_info() helper function to gather version information relevant to SB3 (e.g., Python and PyTorch version)
Saved models now store information about the system on which the agent was trained, and the load functions have a print_system_info parameter to help debug loading issues.
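
The system-information helper can be used when filing a bug report or comparing a training machine with a deployment machine. The sketch below assumes that `sb3` in the notes above stands for `import stable_baselines3 as sb3`, and that the helper returns a dictionary together with a formatted string and accepts a `print_info` flag. The wrapper name `describe_system` is hypothetical, and its fallback branch is only a rough stand-in for machines without SB3, not SB3's own output.

```python
import platform


def describe_system() -> dict:
    """Collect version information, preferring SB3's helper when it is installed."""
    try:
        import stable_baselines3 as sb3  # assumed meaning of "sb3" in the notes
    except ImportError:
        # Fallback for machines without SB3: a rough stand-in, NOT SB3's output.
        return {"OS": platform.platform(), "Python": platform.python_version()}
    # Assumption: returns (info_dict, formatted_string); print_info=False
    # suppresses the helper's own printing.
    info, _ = sb3.get_system_info(print_info=False)
    return info


if __name__ == "__main__":
    for name, value in describe_system().items():
        print(f"{name}: {value}")
```

When loading a saved model, passing `print_system_info=True` to the load function should print the stored training-time information next to the current one, which makes version mismatches easier to spot.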

Bug Fixes:

Fixed dtype of observations for SimpleMultiObsEnv
Allow VecNormalize to wrap discrete-observation environments to normalize reward when observation normalization is disabled.
Fixed a bug where DQN would throw an error when using Discrete observation and stochastic actions
Fixed a bug where sub-classed observation spaces could not be used
Added force_reset argument to load() and set_env() so that learn(reset_num_timesteps=False) can be called with a new environment

Others:

Cap gym max version to 0.19 to avoid issues with atari-py and other breaking changes
Improved error message when using dict observation with the wrong policy
Improved error message when using EvalCallback with two envs not wrapped the same way.
Added additional information about supported Python versions for PyPI in setup.py
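
The gym version cap mentioned above can also be checked in user code before training starts. A minimal sketch follows, assuming the cap means that any 0.19.x release is acceptable and 0.20 or later is not. The helper name `within_gym_cap` is hypothetical, and since no lower bound is stated here, none is checked.

```python
def within_gym_cap(version: str, cap: tuple = (0, 19)) -> bool:
    """Return True if the major.minor part of `version` does not exceed `cap`.

    Expects a version string of the form "major.minor[.patch]".
    """
    parts = version.split(".")
    major_minor = (int(parts[0]), int(parts[1]))
    return major_minor <= cap


if __name__ == "__main__":
    for candidate in ("0.18.3", "0.19.0", "0.20.0"):
        print(candidate, within_gym_cap(candidate))
```

Comparing only the major and minor components keeps patch releases of 0.19 inside the cap.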