diff --git a/README.md b/README.md
index 6e55f1030..018b8c6f3 100644
--- a/README.md
+++ b/README.md
@@ -85,7 +85,7 @@ Documentation is available online: [https://sb3-contrib.readthedocs.io/](https:/
 
 ## Stable-Baselines Jax (SBX)
 
-[Stable Baselines Jax (SBX)](https://github.com/araffin/sbx) is a proof of concept version of Stable-Baselines3 in Jax.
+[Stable Baselines Jax (SBX)](https://github.com/araffin/sbx) is a proof of concept version of Stable-Baselines3 in Jax, with recent algorithms like DroQ or CrossQ.
 
 It provides a minimal number of features compared to SB3 but can be much faster (up to 20x times!): https://twitter.com/araffin2/status/1590714558628253698
 
diff --git a/docs/guide/algos.rst b/docs/guide/algos.rst
index 33ac3ba46..d5e7ae1d2 100644
--- a/docs/guide/algos.rst
+++ b/docs/guide/algos.rst
@@ -43,7 +43,8 @@ Actions ``gym.spaces``:
 
 .. note::
 
-  More algorithms (like QR-DQN or TQC) are implemented in our :ref:`contrib repo <sb3_contrib>`.
+  More algorithms (like QR-DQN or TQC) are implemented in our :ref:`contrib repo <sb3_contrib>`
+  and in our :ref:`SBX (SB3 + Jax) repo <sbx>` (DroQ, CrossQ, ...).
 
 .. note::
 
diff --git a/docs/guide/sbx.rst b/docs/guide/sbx.rst
index 52b4348bc..ed5369ea4 100644
--- a/docs/guide/sbx.rst
+++ b/docs/guide/sbx.rst
@@ -17,6 +17,7 @@ Implemented algorithms:
 - Deep Q Network (DQN)
 - Twin Delayed DDPG (TD3)
 - Deep Deterministic Policy Gradient (DDPG)
+- Batch Normalization in Deep Reinforcement Learning (CrossQ)
 
 As SBX follows SB3 API, it is also compatible with the `RL Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_.
 
@@ -29,16 +30,17 @@ For that you will need to create two files:
 
     import rl_zoo3
    import rl_zoo3.train
     from rl_zoo3.train import train
-
-    from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, DroQ
+    from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, CrossQ
 
     rl_zoo3.ALGOS["ddpg"] = DDPG
     rl_zoo3.ALGOS["dqn"] = DQN
-    rl_zoo3.ALGOS["droq"] = DroQ
+    # See SBX readme to use DroQ configuration
+    # rl_zoo3.ALGOS["droq"] = DroQ
     rl_zoo3.ALGOS["sac"] = SAC
     rl_zoo3.ALGOS["ppo"] = PPO
     rl_zoo3.ALGOS["td3"] = TD3
     rl_zoo3.ALGOS["tqc"] = TQC
+    rl_zoo3.ALGOS["crossq"] = CrossQ
     rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
     rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS
 
@@ -56,16 +58,17 @@ Then you can call ``python train_sbx.py --algo sac --env Pendulum-v1`` and use t
 
     import rl_zoo3
     import rl_zoo3.enjoy
     from rl_zoo3.enjoy import enjoy
-
-    from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, DroQ
+    from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, CrossQ
 
     rl_zoo3.ALGOS["ddpg"] = DDPG
     rl_zoo3.ALGOS["dqn"] = DQN
-    rl_zoo3.ALGOS["droq"] = DroQ
+    # See SBX readme to use DroQ configuration
+    # rl_zoo3.ALGOS["droq"] = DroQ
     rl_zoo3.ALGOS["sac"] = SAC
     rl_zoo3.ALGOS["ppo"] = PPO
     rl_zoo3.ALGOS["td3"] = TD3
     rl_zoo3.ALGOS["tqc"] = TQC
+    rl_zoo3.ALGOS["crossq"] = CrossQ
     rl_zoo3.enjoy.ALGOS = rl_zoo3.ALGOS
     rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS
 
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index 331ae1db2..9080b6245 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -3,6 +3,18 @@
 Changelog
 ==========
 
+Release 2.3.1 (2024-04-22)
+--------------------------
+
+Bug Fixes:
+^^^^^^^^^^
+- Cast return value of learning rate schedule to float, to avoid issue when loading model because of ``weights_only=True`` (@markscsmith)
+
+Documentation:
+^^^^^^^^^^^^^^
+- Updated SBX documentation (CrossQ and deprecated DroQ)
+
+
 Release 2.3.0 (2024-03-31)
 --------------------------
 
@@ -48,7 +60,6 @@ New Features:
 Bug Fixes:
 ^^^^^^^^^^
 - Fixed ``monitor_wrapper`` argument that was not passed to the parent class, and
   dones argument that wasn't passed to ``_update_into_buffer`` (@corentinlger)
-- Fixed ``learning_rate`` argument that could cause weights_only=True to fail if passed a function with non-float types (e.g. ``learning_rate=lambda _: np.sin(1.0)``) (@markscsmith)
 
 `SB3-Contrib`_
 ^^^^^^^^^^^^^^