Welcome to the RL Algorithms and Environments (RLAlgoEnv) repository! This project provides a collection of reinforcement learning (RL) algorithms implemented in PyTorch, together with a set of customized and packaged environments.
- The code is tested with `pytorch==2.0.1+cu117`.
- Install all dependencies: `pip install -r requirements.txt`
- The project is based on the `gymnasium>=0.29.1` package. To render the MuJoCo, robotics, etc. environments, you need to modify the `site-packages\gymnasium\envs\mujoco\mujoco_rendering.py` file: replace `solver_iter` (at around line #593) with `solver_niter` (see the sketch after this list).
- The running scripts can be run directly in PyCharm; to run them from a terminal, you may need to execute `export PYTHONPATH=<path to RLEnvsAlgos>:$PYTHONPATH`.
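For reference, the change is a one-word rename. The snippet below sketches the affected line; the exact `add_overlay` call is quoted from memory and may differ slightly between Gymnasium versions, so treat it as an illustration rather than an exact diff:

```python
# site-packages/gymnasium/envs/mujoco/mujoco_rendering.py, around line 593.
# Recent mujoco releases renamed `solver_iter` to `solver_niter`, so the old
# attribute raises an AttributeError at render time.

# Before (illustrative; your copy of the file may differ slightly):
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_iter + 1))

# After:
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_niter + 1))
```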
The project implements each RL algorithm as a separate, self-contained class, making the code easy to read and modify; a minimal sketch of this style follows the algorithm list below.
The implementation of these algorithms is primarily based on the CleanRL library, an excellent resource that we also recommend for reference.
Algorithm | Description | Author & Year | Discrete Control | Continuous Control |
---|---|---|---|---|
DQN | An enhanced version of the Deep Q-Network algorithm. | Mnih et al., 2015 | ✔️ | ❌ |
CategoricalDQN | An extension of DQN with categorical distributional Q-learning. | Bellemare et al., 2017 | ✔️ | ❌ |
NoisyNet (DQN) | An extension of DQN with noisy networks for exploration. | Fortunato et al., 2019 | ✔️ | ❌ |
PPO | Proximal Policy Optimization. | Schulman et al., 2017 | ✔️ | ✔️ |
RPO | Robust Policy Optimization, an improved version of PPO. | Rahman & Xue, 2023 | ✔️ | ✔️ |
RND | Random Network Distillation, an exploration method built on PPO. | Burda et al., 2018 | ✔️ | ✔️ |
DDPG | Deep Deterministic Policy Gradient. | Lillicrap et al., 2015 | ❌ | ✔️ |
TD3 | Twin Delayed DDPG, an improved version of DDPG. | Fujimoto et al., 2018 | ❌ | ✔️ |
SAC | Soft Actor-Critic. | Haarnoja et al., 2018 | ✔️ | ✔️ |
Algorithms planned or in progress:
- Soft Q-Learning (SQL)
- Advantage Actor-Critic (A2C)
- Asynchronous Advantage Actor-Critic (A3C)
- and more...
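To illustrate the self-contained, single-class style inspired by CleanRL, here is a minimal, hypothetical sketch of the kind of building block a DQN-family implementation wraps. The names `QNetwork` and `epsilon_greedy` are illustrative only and do not reflect the repository's actual API:

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Minimal MLP Q-network (illustrative; not the repository's class)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per discrete action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def epsilon_greedy(q_net: QNetwork, obs: torch.Tensor, epsilon: float) -> int:
    """DQN-style exploration: random action with probability epsilon,
    otherwise the greedy action under the current Q-network."""
    n_actions = q_net.net[-1].out_features
    if torch.rand(()).item() < epsilon:
        return int(torch.randint(n_actions, ()).item())
    with torch.no_grad():
        return int(q_net(obs).argmax().item())
```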
We provide a variety of running scripts for different algorithms and environments. You can find them here.
- DQN algorithm (discrete action spaces only):
- NoisyNet-DQN algorithm:
- PPO algorithm:
- RPO algorithm:
- RND algorithm:
- DDPG algorithm (continuous action spaces only):
- TD3 algorithm (continuous action spaces only):
- SAC algorithm:
We package and customize a variety of environments for testing and benchmarking RL algorithms. All environment packages can be found in the RLEnvs folder.
- gymnasium: the Gymnasium library (the Farama Foundation's maintained successor to OpenAI Gym).
- MyMiniGrid: based on the MiniGrid environment, with customized wrappers and self-designed environments.
- MyPandaRobot: based on the panda-gym environment, with some self-designed environments.
- MyFetchRobot: based on the gymnasium-robotics library, with the reward function customized to give only sparse and delayed rewards for four FetchRobot tasks (see the wrapper sketch after this list).
- MyMujoco: based on the gymnasium-mujoco library, with the reward function customized to give only sparse and delayed rewards.
- MyMiniWorld: based on the MiniWorld environment, with some self-designed and sparse-reward environments.
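To picture the sparse/delayed-reward customization mentioned for MyFetchRobot and MyMujoco, here is a generic, simplified `gymnasium.Wrapper` sketch that withholds the per-step reward until the episode ends. It illustrates the idea only and is not the repository's exact reward scheme:

```python
import gymnasium as gym


class SparseDelayedReward(gym.Wrapper):
    """Illustrative wrapper: accumulate the dense reward internally and emit
    it only at the end of the episode (sparse and delayed). Simplified sketch,
    not the exact scheme used by MyFetchRobot/MyMujoco."""

    def reset(self, **kwargs):
        self._acc = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._acc += float(reward)
        # Zero reward on intermediate steps; release the accumulated return
        # only when the episode terminates or is truncated.
        delayed = self._acc if (terminated or truncated) else 0.0
        return obs, delayed, terminated, truncated, info


# Usage: env = SparseDelayedReward(gym.make("HalfCheetah-v4"))
```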
We provide some templates to interact with the environments (a minimal interaction loop is sketched after this list):
- Using gymnasium
- Using gymnasium-robotics
- Using MyMiniGrid
- Using MyPandaRobot
- Using MyFetchRobot
- Using MyMujoco
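All of these templates follow the standard Gymnasium interaction loop. For quick reference, here is a minimal sketch using the stock `gymnasium` API; `CartPole-v1` is a stand-in for any environment ID registered by the packages above:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")  # stand-in; use any registered environment ID

obs, info = env.reset(seed=0)
episode_return = 0.0
for _ in range(1000):
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:  # Gymnasium splits `done` into two flags
        obs, info = env.reset()
        episode_return = 0.0
env.close()
```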
We welcome contributions!
The code has not been thoroughly tested, so we sincerely invite you to help us improve the repository. If you have improvements or bug fixes, please feel free to open an issue or a pull request. Thanks in advance for your help!