- Add Continuous Mountain Car environment
- The A2C algorithm supports conditional down-sampling of bad trajectories
- Support continuous actions
- Add Pendulum environment that can run up to 100K concurrent replicates.
- Add DDPG algorithm for training continuous-action policies.
- Add Acrobot environment that can run up to 100K concurrent replicates.
- Add Mountain Car environment that can run up to 100K concurrent replicates.
- Support the single-agent framework and begin adding gym.classic_control environments
- Add Cartpole environment that can run up to 100K concurrent replicates.
- Introduce an environment reset pool, so concurrent environment replicas can randomly reset themselves from the pool.
- Introduce new device context management and autoinit_pycuda, so Torch (any version) no longer conflicts with PyCUDA over the GPU context
- Add ModelFactory class to manage custom models
- Add Xavier initialization for the model
- Improve trainer.fetch_episode_states() so it can fetch (state, action, reward) data and replay an episode with argmax actions (see the sketch after this list).
- Factor out the data loading for placeholders and batches (observations, actions, and rewards) in the trainer.
- v2 trainer integration with PyTorch Lightning.
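As an illustration of the improved episode-fetching API, here is a minimal sketch. It assumes a WarpDrive `Trainer` instance (`trainer`) has already been built and trained on the continuous Tag environment; the state names `loc_x` and `loc_y` come from that environment, and the exact options for including actions/rewards or replaying with argmax may differ from what is shown.

```python
# Minimal sketch: fetch per-step environment states for one episode.
# Assumes `trainer` is an already-constructed and trained
# warp_drive.training.trainer.Trainer for the continuous Tag environment.
episode_states = trainer.fetch_episode_states(["loc_x", "loc_y"])

# Each entry is a NumPy array indexed by environment step and agent,
# which can be used to visualize or replay the episode.
print(episode_states["loc_x"].shape)
print(episode_states["loc_y"].shape)
```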
Big release:
- WarpDrive:
  - Added data and function managers for both CUDA C and Numba.
  - Added the core library (sampler and reset) for Numba.
  - Dual environment backends, supporting both CUDA C and Numba (see the sketch after this list).
  - Training pipeline compatible with both CUDA C and Numba.
  - Full backward compatibility with version 1.
- Environments:
  - tag (continuous version) implemented in Numba.
  - tag (gridworld version) implemented in Numba.
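To make the dual-backend support concrete, below is a minimal sketch of building the same environment on both backends. The `env_backend` values follow the version 2 `EnvWrapper` API; the tag environment class and its constructor arguments are illustrative placeholders rather than exact signatures.

```python
from warp_drive.env_wrapper import EnvWrapper
# Illustrative environment class and constructor arguments;
# see example_envs in the repository for the actual definitions.
from example_envs.tag_continuous.tag_continuous import TagContinuous

env_config = dict(num_taggers=5, num_runners=50, episode_length=200)

# CUDA C backend (via PyCUDA).
env_cuda_c = EnvWrapper(
    env_obj=TagContinuous(**env_config), num_envs=50, env_backend="pycuda"
)

# Numba backend: same environment and training pipeline, different backend.
env_numba = EnvWrapper(
    env_obj=TagContinuous(**env_config), num_envs=50, env_backend="numba"
)
```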
- Update PyCUDA version to 2022.1
- Allow environments to span multiple GPU thread blocks, adding the capability to train simulations with thousands of agents.
- Trainer integration with PyTorch Lightning (https://www.pytorchlightning.ai/).
- Added multi-GPU support.
- Auto-scaling to maximize the number of environment replicas and training batch size (on a single GPU).
- Added Python logging.
- Added a trainer module to fetch environment states for an episode.
- Added policy-specific training parameters.
- Added a parameter scheduler.
- Option to push a list of data arrays to the GPU at once (see the sketch after this list).
- Option to pass multiple arguments to the CUDA step function as a list.
- CUDA utility to help index multi-dimensional arrays.
- Log the episodic rewards.
- Save metrics during training.
- Support for registering custom environments.
- Support for 'Dict' observation spaces.
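As a sketch of pushing several data arrays to the GPU at once: the `DataFeed` container and `push_data_to_device()` call follow the usage shown in the WarpDrive tutorials, while the array names and shapes here are made up for illustration, and `env_wrapper` is assumed to be an already-constructed `EnvWrapper`.

```python
import numpy as np
from warp_drive.utils.data_feed import DataFeed

num_envs, num_agents = 100, 5

# Collect several named arrays into a single DataFeed ...
data_feed = DataFeed()
data_feed.add_data(name="loc_x", data=np.zeros((num_envs, num_agents), dtype=np.float32))
data_feed.add_data(name="loc_y", data=np.zeros((num_envs, num_agents), dtype=np.float32))
data_feed.add_data(name="speed", data=np.ones((num_envs, num_agents), dtype=np.float32))

# ... and push them all to the GPU in one call
# (assumes `env_wrapper` is an already-constructed EnvWrapper).
env_wrapper.cuda_data_manager.push_data_to_device(data_feed)
```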
- WarpDrive
  - data and function managers.
  - CUDA C core library.
  - environment wrapper (see the sketch after this list).
  - Python (CPU) vs. CUDA C (GPU) simulation implementation consistency checker.
  - training pipeline (with a fully-connected network, and A2C and PPO agents).
- Environments
  - tag (grid-world version).
  - tag (continuous version).
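For context on the initial release, a minimal sketch of the environment wrapper is shown below: a Python simulation object is wrapped so that reset and step run on the GPU across many concurrent replicas. The `use_cuda` flag reflects the version 1 API; the environment class and its constructor arguments are illustrative.

```python
from warp_drive.env_wrapper import EnvWrapper
# Illustrative environment class and constructor arguments;
# see example_envs in the repository for the actual definitions.
from example_envs.tag_gridworld.tag_gridworld import TagGridWorld

# Wrap the Python simulation so that reset and step execute on the GPU
# across num_envs concurrent environment replicas.
env_wrapper = EnvWrapper(
    env_obj=TagGridWorld(num_taggers=5, grid_length=10, episode_length=100),
    num_envs=100,
    use_cuda=True,
)
env_wrapper.reset_all_envs()
```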