Commit 03bcb6e (1 parent: 6ebb920)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 138 changed files with 2,537 additions and 436 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Curriculum Learning | ||
=================== | ||
|
||
One day this page will include a proper introduction to curriculum learning. For now, this paper is a good reference:
`Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey <https://arxiv.org/pdf/2003.04960.pdf>`_.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Unsupervised Environment Design | ||
=============================== | ||
|
||
Unsupervised Environment Design is a curriculum learning paradigm proposed in
`Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design <https://arxiv.org/pdf/2012.02096.pdf>`_ (Dennis et al. 2020).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Benchmarks | ||
========== | ||
|
||
We're actively testing and benchmarking the Syllabus implementations of curriculum learning algorithms.
For now, only Domain Randomization and Prioritized Level Replay have been verified to match baseline performance.

Our benchmark data is publicly available on our Weights & Biases page, and we will continue to update it as we test more methods.
|
||
Weights and Biases: wandb_ | ||
|
||
.. _wandb: https://wandb.ai/ryansullivan/syllabus/workspace?workspace=user-ryansullivan |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# For the full list of built-in configuration values, see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Project information ----------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information | ||
|
||
project = 'Syllabus' | ||
copyright = '2023, Ryan Sullivan' | ||
author = 'Ryan Sullivan' | ||
release = '0.5' | ||
|
||
# -- General configuration --------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration | ||
|
||
extensions = ['sphinx.ext.autodoc', | ||
'sphinx.ext.intersphinx', | ||
'sphinx_tabs.tabs', | ||
'sphinx.ext.napoleon', | ||
'sphinxcontrib.spelling'] | ||
|
||
templates_path = ['_templates'] | ||
|
||
import sys | ||
import os | ||
sys.path.insert(0, os.path.abspath('../../syllabus')) | ||
|
||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output | ||
|
||
html_theme = 'furo' | ||
html_static_path = [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
|
||
Creating Your Own Curriculum | ||
============================ | ||
|
||
To create your own curriculum, all you need to do is write a subclass of Syllabus's `Curriculum` class. | ||
`Curriculum` provides multiple methods for updating your curriculum, each meant for a different context. | ||
By subclassing the `Curriculum` class, your method will automatically work with all of Syllabus's provided tools and infrastructure. | ||
|
||
---------------- | ||
Required Methods | ||
---------------- | ||
|
||
Your curriculum must implement the following method:
|
||
* :mod:`sample(k: int = 1) <syllabus.core.curriculum_base.Curriculum.sample>` - Returns a list of `k` tasks sampled from the curriculum. | ||
|
||
The `sample` method is how your curriculum decides which task the environments will play. | ||
Most methods use some combination of logic and probability distributions to choose tasks, but there are no restrictions on how you choose tasks. | ||
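
As a rough illustration of the ``sample`` contract, the sketch below returns a list of ``k`` tasks drawn uniformly at random. ``UniformSketchCurriculum`` is a hypothetical stand-in written for this page, not a Syllabus class; a real curriculum would subclass `Curriculum`, whose constructor arguments are not shown here.

.. code-block:: python

    import random
    from typing import Any, List, Optional, Sequence


    class UniformSketchCurriculum:
        """Hypothetical stand-in that only demonstrates the sample() contract."""

        def __init__(self, tasks: Sequence[Any], seed: Optional[int] = None):
            self.tasks = list(tasks)
            self.rng = random.Random(seed)

        def sample(self, k: int = 1) -> List[Any]:
            # Return a list of k tasks, drawn uniformly with replacement.
            return self.rng.choices(self.tasks, k=k)


    curriculum = UniformSketchCurriculum(tasks=[0, 1, 2, 3], seed=0)
    batch = curriculum.sample(k=3)  # a list of three task ids

Any selection logic can replace the uniform draw, as long as the method still returns a list of ``k`` tasks.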
|
||
|
||
---------------------------- | ||
Curriculum Dependent Methods | ||
---------------------------- | ||
|
||
Your curriculum will likely require some feedback from the RL training loop to guide its task selection. These might be rewards from the environment, error values from the agent, or some other metric that you define. | ||
Depending on which type of information your curriculum requires, you will need to implement one or more of the following methods: | ||
|
||
* :mod:`update_task_progress(task, progress) <syllabus.core.curriculum_base.Curriculum.update_task_progress>` - Called after either each step or each episode :sup:`1`. It receives a task name and a boolean or float value indicating the current progress on the provided task. Values of True or 1.0 typically indicate a completed task.
|
||
* :mod:`update_on_step(obs, rew, term, trunc, info) <syllabus.core.curriculum_base.Curriculum.update_on_step>` - is called once for each environment step. | ||
|
||
* :mod:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>` - (**Not yet implemented**) will be called once for each completed episode by the environment synchronization wrapper. | ||
|
||
* :mod:`update_on_demand(metrics) <syllabus.core.curriculum_base.Curriculum.update_on_demand>` - is meant to be called by the main learner process to update a curriculum with information from the training process, such as TD errors or gradient norms. It is never used by the individual environments. It receives a dictionary of metrics of arbitrary types. | ||
|
||
Your curriculum will probably only use one of these methods, so you can choose to only override the one that you need. For example, the Learning Progress Curriculum | ||
only uses episodic task progress updates with `update_task_progress` and Prioritized Level Replay receives updates from the main process through `update_on_demand`. | ||
|
||
:sup:`1` If you choose not to use `update_on_step()` to update your curriculum, set `update_on_step=False` when initializing the environment synchronization wrapper
to prevent it from being called and improve performance (by default, an exception with the same suggestion is raised).
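
To make the feedback path more concrete, here is a minimal sketch of how a curriculum might consume ``update_task_progress`` calls. ``SuccessRateTracker`` and its ``success_rate`` helper are hypothetical names used only for illustration; the sketch assumes that ``progress`` is a bool or a float in ``[0, 1]``, as described above.

.. code-block:: python

    from collections import defaultdict
    from typing import Dict, Hashable, Union


    class SuccessRateTracker:
        """Hypothetical stand-in showing only the update_task_progress contract."""

        def __init__(self) -> None:
            self.attempts: Dict[Hashable, int] = defaultdict(int)
            self.progress_sum: Dict[Hashable, float] = defaultdict(float)

        def update_task_progress(self, task: Hashable, progress: Union[bool, float]) -> None:
            # True or 1.0 means the task was completed; smaller floats are partial progress.
            self.attempts[task] += 1
            self.progress_sum[task] += float(progress)

        def success_rate(self, task: Hashable) -> float:
            # Mean reported progress for the task, or 0.0 if it has never been played.
            if self.attempts[task] == 0:
                return 0.0
            return self.progress_sum[task] / self.attempts[task]


    tracker = SuccessRateTracker()
    tracker.update_task_progress("easy", True)
    tracker.update_task_progress("easy", False)
    tracker.update_task_progress("hard", 0.25)

A statistic like this per-task success rate is the kind of signal a curriculum could then use inside ``sample``.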
|
||
|
||
------------------- | ||
Recommended Methods | ||
------------------- | ||
|
||
For most curricula, we recommend implementing these methods to support convenience features in Syllabus: | ||
|
||
* :mod:`_sample_distribution() <syllabus.core.curriculum_base.Curriculum._sample_distribution>` - Returns a probability distribution over tasks | ||
|
||
* :mod:`log_metrics(writer) <syllabus.core.curriculum_base.Curriculum.log_metrics>` - Logs curriculum-specific metrics to the provided tensorboard or weights and biases logger. | ||
|
||
If your curriculum uses a probability distribution to sample tasks, you should implement `_sample_distribution()`. The default implementation of `log_metrics` will log the probabilities from `_sample_distribution()` | ||
for each task in a discrete task space to tensorboard or weights and biases. You can also override `log_metrics` to log other values for your specific curriculum. | ||
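
The relationship between ``sample``, ``_sample_distribution`` and ``log_metrics`` can be sketched as follows. This is an illustration under stated assumptions, not Syllabus's default implementation: the class names, the ``step`` argument, the tag format, and the writer's ``add_scalar(tag, value, step)`` method (modelled on TensorBoard's ``SummaryWriter``) are all assumptions made for this example.

.. code-block:: python

    import random
    from typing import Dict, List, Optional, Tuple


    class WeightedSketchCurriculum:
        """Hypothetical curriculum that samples tasks in proportion to fixed weights."""

        def __init__(self, weights: Dict[str, float], seed: Optional[int] = None):
            self.task_names = list(weights)
            self.weights = [weights[name] for name in self.task_names]
            self.rng = random.Random(seed)

        def _sample_distribution(self) -> List[float]:
            # Normalize the weights into a probability distribution over tasks.
            total = sum(self.weights)
            return [w / total for w in self.weights]

        def sample(self, k: int = 1) -> List[str]:
            return self.rng.choices(self.task_names, weights=self._sample_distribution(), k=k)

        def log_metrics(self, writer, step: int = 0) -> None:
            # Log one scalar per task: its current sampling probability.
            for name, prob in zip(self.task_names, self._sample_distribution()):
                writer.add_scalar(f"curriculum/{name}_prob", prob, step)


    class RecordingWriter:
        """Tiny stand-in for a logger; it just records what it is asked to log."""

        def __init__(self) -> None:
            self.scalars: List[Tuple[str, float, int]] = []

        def add_scalar(self, tag: str, value: float, step: int) -> None:
            self.scalars.append((tag, value, step))


    curriculum = WeightedSketchCurriculum({"easy": 1.0, "hard": 3.0}, seed=0)
    writer = RecordingWriter()
    curriculum.log_metrics(writer, step=5)

Keeping the distribution in one method means sampling and logging always agree on the probabilities being used.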
|
||
---------------- | ||
Optional Methods | ||
---------------- | ||
|
||
You can optionally choose to implement these additional methods: | ||
|
||
|
||
* :mod:`update_on_step_batch(update_list) <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>` - Updates the curriculum with a batch of step updates. | ||
|
||
* :mod:`update_curriculum_batch(update_data) <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>` - Updates the curriculum with a batch of data. | ||
|
||
|
||
`update_curriculum_batch` and `update_on_step_batch` can be overridden to provide a more efficient curriculum-specific implementation. The default implementation simply iterates over the updates. | ||
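
The difference between the iterate-over-updates default and a curriculum-specific batch override might look like the sketch below. Both classes are hypothetical, and the sketch assumes that ``update_list`` is a list of ``(obs, rew, term, trunc, info)`` tuples; check the API reference for the actual format.

.. code-block:: python

    from typing import Any, List, Tuple

    StepUpdate = Tuple[Any, float, bool, bool, dict]


    class StepCounter:
        """Hypothetical curriculum that accumulates reward statistics per step."""

        def __init__(self) -> None:
            self.total_reward = 0.0
            self.steps = 0

        def update_on_step(self, obs, rew, term, trunc, info) -> None:
            self.total_reward += rew
            self.steps += 1

        def update_on_step_batch(self, update_list: List[StepUpdate]) -> None:
            # Default-style behaviour: apply each step update one at a time.
            for obs, rew, term, trunc, info in update_list:
                self.update_on_step(obs, rew, term, trunc, info)


    class BatchedStepCounter(StepCounter):
        def update_on_step_batch(self, update_list: List[StepUpdate]) -> None:
            # Curriculum-specific override: process the whole batch in one pass.
            rewards = [rew for _, rew, _, _, _ in update_list]
            self.total_reward += sum(rewards)
            self.steps += len(rewards)


    updates = [(None, 1.0, False, False, {}), (None, 0.5, True, False, {})]
    slow, fast = StepCounter(), BatchedStepCounter()
    slow.update_on_step_batch(updates)
    fast.update_on_step_batch(updates)

Both versions end in the same state; the override only avoids the per-update method call.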
|
||
|
||
Each curriculum also specifies two constants: REQUIRES_STEP_UPDATES and REQUIRES_CENTRAL_UPDATES. | ||
|
||
* REQUIRES_STEP_UPDATES - If True, the environment synchronization wrapper should set `update_on_step=True` to provide the curriculum with updates after each step. | ||
|
||
* REQUIRES_CENTRAL_UPDATES - If True, the user will need to call `update_on_demand()` to provide the curriculum with updates from the main process. We recommend adding a warning to your curriculum if too many tasks are sampled without receiving updates. |
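
One way to implement the recommended warning is sketched below. The class, the ``max_samples_without_update`` threshold and the uniform sampling are illustrative assumptions, not part of Syllabus.

.. code-block:: python

    import random
    import warnings
    from typing import Any, Dict, List, Sequence


    class CentralUpdateSketch:
        """Hypothetical curriculum that expects update_on_demand() calls."""

        REQUIRES_STEP_UPDATES = False
        REQUIRES_CENTRAL_UPDATES = True

        def __init__(self, tasks: Sequence[Any], max_samples_without_update: int = 1000):
            self.tasks = list(tasks)
            self.max_samples_without_update = max_samples_without_update
            self.samples_since_update = 0
            self.metrics: Dict[str, Any] = {}

        def update_on_demand(self, metrics: Dict[str, Any]) -> None:
            # Called from the main learner process; resets the staleness counter.
            self.metrics = dict(metrics)
            self.samples_since_update = 0

        def sample(self, k: int = 1) -> List[Any]:
            self.samples_since_update += k
            if self.samples_since_update > self.max_samples_without_update:
                warnings.warn(
                    f"{self.samples_since_update} tasks sampled without an update; "
                    "did you forget to call update_on_demand()?"
                )
            return random.choices(self.tasks, k=k)


    curriculum = CentralUpdateSketch(tasks=[0, 1], max_samples_without_update=2)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        curriculum.sample(k=3)  # exceeds the threshold, so a warning is emitted
    curriculum.update_on_demand({"td_error": 0.1})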
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
================== | ||
Curriculum Methods | ||
================== | ||
|
||
Syllabus has a small collection of curriculum learning methods implemented. These include simple techniques that are often used in practice
but rarely highlighted in the literature, such as simulated annealing of difficulty or sequential curricula of easy to hard tasks. We also
have several popular curriculum learning baselines: Domain Randomization, Prioritized Level Replay (Jiang et al. 2021), and the learning
progress curriculum introduced in Kanitscheider et al. 2021.
|
||
----------------------------------------------------------------------------------------- | ||
:mod:`Domain Randomization <syllabus.curricula.domain_randomization.DomainRandomization>` | ||
----------------------------------------------------------------------------------------- | ||
|
||
A simple but strong baseline for curriculum learning that uniformly samples a task from the task space. | ||
|
||
--------------------------------------------------------------------------------- | ||
:mod:`Sequential Curriculum <syllabus.curricula.sequential.SequentialCurriculum>` | ||
--------------------------------------------------------------------------------- | ||
|
||
Plays a provided list of tasks in order for a prespecified number of episodes. | ||
It can be used to manually design curricula by providing tasks in an order that you feel will result in the best final performance. | ||
*Coming Soon*: functional stopping criteria instead of a fixed number of episodes. | ||
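
The idea can be sketched in a few lines. This is not the ``SequentialCurriculum`` implementation; ``SequentialSketch`` and its ``on_episode_end`` hook are hypothetical names, and the per-task episode counts are an assumed way of specifying the schedule.

.. code-block:: python

    from typing import Any, List, Sequence


    class SequentialSketch:
        """Hypothetical curriculum that plays tasks in order for fixed episode counts."""

        def __init__(self, tasks: Sequence[Any], episodes_per_task: Sequence[int]):
            assert len(tasks) == len(episodes_per_task)
            self.tasks = list(tasks)
            self.episodes_per_task = list(episodes_per_task)
            self.stage = 0
            self.episodes_in_stage = 0

        def sample(self, k: int = 1) -> List[Any]:
            # Every sampled task is the task of the current stage.
            return [self.tasks[self.stage]] * k

        def on_episode_end(self) -> None:
            # Advance to the next task once the current one has used up its episodes.
            self.episodes_in_stage += 1
            is_last_stage = self.stage == len(self.tasks) - 1
            if not is_last_stage and self.episodes_in_stage >= self.episodes_per_task[self.stage]:
                self.stage += 1
                self.episodes_in_stage = 0


    curriculum = SequentialSketch(["easy", "medium", "hard"], episodes_per_task=[2, 1, 5])
    played = []
    for _ in range(5):
        played.append(curriculum.sample()[0])
        curriculum.on_episode_end()
    # played == ["easy", "easy", "medium", "hard", "hard"]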
|
||
-------------------------------------------------------------------------------- | ||
:mod:`Simple Box Curriculum <syllabus.curricula.simple_box.SimpleBoxCurriculum>` | ||
-------------------------------------------------------------------------------- | ||
|
||
A simple curriculum that expands a zero-centered range from an initial range to a final range over a number of discrete steps. | ||
The curriculum increases the range to the next stage when a provided reward threshold is met. | ||
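
A minimal sketch of this expansion logic is shown below. The names are hypothetical, and the evenly spaced half-widths and the ``report_mean_reward`` hook are assumptions; the actual ``SimpleBoxCurriculum`` may compute its stages differently.

.. code-block:: python

    from typing import List, Tuple


    class ExpandingRangeSketch:
        """Hypothetical curriculum that widens a zero-centered range in discrete stages."""

        def __init__(self, initial_half_width: float, final_half_width: float,
                     steps: int, reward_threshold: float):
            # Assumption: stages are evenly spaced between the initial and final widths.
            delta = (final_half_width - initial_half_width) / steps
            self.half_widths: List[float] = [initial_half_width + i * delta for i in range(steps + 1)]
            self.reward_threshold = reward_threshold
            self.stage = 0

        def current_range(self) -> Tuple[float, float]:
            width = self.half_widths[self.stage]
            return (-width, width)

        def report_mean_reward(self, mean_reward: float) -> None:
            # Move to the next, wider stage once the reward threshold is met.
            if mean_reward >= self.reward_threshold and self.stage < len(self.half_widths) - 1:
                self.stage += 1


    curriculum = ExpandingRangeSketch(1.0, 3.0, steps=2, reward_threshold=0.8)
    start = curriculum.current_range()      # (-1.0, 1.0)
    curriculum.report_mean_reward(0.5)      # below the threshold: range unchanged
    curriculum.report_mean_reward(0.9)      # threshold met: range expands
    expanded = curriculum.current_range()   # (-2.0, 2.0)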
|
||
------------------------------------------------------------------------------------------ | ||
:mod:`Learning Progress <syllabus.curricula.learning_progress.LearningProgressCurriculum>` | ||
------------------------------------------------------------------------------------------ | ||
|
||
Uses a heuristic to estimate the learning progress of a task. It maintains a fast and slow exponential moving average (EMA) of the task | ||
completion rates for a set of discrete tasks. | ||
By measuring the difference between the fast and slow EMAs and reweighting it to adjust for the time delay created by the EMA, this method can | ||
estimate the learning progress of a task. | ||
The curriculum then assigns a higher probability to tasks with a very high or very low learning progress, indicating that the agent | ||
is either learning or forgetting the task. For more information you can read the original paper | ||
`Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft (Kanitscheider et al. 2021) <https://arxiv.org/pdf/2106.14876.pdf>`_. | ||
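
The core of the heuristic, the gap between a fast and a slow EMA, can be sketched as follows. This is a deliberately simplified illustration, not the Syllabus implementation: it omits the reweighting step and the normalization used in the paper, and the class name and smoothing factors are assumptions.

.. code-block:: python

    import random
    from typing import Hashable, List, Sequence


    class LearningProgressSketch:
        """Simplified sketch: sample tasks in proportion to |fast EMA - slow EMA|."""

        def __init__(self, tasks: Sequence[Hashable], fast_alpha: float = 0.1, slow_alpha: float = 0.01):
            self.tasks = list(tasks)
            self.fast_alpha = fast_alpha
            self.slow_alpha = slow_alpha
            self.fast = {t: 0.0 for t in self.tasks}
            self.slow = {t: 0.0 for t in self.tasks}

        def update_task_progress(self, task: Hashable, progress: float) -> None:
            # Both EMAs track the completion rate; the fast one reacts more quickly.
            p = float(progress)
            self.fast[task] += self.fast_alpha * (p - self.fast[task])
            self.slow[task] += self.slow_alpha * (p - self.slow[task])

        def learning_progress(self, task: Hashable) -> float:
            # A large gap in either direction means the agent is learning or forgetting.
            return abs(self.fast[task] - self.slow[task])

        def _sample_distribution(self) -> List[float]:
            lp = [self.learning_progress(t) for t in self.tasks]
            total = sum(lp)
            if total == 0.0:
                return [1.0 / len(self.tasks)] * len(self.tasks)  # no signal yet: uniform
            return [x / total for x in lp]

        def sample(self, k: int = 1) -> List[Hashable]:
            return random.choices(self.tasks, weights=self._sample_distribution(), k=k)


    curriculum = LearningProgressSketch(["improving", "static"])
    initial = curriculum._sample_distribution()   # uniform before any feedback
    for _ in range(10):
        curriculum.update_task_progress("improving", 1.0)
    after = curriculum._sample_distribution()     # all mass on the improving task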
|
||
------------------------------------------------------------------------------------------- | ||
:mod:`Prioritized Level Replay <syllabus.curricula.plr.plr_wrapper.PrioritizedLevelReplay>` | ||
------------------------------------------------------------------------------------------- | ||
|
||
A curriculum learning method that estimates an agent's regret on particular environment instantiations and uses a prioritized replay buffer to | ||
replay levels for which the agent has high regret. This implementation is based on the open-source original implementation at | ||
https://github.com/facebookresearch/level-replay, but has been modified to support Syllabus task spaces instead of just environment seeds. | ||
PLR has been used in multiple prominent RL works. For more information you can read the original paper | ||
`Prioritized Level Replay (Jiang et al. 2021) <https://arxiv.org/pdf/2010.03934.pdf>`_. |
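
For intuition, the rank-based replay distribution described in the paper can be sketched as below: levels are ranked by score, the score distribution is proportional to ``(1 / rank) ** (1 / beta)``, and it is mixed with a staleness distribution using a coefficient ``rho``. This is a standalone simplification and not the code used in Syllabus; the function name and the default values are illustrative assumptions.

.. code-block:: python

    from typing import List, Sequence


    def rank_prioritized_distribution(scores: Sequence[float], staleness: Sequence[float],
                                      beta: float = 0.1, rho: float = 0.1) -> List[float]:
        """Mix a rank-based score distribution with a staleness distribution."""
        n = len(scores)
        # Rank 1 is the level with the highest score (the highest estimated regret).
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        ranks = [0] * n
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank

        score_weights = [(1.0 / r) ** (1.0 / beta) for r in ranks]
        score_total = sum(score_weights)
        p_score = [w / score_total for w in score_weights]

        # Staleness: how long ago each level was last sampled.
        stale_total = sum(staleness)
        if stale_total > 0:
            p_stale = [s / stale_total for s in staleness]
        else:
            p_stale = [1.0 / n] * n

        return [(1.0 - rho) * ps + rho * pc for ps, pc in zip(p_score, p_stale)]


    probs = rank_prioritized_distribution(scores=[0.1, 0.9, 0.5], staleness=[1, 1, 1])

With a small ``beta`` almost all of the score mass goes to the top-ranked level, while the staleness term keeps every level's probability above zero.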
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Environment Support | ||
=================== | ||
|
||
Syllabus is implemented with the new Gymnasium API, which is different from the old OpenAI Gym API. | ||
However, it is possible to use environments implemented with the Gym API in Syllabus. | ||
We recommend using the `Shimmy <https://github.com/Farama-Foundation/Shimmy>`_ package to convert Gym environments to Gymnasium environments. | ||
|
||
.. code-block:: python

    import gym
    from shimmy.openai_gym_compatibility import GymV21CompatibilityV0

    # Create the environment with the old Gym API, then wrap it for Gymnasium.
    env = gym.make('CartPole-v0')
    env = GymV21CompatibilityV0(env=env)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
Evaluation | ||
========== | ||
|
||
Evaluating RL agents trained with curriculum learning requires special consideration. Typically training tasks are assumed to be drawn from the same distribution as the test tasks. However, curriculum learning methods modify the training task distribution to improve test performance. Therefore, training returns are not a good measure of performance. Agents should be periodically evaluated during training on uniformly sampled tasks, ideally from a held out test set. You can see an example of this approach in our `procgen script <https://github.com/RyanNavillus/Syllabus/tree/main/syllabus/examples>`_. | ||
|
||
Correctly implementing this evaluation code can be surprisingly challenging, so we list a few guidelines to keep in mind here:

* Make sure not to bias evaluation results towards shorter episodes. This is easy to do by accident if you try to multiprocess evaluations. For example, if you run a vectorized environment and save the first 10 results, your test returns will be biased toward shorter episodes, which likely earned lower returns.
* Reset the environments before each evaluation. This may seem obvious, but since some vectorized environments don't allow you to directly reset the environments, some might be tempted to skip this step.
* Use the same environment wrappers for the evaluation environment. This is important because some wrappers, such as the `TimeLimit` wrapper, can change the dynamics of the environment. If you use different wrappers, you may get different results.
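
To illustrate the first guideline, the toy sketch below keeps a fixed number of episodes from each environment instead of the first results to arrive. The function name and the ``(env_index, episode_return)`` stream format are assumptions made for this example; it is not taken from the procgen script.

.. code-block:: python

    from typing import Iterable, List, Tuple


    def collect_unbiased_returns(finished: Iterable[Tuple[int, float]],
                                 num_envs: int, episodes_per_env: int) -> List[float]:
        """Keep the first `episodes_per_env` returns from each environment."""
        kept: List[List[float]] = [[] for _ in range(num_envs)]
        for env_index, episode_return in finished:
            # Ignore extra episodes from environments that already filled their quota.
            if len(kept[env_index]) < episodes_per_env:
                kept[env_index].append(episode_return)
            if all(len(returns) == episodes_per_env for returns in kept):
                break
        return [r for returns in kept for r in returns]


    # Toy stream in completion order: env 0 finishes short, low-return episodes
    # quickly, while env 1 plays long, high-return episodes.
    stream = [(0, 1.0), (0, 1.0), (0, 1.0), (1, 10.0), (0, 1.0), (1, 10.0)]

    first_four = [ret for _, ret in stream[:4]]
    biased_mean = sum(first_four) / len(first_four)   # 3.25, dominated by short episodes

    unbiased = collect_unbiased_returns(stream, num_envs=2, episodes_per_env=2)
    unbiased_mean = sum(unbiased) / len(unbiased)     # 5.5, each env counted equally

Because every environment contributes the same number of episodes, how quickly an episode finishes no longer affects whether it is counted.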
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
.. Syllabus documentation master file, created by
   sphinx-quickstart on Mon Jul 10 07:05:19 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Syllabus Documentation | ||
====================== | ||
|
||
Syllabus is a library for using curriculum learning to train reinforcement learning agents. It provides a Curriculum API for
defining curriculum learning algorithms, implementations of popular curriculum learning methods, and a framework for synchronizing
those curricula across environments running in multiple processes. Syllabus makes it easy to implement curriculum learning methods
and add them to existing training code. It takes only a few lines of code to add a curriculum to an existing training script, and
because of the shared Curriculum API, you can swap out different curriculum learning methods by changing a single line of code.

It currently supports environments run with Python native multiprocessing or Ray actors, which covers RLLib, CleanRL,
Stable Baselines 3, and Monobeast (Torchbeast), and we have working examples for each of these libraries.
We also have preliminary support and examples for multiagent PettingZoo environments.
|
||
.. toctree::
   :maxdepth: 2
   :caption: Getting Started:

   self
   installation
   quickstart
   environments
   evaluation/evaluation
   logging
   benchmarks

.. toctree::
   :maxdepth: 1
   :caption: Curriculum Learning Background:

   background/curriculum_learning
   background/ued

.. toctree::
   :maxdepth: 2
   :caption: Curriculum API:

   modules/syllabus.core.curriculum
   curricula/custom_curricula
   curricula/implemented_curricula

.. toctree::
   :maxdepth: 1
   :caption: Curriculum Methods:

   modules/syllabus.curricula.plr
   modules/syllabus.curricula.domain_randomization
   modules/syllabus.curricula.learning_progress

.. toctree::
   :maxdepth: 1
   :caption: Task Spaces:

   modules/syllabus.task_space
   modules/syllabus.core.task_interface
   modules/syllabus.examples.task_wrappers

.. toctree::
   :maxdepth: 1
   :caption: Synchronization:

   modules/syllabus.core

.. toctree::
   :maxdepth: 2
   :caption: Development:

   Github <https://github.com/RyanNavillus/Syllabus>
   modules/modules
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
Installation | ||
============ | ||
|
||
You can install Syllabus from PyPI with pip:

.. code-block:: bash

    pip install syllabus-rl

To install a development branch of Syllabus, clone the repository, check out the branch, and install it with pip:

.. code-block:: bash

    git clone git@github.com:RyanNavillus/Syllabus.git
    cd Syllabus
    git checkout <branch-name>
    pip install -e ".[all]"
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Logging | ||
======= | ||
|
||
Syllabus currently has preliminary support for logging to Weights & Biases or TensorBoard through the :mod:`log_metrics <syllabus.core.curriculum_base.Curriculum.log_metrics>` method of each curriculum.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
syllabus | ||
======== | ||
|
||
.. toctree::
   :maxdepth: 4

   syllabus
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
.. _Curriculum API: | ||
|
||
Curriculum | ||
========== | ||
|
||
Syllabus's Curriculum API is a unified interface for curriculum learning methods. Curricula following this API | ||
can be used with all of Syllabus's infrastructure. We hope that future curriculum learning research will provide | ||
implementations following this API to encourage reproducibility and ease of use. | ||
|
||
The full documentation for the Curriculum class can be found in :doc:`../modules/syllabus.core`.
|
||
The Curriculum class has three main jobs: | ||
|
||
- Maintain a sampling distribution over the task space. | ||
|
||
- Incorporate feedback from the environments or training process to update the sampling distribution. | ||
|
||
- Provide a sampling interface for the environment to draw tasks from. | ||
|
||
|
||
In practice, the sampling distribution can be whatever you want, such as a uniform distribution,
a deterministic sequence of tasks, or a single constant task, depending on the curriculum learning method.
|
||
To incorporate feedback from the environment, the API provides multiple methods: | ||
|
||
- :mod:`update_on_step <syllabus.core.curriculum_base.Curriculum.update_on_step>` | ||
|
||
- :mod:`update_task_progress <syllabus.core.curriculum_base.Curriculum.update_task_progress>` | ||
|
||
- :mod:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>` | ||
|
||
- :mod:`update_on_step_batch <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>` | ||
|
||
- :mod:`update_curriculum_batch <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>` | ||
|
||
syllabus.core.curriculum\_base module | ||
------------------------------------- | ||
|
||
.. automodule:: syllabus.core.curriculum_base
   :members:
   :undoc-members:
   :show-inheritance:
File renamed without changes.
File renamed without changes.
9 changes: 9 additions & 0 deletions
docs-source/modules/syllabus.curricula.domain_randomization.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
Domain Randomization | ||
===================== | ||
|
||
Domain Randomization simply uniformly samples tasks from the task space. It can be a strong baseline in environments with relatively small task spaces. | ||
|
||
.. automodule:: syllabus.curricula.domain_randomization
   :members:
   :undoc-members:
   :show-inheritance: