Update docs
RyanNavillus committed Nov 28, 2024
1 parent 6ebb920 commit 03bcb6e
Showing 138 changed files with 2,537 additions and 436 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -123,7 +123,7 @@ To help people get started using Syllabus, I've added a few simple curriculum le
To build the documentation, run the following commands:

```
-sphinx-build -M html ./docs/source ./docs
+sphinx-build -M html ./docs-source ./docs
cp -r ./docs/html/* ./docs && rm -R ./docs/html/*
```

@@ -133,6 +133,7 @@ If you need to regenerate the module docs from scratch, you can use the followin
```
sphinx-apidoc -o ./docs/modules ./syllabus
```
Then manually merge the results with ./docs-source as needed.

## Citing Syllabus
To be added soon.
5 changes: 5 additions & 0 deletions docs-source/background/curriculum_learning.rst
@@ -0,0 +1,5 @@
Curriculum Learning
===================

One day this page will include a proper introduction to curriculum learning. For now, this paper is a good reference: `Curriculum Learning for Reinforcement Learning Domains:
A Framework and Survey <https://arxiv.org/pdf/2003.04960.pdf>`_.
5 changes: 5 additions & 0 deletions docs-source/background/ued.rst
@@ -0,0 +1,5 @@
Unsupervised Environment Design
===============================

Unsupervised Environment Design is a curriculum learning paradigm proposed in `Emergent Complexity and Zero-shot Transfer via
Unsupervised Environment Design <https://arxiv.org/pdf/2012.02096.pdf>`_ (Dennis et al. 2020).
11 changes: 11 additions & 0 deletions docs-source/benchmarks.rst
@@ -0,0 +1,11 @@
Benchmarks
==========

We're actively working on testing and benchmarking the Syllabus implementations of curriculum learning algorithms.
For now, only Domain Randomization and Prioritized Level Replay have been evaluated to match baseline performance.

Our benchmark data is publicly available on our Weights and Biases page, and we will continue to update it as we test more methods.

Weights and Biases: wandb_

.. _wandb: https://wandb.ai/ryansullivan/syllabus/workspace?workspace=user-ryansullivan
34 changes: 34 additions & 0 deletions docs-source/conf.py
@@ -0,0 +1,34 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'Syllabus'
copyright = '2023, Ryan Sullivan'
author = 'Ryan Sullivan'
release = '0.5'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc',
              'sphinx.ext.intersphinx',
              'sphinx_tabs.tabs',
              'sphinx.ext.napoleon',
              'sphinxcontrib.spelling']

templates_path = ['_templates']

import sys
import os
sys.path.insert(0, os.path.abspath('../../syllabus'))


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'furo'
html_static_path = []
75 changes: 75 additions & 0 deletions docs-source/curricula/custom_curricula.rst
@@ -0,0 +1,75 @@

Creating Your Own Curriculum
============================

To create your own curriculum, all you need to do is write a subclass of Syllabus's `Curriculum` class.
`Curriculum` provides multiple methods for updating your curriculum, each meant for a different context.
By subclassing the `Curriculum` class, your method will automatically work with all of Syllabus's provided tools and infrastructure.

----------------
Required Methods
----------------

Your curriculum method is REQUIRED to implement the following method:

* :mod:`sample(k: int = 1) <syllabus.core.curriculum_base.Curriculum.sample>` - Returns a list of `k` tasks sampled from the curriculum.

The `sample` method is how your curriculum decides which task the environments will play.
Most methods use some combination of logic and probability distributions to choose tasks, but there are no restrictions on how you choose tasks.
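As a concrete illustration, a minimal uniform-sampling curriculum might look like the sketch below. This is a standalone mock-up: the class name and constructor are invented for illustration, and a real implementation would subclass Syllabus's `Curriculum` class rather than stand alone.

```python
import random

class UniformCurriculum:
    """Minimal sketch of the sampling interface. A real curriculum would
    subclass syllabus's Curriculum class; this standalone version only
    illustrates the required sample() method."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def sample(self, k: int = 1):
        # Return a list of k tasks drawn uniformly from the task space.
        return [random.choice(self.tasks) for _ in range(k)]

curriculum = UniformCurriculum(["easy", "medium", "hard"])
tasks = curriculum.sample(k=2)  # e.g. ['hard', 'easy']
```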


----------------------------
Curriculum Dependent Methods
----------------------------

Your curriculum will likely require some feedback from the RL training loop to guide its task selection. These might be rewards from the environment, error values from the agent, or some other metric that you define.
Depending on which type of information your curriculum requires, you will need to implement one or more of the following methods:

* :mod:`update_task_progress(task, progress) <syllabus.core.curriculum_base.Curriculum.update_task_progress>` - is called either after each step or each episode :sup:`1` . It receives a task name and a boolean or float value indicating the current progress on the provided task. Values of True or 1.0 typically indicate a completed task.

* :mod:`update_on_step(obs, rew, term, trunc, info) <syllabus.core.curriculum_base.Curriculum.update_on_step>` - is called once for each environment step.

* :mod:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>` - (**Not yet implemented**) will be called once for each completed episode by the environment synchronization wrapper.

* :mod:`update_on_demand(metrics) <syllabus.core.curriculum_base.Curriculum.update_on_demand>` - is meant to be called by the main learner process to update a curriculum with information from the training process, such as TD errors or gradient norms. It is never used by the individual environments. It receives a dictionary of metrics of arbitrary types.

Your curriculum will probably only use one of these methods, so you can choose to only override the one that you need. For example, the Learning Progress Curriculum
only uses episodic task progress updates with `update_task_progress` and Prioritized Level Replay receives updates from the main process through `update_on_demand`.

:sup:`1` If you choose not to use `update_on_step()` to update your curriculum, set `update_on_step=False` when initializing the environment synchronization wrapper
to prevent it from being called and improve performance (An exception with the same suggestion is raised by default).
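As a sketch of how episodic feedback can drive a curriculum, the class below records per-task completion rates from `update_task_progress` calls. It is illustrative only and not the actual Syllabus internals; the class and method names besides `update_task_progress` are invented.

```python
from collections import defaultdict

class CompletionTracker:
    """Illustrative sketch: accumulate per-task completion rates from
    update_task_progress calls (not the actual syllabus implementation)."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.completions = defaultdict(float)

    def update_task_progress(self, task, progress):
        # progress is a bool or float; True / 1.0 means the task was completed.
        self.attempts[task] += 1
        self.completions[task] += float(progress)

    def completion_rate(self, task):
        if self.attempts[task] == 0:
            return 0.0
        return self.completions[task] / self.attempts[task]

tracker = CompletionTracker()
tracker.update_task_progress("maze-1", True)
tracker.update_task_progress("maze-1", 0.0)
print(tracker.completion_rate("maze-1"))  # 0.5
```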


-------------------
Recommended Methods
-------------------

For most curricula, we recommend implementing these methods to support convenience features in Syllabus:

* :mod:`_sample_distribution() <syllabus.core.curriculum_base.Curriculum._sample_distribution>` - Returns a probability distribution over tasks

* :mod:`log_metrics(writer) <syllabus.core.curriculum_base.Curriculum.log_metrics>` - Logs curriculum-specific metrics to the provided tensorboard or weights and biases logger.

If your curriculum uses a probability distribution to sample tasks, you should implement `_sample_distribution()`. The default implementation of `log_metrics` will log the probabilities from `_sample_distribution()`
for each task in a discrete task space to tensorboard or weights and biases. You can also override `log_metrics` to log other values for your specific curriculum.
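A `_sample_distribution()` implementation typically normalizes per-task scores into probabilities. The helper below is a hedged sketch of that pattern, not the actual Syllabus code:

```python
import random

def sample_distribution(scores):
    """Sketch: turn non-negative per-task scores into a probability
    distribution, as a _sample_distribution() implementation might."""
    total = sum(scores.values())
    if total == 0:
        # No signal yet: fall back to a uniform distribution.
        return {task: 1.0 / len(scores) for task in scores}
    return {task: s / total for task, s in scores.items()}

probs = sample_distribution({"a": 1.0, "b": 3.0})
print(probs)  # {'a': 0.25, 'b': 0.75}

# Sampling k tasks from the distribution:
tasks, weights = zip(*probs.items())
sampled = random.choices(tasks, weights=weights, k=2)
```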

----------------
Optional Methods
----------------

You can optionally choose to implement these additional methods:


* :mod:`update_on_step_batch(update_list) <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>` - Updates the curriculum with a batch of step updates.

* :mod:`update_curriculum_batch(update_data) <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>` - Updates the curriculum with a batch of data.


`update_curriculum_batch` and `update_on_step_batch` can be overridden to provide a more efficient curriculum-specific implementation. The default implementation simply iterates over the updates.


Each curriculum also specifies two constants: REQUIRES_STEP_UPDATES and REQUIRES_CENTRAL_UPDATES.

* REQUIRES_STEP_UPDATES - If True, the environment synchronization wrapper should set `update_on_step=True` to provide the curriculum with updates after each step.

* REQUIRES_CENTRAL_UPDATES - If True, the user will need to call `update_on_demand()` to provide the curriculum with updates from the main process. We recommend adding a warning to your curriculum if too many tasks are sampled without receiving updates.
51 changes: 51 additions & 0 deletions docs-source/curricula/implemented_curricula.rst
@@ -0,0 +1,51 @@
==================
Curriculum Methods
==================

Syllabus has a small collection of curriculum learning methods implemented. These include simple techniques that are often used in practice
but rarely highlighted in the literature, such as simulated annealing of difficulty, or sequential curricula of easy-to-hard tasks. We also
have several popular curriculum learning baselines: Domain Randomization, Prioritized Level Replay (Jiang et al. 2021), and the learning
progress curriculum introduced in Kanitscheider et al. 2021.

-----------------------------------------------------------------------------------------
:mod:`Domain Randomization <syllabus.curricula.domain_randomization.DomainRandomization>`
-----------------------------------------------------------------------------------------

A simple but strong baseline for curriculum learning that uniformly samples a task from the task space.

---------------------------------------------------------------------------------
:mod:`Sequential Curriculum <syllabus.curricula.sequential.SequentialCurriculum>`
---------------------------------------------------------------------------------

Plays a provided list of tasks in order for a prespecified number of episodes.
It can be used to manually design curricula by providing tasks in an order that you feel will result in the best final performance.
*Coming Soon*: functional stopping criteria instead of a fixed number of episodes.
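The scheduling logic can be sketched as follows; the class and argument names are illustrative rather than the actual `SequentialCurriculum` interface:

```python
class SequentialSchedule:
    """Sketch: play each task for a fixed number of episodes, in order
    (illustrative; not the actual SequentialCurriculum implementation)."""

    def __init__(self, tasks, episodes_per_task):
        self.tasks = tasks
        self.episodes_per_task = episodes_per_task
        self.episode_count = 0

    def sample(self, k=1):
        # Advance through the task list, clamping at the final task.
        idx = min(self.episode_count // self.episodes_per_task,
                  len(self.tasks) - 1)
        return [self.tasks[idx]] * k

    def update_on_episode(self):
        self.episode_count += 1

sched = SequentialSchedule(["easy", "hard"], episodes_per_task=2)
print(sched.sample())   # ['easy']
sched.update_on_episode()
sched.update_on_episode()
print(sched.sample())   # ['hard']
```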

--------------------------------------------------------------------------------
:mod:`Simple Box Curriculum <syllabus.curricula.simple_box.SimpleBoxCurriculum>`
--------------------------------------------------------------------------------

A simple curriculum that expands a zero-centered range from an initial range to a final range over a number of discrete steps.
The curriculum increases the range to the next stage when a provided reward threshold is met.
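The expansion logic can be sketched as follows; the class and parameter names here are illustrative and do not match the actual `SimpleBoxCurriculum` signature:

```python
class ExpandingRange:
    """Sketch of a zero-centered range that widens in discrete steps once
    a reward threshold is met (illustrative, not the real class)."""

    def __init__(self, initial=0.1, final=1.0, steps=4, threshold=0.8):
        self.final = final
        self.threshold = threshold
        self.step_size = (final - initial) / steps
        self.half_width = initial

    def update(self, mean_reward):
        # Widen the range one step when performance clears the threshold.
        if mean_reward >= self.threshold and self.half_width < self.final:
            self.half_width = min(self.final, self.half_width + self.step_size)
        return (-self.half_width, self.half_width)

r = ExpandingRange()
print(r.update(0.9))  # threshold met: the range widens one step
print(r.update(0.1))  # below threshold: the range is unchanged
```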

------------------------------------------------------------------------------------------
:mod:`Learning Progress <syllabus.curricula.learning_progress.LearningProgressCurriculum>`
------------------------------------------------------------------------------------------

Uses a heuristic to estimate the learning progress of a task. It maintains a fast and slow exponential moving average (EMA) of the task
completion rates for a set of discrete tasks.
By measuring the difference between the fast and slow EMAs and reweighting it to adjust for the time delay created by the EMA, this method can
estimate the learning progress of a task.
The curriculum then assigns a higher probability to tasks with a very high or very low learning progress, indicating that the agent
is either learning or forgetting the task. For more information you can read the original paper
`Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft (Kanitscheider et al. 2021) <https://arxiv.org/pdf/2106.14876.pdf>`_.
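The fast/slow EMA heuristic can be sketched in a few lines. The smoothing coefficients below are illustrative, and the sketch omits the paper's reweighting correction for the EMA time delay:

```python
class LearningProgress:
    """Sketch of the fast/slow EMA learning-progress heuristic
    (coefficients are illustrative, not the paper's exact values)."""

    def __init__(self, fast=0.5, slow=0.1):
        self.fast_coef, self.slow_coef = fast, slow
        self.fast_ema = 0.0
        self.slow_ema = 0.0

    def update(self, success_rate):
        self.fast_ema += self.fast_coef * (success_rate - self.fast_ema)
        self.slow_ema += self.slow_coef * (success_rate - self.slow_ema)

    def learning_progress(self):
        # A large |fast - slow| gap means the completion rate is changing,
        # i.e. the agent is learning (or forgetting) this task.
        return abs(self.fast_ema - self.slow_ema)

lp = LearningProgress()
for rate in [0.0, 0.2, 0.5, 0.9]:  # success rate climbing: agent is learning
    lp.update(rate)
print(lp.learning_progress() > 0)  # True
```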

-------------------------------------------------------------------------------------------
:mod:`Prioritized Level Replay <syllabus.curricula.plr.plr_wrapper.PrioritizedLevelReplay>`
-------------------------------------------------------------------------------------------

A curriculum learning method that estimates an agent's regret on particular environment instantiations and uses a prioritized replay buffer to
replay levels for which the agent has high regret. This implementation is based on the open-source original implementation at
https://github.com/facebookresearch/level-replay, but has been modified to support Syllabus task spaces instead of just environment seeds.
PLR has been used in multiple prominent RL works. For more information you can read the original paper
`Prioritized Level Replay (Jiang et al. 2021) <https://arxiv.org/pdf/2010.03934.pdf>`_.
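The core prioritization step can be sketched with rank-based weighting over regret scores. This is a simplification: the full method also mixes in a staleness distribution, and the temperature value here is illustrative:

```python
def rank_prioritization(regrets, temperature=0.1):
    """Sketch: convert per-level regret estimates into replay
    probabilities using 1/rank**(1/temperature) weighting, as in
    rank-based PLR (staleness mixing from the full method is omitted)."""
    # Rank 1 = highest regret.
    order = sorted(regrets, key=regrets.get, reverse=True)
    ranks = {level: i + 1 for i, level in enumerate(order)}
    weights = {level: (1.0 / rank) ** (1.0 / temperature)
               for level, rank in ranks.items()}
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}

probs = rank_prioritization({"level_a": 0.9, "level_b": 0.1, "level_c": 0.5})
# The highest-regret level gets the largest replay probability.
print(max(probs, key=probs.get))  # level_a
```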
15 changes: 15 additions & 0 deletions docs-source/environments.rst
@@ -0,0 +1,15 @@
Environment Support
===================

Syllabus is implemented with the new Gymnasium API, which is different from the old OpenAI Gym API.
However, it is possible to use environments implemented with the Gym API in Syllabus.
We recommend using the `Shimmy <https://github.com/Farama-Foundation/Shimmy>`_ package to convert Gym environments to Gymnasium environments.

.. code-block:: python

    import gym
    from shimmy.openai_gym_compatibility import GymV21CompatibilityV0

    env = gym.make('CartPole-v0')
    env = GymV21CompatibilityV0(env)
9 changes: 9 additions & 0 deletions docs-source/evaluation/evaluation.rst
@@ -0,0 +1,9 @@
Evaluation
==========

Evaluating RL agents trained with curriculum learning requires special consideration. Typically, training tasks are assumed to be drawn from the same distribution as the test tasks. However, curriculum learning methods modify the training task distribution to improve test performance, so training returns are not a good measure of performance. Agents should instead be periodically evaluated during training on uniformly sampled tasks, ideally from a held-out test set. You can see an example of this approach in our `procgen script <https://github.com/RyanNavillus/Syllabus/tree/main/syllabus/examples>`_.

Correctly implementing this evaluation code can be surprisingly challenging, so we list a few guidelines to keep in mind here:

* Make sure not to bias evaluation results toward shorter episodes. This is easy to do by accident if you try to multiprocess evaluations. For example, if you run a vectorized environment and save the first 10 results, your test returns will be biased toward shorter episodes, which likely earned lower returns.
* Reset the environments before each evaluation. This may seem obvious, but since some vectorized environments don't allow you to directly reset the environments, it can be tempting to skip this step.
* Use the same environment wrappers for the evaluation environment. This is important because some wrappers, such as the `TimeLimit` wrapper, can change the dynamics of the environment. If you use different wrappers, you may get different results.
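One simple way to avoid the episode-length bias from the first guideline is to collect a fixed number of episodes from every environment, rather than keeping whichever episodes finish first. The sketch below uses a stand-in environment class; a real evaluation would use your wrapped evaluation environments:

```python
import random

class DummyEnv:
    """Stand-in environment whose episode length and return are linked,
    mimicking the bias hazard described above."""

    def run_episode(self):
        length = random.randint(1, 100)
        return length, float(length)  # longer episodes earn more reward

def evaluate(envs, episodes_per_env=10):
    # Collect the same number of episodes from every env, so that
    # fast-finishing (often low-return) episodes are not overrepresented.
    returns = []
    for env in envs:
        for _ in range(episodes_per_env):
            _, ep_return = env.run_episode()
            returns.append(ep_return)
    return sum(returns) / len(returns)

mean_return = evaluate([DummyEnv() for _ in range(4)])
```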
73 changes: 73 additions & 0 deletions docs-source/index.rst
@@ -0,0 +1,73 @@
.. Syllabus documentation master file, created by
   sphinx-quickstart on Mon Jul 10 07:05:19 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Syllabus Documentation
======================

Syllabus is a library for using curriculum learning to train reinforcement learning agents. It provides a Curriculum API for
defining curriculum learning algorithms, implementations of popular curriculum learning methods, and a framework for synchronizing
those curricula across environments running in multiple processes. Syllabus makes it easy to implement curriculum learning methods
and add them to existing training code. It takes only a few lines of code to add a curriculum to an existing training script, and
because of the shared Curriculum API, you can swap out different curriculum learning methods by changing a single line of code.

It currently supports environments run with Python native multiprocessing or Ray actors, which covers libraries such as RLLib, CleanRL,
Stable Baselines 3, and Monobeast (Torchbeast), and we have working examples for each of these.
We also have preliminary support and examples for multiagent PettingZoo environments.

.. toctree::
   :maxdepth: 2
   :caption: Getting Started:

   self
   installation
   quickstart
   environments
   evaluation/evaluation
   logging
   benchmarks

.. toctree::
   :maxdepth: 1
   :caption: Curriculum Learning Background:

   background/curriculum_learning
   background/ued

.. toctree::
   :maxdepth: 2
   :caption: Curriculum API:

   modules/syllabus.core.curriculum
   curricula/custom_curricula
   curricula/implemented_curricula

.. toctree::
   :maxdepth: 1
   :caption: Curriculum Methods:

   modules/syllabus.curricula.plr
   modules/syllabus.curricula.domain_randomization
   modules/syllabus.curricula.learning_progress

.. toctree::
   :maxdepth: 1
   :caption: Task Spaces:

   modules/syllabus.task_space
   modules/syllabus.core.task_interface
   modules/syllabus.examples.task_wrappers

.. toctree::
   :maxdepth: 1
   :caption: Synchronization:

   modules/syllabus.core

.. toctree::
   :maxdepth: 2
   :caption: Development:

   Github <https://github.com/RyanNavillus/Syllabus>
   modules/modules
17 changes: 17 additions & 0 deletions docs-source/installation.rst
@@ -0,0 +1,17 @@
Installation
============

You can install Syllabus with pip using the following command:

.. code-block:: bash

    pip install syllabus-rl

To install a development branch of Syllabus, clone the repository and install it with pip:

.. code-block:: bash

    git clone [email protected]:RyanNavillus/Syllabus.git
    cd Syllabus
    git checkout <branch-name>
    pip install -e .[all]
4 changes: 4 additions & 0 deletions docs-source/logging.rst
@@ -0,0 +1,4 @@
Logging
=======

Syllabus currently has preliminary support for Weights and Biases and TensorBoard logging through the :mod:`log_metrics <syllabus.core.curriculum_base.Curriculum.log_metrics>` function for curricula.
7 changes: 7 additions & 0 deletions docs-source/modules/modules.rst
@@ -0,0 +1,7 @@
syllabus
========

.. toctree::
:maxdepth: 4

syllabus
42 changes: 42 additions & 0 deletions docs-source/modules/syllabus.core.curriculum.rst
@@ -0,0 +1,42 @@
.. _Curriculum API:

Curriculum
==========

Syllabus's Curriculum API is a unified interface for curriculum learning methods. Curricula following this API
can be used with all of Syllabus's infrastructure. We hope that future curriculum learning research will provide
implementations following this API to encourage reproducibility and ease of use.

The full documentation for the curriculum class can be found at :doc:`../modules/syllabus.core`.

The Curriculum class has three main jobs:

- Maintain a sampling distribution over the task space.

- Incorporate feedback from the environments or training process to update the sampling distribution.

- Provide a sampling interface for the environment to draw tasks from.


In reality, the sampling distribution can be whatever you want, such as a uniform distribution,
a deterministic sequence of tasks, or a single constant task depending on the curriculum learning method.

To incorporate feedback from the environment, the API provides multiple methods:

- :mod:`update_on_step <syllabus.core.curriculum_base.Curriculum.update_on_step>`

- :mod:`update_task_progress <syllabus.core.curriculum_base.Curriculum.update_task_progress>`

- :mod:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>`

- :mod:`update_on_step_batch <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>`

- :mod:`update_curriculum_batch <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>`

syllabus.core.curriculum\_base module
-------------------------------------

.. automodule:: syllabus.core.curriculum_base
:members:
:undoc-members:
:show-inheritance:
@@ -0,0 +1,9 @@
Domain Randomization
=====================

Domain Randomization simply samples tasks uniformly at random from the task space. It can be a strong baseline in environments with relatively small task spaces.

.. automodule:: syllabus.curricula.domain_randomization
:members:
:undoc-members:
:show-inheritance: