Update docs

RyanNavillus · Nov 28, 2024 · 03bcb6e
1 parent 6ebb920

Showing 138 changed files with 2,537 additions and 436 deletions.
diff --git a/README.md b/README.md
@@ -123,7 +123,7 @@ To help people get started using Syllabus, I've added a few simple curriculum le
 To build the documentation, run the following commands:
 
 ```
-sphinx-build -M html ./docs/source ./docs
+sphinx-build -M html ./docs-source ./docs
 cp -r ./docs/html/* ./docs && rm -R ./docs/html/*
 ```
 
@@ -133,6 +133,7 @@ If you need to regenerate the module docs from scratch, you can use the followin
 ```
 sphinx-apidoc -o ./docs/modules ./syllabus
 ```
+Then manually merge the results with ./docs-source as needed.
 
 ## Citing Syllabus
 To be added soon.
diff --git a/docs-source/background/curriculum_learning.rst b/docs-source/background/curriculum_learning.rst
@@ -0,0 +1,5 @@
+Curriculum Learning
+===================
+
+This page will eventually include a proper introduction to curriculum learning. For now, this paper is a good reference: `Curriculum Learning for Reinforcement Learning Domains:
+A Framework and Survey <https://arxiv.org/pdf/2003.04960.pdf>`_.
diff --git a/docs-source/background/curriculum_learning.rst:Zone.Identifier b/docs-source/background/curriculum_learning.rst:Zone.Identifier
diff --git a/docs-source/background/ued.rst b/docs-source/background/ued.rst
@@ -0,0 +1,5 @@
+Unsupervised Environment Design
+===============================
+
+Unsupervised Environment Design (UED) is a curriculum learning paradigm proposed in `Emergent Complexity and Zero-shot Transfer via
+Unsupervised Environment Design <https://arxiv.org/pdf/2012.02096.pdf>`_ (Dennis et al. 2020).
diff --git a/docs-source/background/ued.rst:Zone.Identifier b/docs-source/background/ued.rst:Zone.Identifier
diff --git a/docs-source/benchmarks.rst b/docs-source/benchmarks.rst
@@ -0,0 +1,11 @@
+Benchmarks
+==========
+
+We're actively testing and benchmarking Syllabus's implementations of curriculum learning algorithms.
+So far, only Domain Randomization and Prioritized Level Replay have been evaluated and verified to match baseline performance.
+
+Our benchmark data is publicly available on our Weights and Biases page, and we will continue to update it as we test more methods.
+
+Weights and Biases: wandb_
+
+.. _wandb: https://wandb.ai/ryansullivan/syllabus/workspace?workspace=user-ryansullivan
diff --git a/docs-source/benchmarks.rst:Zone.Identifier b/docs-source/benchmarks.rst:Zone.Identifier
diff --git a/docs-source/conf.py b/docs-source/conf.py
@@ -0,0 +1,34 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'Syllabus'
+copyright = '2023, Ryan Sullivan'
+author = 'Ryan Sullivan'
+release = '0.5'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = ['sphinx.ext.autodoc',
+              'sphinx.ext.intersphinx',
+              'sphinx_tabs.tabs',
+              'sphinx.ext.napoleon',
+              'sphinxcontrib.spelling']
+
+templates_path = ['_templates']
+
+import sys
+import os
+# Make the repository root importable so autodoc can find the syllabus package
+sys.path.insert(0, os.path.abspath('..'))
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'furo'
+html_static_path = []
diff --git a/docs-source/conf.py:Zone.Identifier b/docs-source/conf.py:Zone.Identifier
diff --git a/docs-source/curricula/custom_curricula.rst b/docs-source/curricula/custom_curricula.rst
@@ -0,0 +1,75 @@
+
+Creating Your Own Curriculum
+============================
+
+To create your own curriculum, all you need to do is write a subclass of Syllabus's `Curriculum` class. 
+`Curriculum` provides multiple methods for updating your curriculum, each meant for a different context. 
+By subclassing the `Curriculum` class, your method will automatically work with all of Syllabus's provided tools and infrastructure.
+
+----------------
+Required Methods
+----------------
+
+Your curriculum method is REQUIRED to implement the following methods:
+
+* :meth:`sample(k: int = 1) <syllabus.core.curriculum_base.Curriculum.sample>` - Returns a list of `k` tasks sampled from the curriculum.
+
+The `sample` method is how your curriculum decides which task the environments will play.
+Most curricula use some combination of hand-written logic and probability distributions to choose tasks, but there are no restrictions on how you choose tasks.
+
+
+----------------------------
+Curriculum Dependent Methods
+----------------------------
+
+Your curriculum will likely require some feedback from the RL training loop to guide its task selection. These might be rewards from the environment, error values from the agent, or some other metric that you define. 
+Depending on which type of information your curriculum requires, you will need to implement one or more of the following methods:
+
+* :meth:`update_task_progress(task, progress) <syllabus.core.curriculum_base.Curriculum.update_task_progress>` - is called after each step or after each episode :sup:`1`. It receives a task name and a boolean or float value indicating the current progress on that task. Values of True or 1.0 typically indicate a completed task.
+
+* :meth:`update_on_step(obs, rew, term, trunc, info) <syllabus.core.curriculum_base.Curriculum.update_on_step>` - is called once for each environment step.
+
+* :meth:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>` - (**Not yet implemented**) will be called once for each completed episode by the environment synchronization wrapper.
+
+* :meth:`update_on_demand(metrics) <syllabus.core.curriculum_base.Curriculum.update_on_demand>` - is meant to be called by the main learner process to update a curriculum with information from the training process, such as TD errors or gradient norms. It is never used by the individual environments. It receives a dictionary of metrics of arbitrary types.
+
+Your curriculum will probably only use one of these methods, so you only need to override the one you use. For example, the Learning Progress Curriculum
+only uses episodic task progress updates through `update_task_progress`, while Prioritized Level Replay receives updates from the main process through `update_on_demand`.
+
+:sup:`1` If you don't use `update_on_step()` to update your curriculum, set `update_on_step=False` when initializing the environment synchronization wrapper
+so that it is never called, which improves performance. By default, an exception with the same suggestion is raised.
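A minimal sketch of the required `sample` method plus `update_task_progress`, assuming a discrete list of string task names. To keep it self-contained, the hypothetical `MasteryCurriculum` below does not subclass Syllabus's `Curriculum` class (a real implementation would), and its `mastery_threshold` parameter is illustrative only.

```python
import random
from typing import Dict, List, Union


class MasteryCurriculum:
    """Hypothetical toy curriculum. A real implementation would subclass
    syllabus.core.curriculum_base.Curriculum instead of object."""

    def __init__(self, tasks: List[str], mastery_threshold: float = 0.9, seed: int = 0):
        self.tasks = list(tasks)
        self.mastery_threshold = mastery_threshold
        # Latest progress reported for each task (0.0 = no progress yet)
        self.progress: Dict[str, float] = {task: 0.0 for task in self.tasks}
        self.rng = random.Random(seed)

    def update_task_progress(self, task: str, progress: Union[bool, float]) -> None:
        # True / 1.0 typically indicates a completed task
        self.progress[task] = float(progress)

    def sample(self, k: int = 1) -> List[str]:
        # Prefer tasks that have not been mastered yet; fall back to all tasks
        candidates = [t for t in self.tasks if self.progress[t] < self.mastery_threshold]
        if not candidates:
            candidates = self.tasks
        return [self.rng.choice(candidates) for _ in range(k)]


curriculum = MasteryCurriculum(["easy", "medium", "hard"])
curriculum.update_task_progress("easy", True)
print(curriculum.sample(k=4))  # only "medium" and "hard" can appear
```

Sampling only unmastered tasks is just one possible rule; since `sample` is the only required method, any selection logic can live there.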
+
+
+-------------------
+Recommended Methods
+-------------------
+
+For most curricula, we recommend implementing these methods to support convenience features in Syllabus:
+
+* :meth:`_sample_distribution() <syllabus.core.curriculum_base.Curriculum._sample_distribution>` - Returns a probability distribution over tasks.
+
+* :meth:`log_metrics(writer) <syllabus.core.curriculum_base.Curriculum.log_metrics>` - Logs curriculum-specific metrics to the provided TensorBoard or Weights and Biases logger.
+
+If your curriculum uses a probability distribution to sample tasks, you should implement `_sample_distribution()`. For a discrete task space, the default implementation of `log_metrics` logs the probability
+that `_sample_distribution()` assigns to each task to TensorBoard or Weights and Biases. You can also override `log_metrics` to log other values specific to your curriculum.
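A sketch of how `_sample_distribution()` and `log_metrics` fit together, assuming a discrete task list. `ScoreCurriculum`, its softmax-over-scores rule, the `log_metrics(writer, step)` signature, and `RecordingWriter` (a stand-in for TensorBoard's `SummaryWriter.add_scalar`) are all hypothetical, not Syllabus's actual API.

```python
import math
import random
from typing import Dict, List


class RecordingWriter:
    """Hypothetical stand-in for a TensorBoard SummaryWriter: records add_scalar calls."""

    def __init__(self):
        self.scalars: Dict[str, float] = {}

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.scalars[tag] = value


class ScoreCurriculum:
    """Hypothetical curriculum that samples tasks with a softmax over per-task scores."""

    def __init__(self, tasks: List[str], temperature: float = 1.0, seed: int = 0):
        self.tasks = list(tasks)
        self.scores = {task: 0.0 for task in self.tasks}
        self.temperature = temperature
        self.rng = random.Random(seed)

    def _sample_distribution(self) -> List[float]:
        # Softmax over task scores; subtract the max for numerical stability
        max_score = max(self.scores.values())
        exps = [math.exp((self.scores[t] - max_score) / self.temperature) for t in self.tasks]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, k: int = 1) -> List[str]:
        return self.rng.choices(self.tasks, weights=self._sample_distribution(), k=k)

    def log_metrics(self, writer, step: int) -> None:
        # Log one sampling probability per discrete task
        for task, prob in zip(self.tasks, self._sample_distribution()):
            writer.add_scalar(f"curriculum/{task}_prob", prob, step)


curriculum = ScoreCurriculum(["a", "b", "c"])
curriculum.scores["c"] = 2.0
writer = RecordingWriter()
curriculum.log_metrics(writer, step=0)
print(writer.scalars)
```

Weights and Biases logs through `wandb.log` rather than `add_scalar`, so a version supporting both loggers would need to handle each writer type.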
+
+----------------
+Optional Methods
+----------------
+
+You can optionally choose to implement these additional methods:
+
+
+* :meth:`update_on_step_batch(update_list) <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>` - Updates the curriculum with a batch of step updates.
+
+* :meth:`update_curriculum_batch(update_data) <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>` - Updates the curriculum with a batch of data.
+
+
+`update_curriculum_batch` and `update_on_step_batch` can be overridden to provide a more efficient curriculum-specific implementation. The default implementations simply iterate over the updates and apply them one at a time.
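A sketch contrasting the per-update loop with a batched override. The `(obs, rew, term, trunc, info)` tuple format assumed for `update_list` and both class names are hypothetical, chosen only to illustrate the idea.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical format: one (obs, rew, term, trunc, info) tuple per environment step
StepUpdate = Tuple[Any, float, bool, bool, Dict[str, Any]]


class RewardTrackingCurriculum:
    """Hypothetical curriculum that tracks total reward and step count."""

    def __init__(self):
        self.total_reward = 0.0
        self.num_steps = 0

    def update_on_step(self, obs, rew, term, trunc, info) -> None:
        self.total_reward += rew
        self.num_steps += 1

    def update_on_step_batch(self, update_list: List[StepUpdate]) -> None:
        # Same idea as the default: apply each update one at a time
        for obs, rew, term, trunc, info in update_list:
            self.update_on_step(obs, rew, term, trunc, info)


class FastRewardTrackingCurriculum(RewardTrackingCurriculum):
    def update_on_step_batch(self, update_list: List[StepUpdate]) -> None:
        # Curriculum-specific override: aggregate the whole batch at once
        self.total_reward += sum(update[1] for update in update_list)
        self.num_steps += len(update_list)


batch = [(None, 1.0, False, False, {}), (None, 0.5, True, False, {})]
slow, fast = RewardTrackingCurriculum(), FastRewardTrackingCurriculum()
slow.update_on_step_batch(batch)
fast.update_on_step_batch(batch)
print(slow.total_reward, fast.total_reward)  # 1.5 1.5
```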
+
+
+Each curriculum also specifies two constants: REQUIRES_STEP_UPDATES and REQUIRES_CENTRAL_UPDATES.
+
+* REQUIRES_STEP_UPDATES - If True, the environment synchronization wrapper should be initialized with `update_on_step=True` so that the curriculum receives updates after each step.
+
+* REQUIRES_CENTRAL_UPDATES - If True, the user will need to call `update_on_demand()` to provide the curriculum with updates from the main process. We recommend adding a warning to your curriculum if too many tasks are sampled without receiving updates.
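A sketch of the two constants together with the recommended warning for missing central updates. The class, the `max_samples_without_update` threshold, and the warning text are hypothetical, and the sketch does not subclass Syllabus's `Curriculum`.

```python
import random
import warnings
from typing import Any, Dict, List


class CentrallyUpdatedCurriculum:
    """Hypothetical curriculum that only learns from update_on_demand()."""

    REQUIRES_STEP_UPDATES = False    # no per-step updates from the environments
    REQUIRES_CENTRAL_UPDATES = True  # the learner must call update_on_demand()

    def __init__(self, tasks: List[str], max_samples_without_update: int = 1000, seed: int = 0):
        self.tasks = list(tasks)
        self.max_samples_without_update = max_samples_without_update
        self.samples_since_update = 0
        self.warned = False
        self.rng = random.Random(seed)

    def update_on_demand(self, metrics: Dict[str, Any]) -> None:
        # A real curriculum would use the metrics (e.g. TD errors) here
        self.samples_since_update = 0

    def sample(self, k: int = 1) -> List[str]:
        self.samples_since_update += k
        if self.samples_since_update > self.max_samples_without_update and not self.warned:
            warnings.warn(
                f"{self.samples_since_update} tasks sampled without a central update; "
                "did you forget to call update_on_demand()?"
            )
            self.warned = True
        return [self.rng.choice(self.tasks) for _ in range(k)]


curriculum = CentrallyUpdatedCurriculum(["a", "b"], max_samples_without_update=10)
curriculum.sample(k=20)  # emits a UserWarning
curriculum.update_on_demand({"td_errors": [0.1, 0.3]})
```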
diff --git a/docs-source/curricula/custom_curricula.rst:Zone.Identifier b/docs-source/curricula/custom_curricula.rst:Zone.Identifier
diff --git a/docs-source/curricula/implemented_curricula.rst b/docs-source/curricula/implemented_curricula.rst
@@ -0,0 +1,51 @@
+==================
+Curriculum Methods
+==================
+
+Syllabus has a small collection of curriculum learning methods implemented. These include simple techniques that are often used in practice
+but rarely highlighted in the literature, such as simulated annealing of difficulty or sequential curricula of easy to hard tasks. We also
+have several popular curriculum learning baselines: Domain Randomization, Prioritized Level Replay (Jiang et al. 2021), and the learning
+progress curriculum introduced in Kanitscheider et al. 2021.
+
+-----------------------------------------------------------------------------------------
+:mod:`Domain Randomization <syllabus.curricula.domain_randomization.DomainRandomization>`
+-----------------------------------------------------------------------------------------
+
+A simple but strong baseline for curriculum learning that uniformly samples a task from the task space.
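A minimal sketch of uniform task sampling over a discrete task space, assuming tasks are plain Python values such as environment seeds. `UniformTaskSampler` is a hypothetical stand-in, not the `DomainRandomization` class itself.

```python
import random
from typing import List, Optional


class UniformTaskSampler:
    """Hypothetical sketch of Domain Randomization over a discrete task list."""

    def __init__(self, tasks: List[int], seed: Optional[int] = None):
        self.tasks = list(tasks)
        self.rng = random.Random(seed)

    def _sample_distribution(self) -> List[float]:
        # Every task gets the same probability
        return [1.0 / len(self.tasks)] * len(self.tasks)

    def sample(self, k: int = 1) -> List[int]:
        # Feedback is ignored: each task is drawn independently and uniformly
        return [self.rng.choice(self.tasks) for _ in range(k)]


sampler = UniformTaskSampler(tasks=list(range(200)), seed=0)  # e.g. 200 environment seeds
print(sampler.sample(k=5))
```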
+
+---------------------------------------------------------------------------------
+:mod:`Sequential Curriculum <syllabus.curricula.sequential.SequentialCurriculum>`
+---------------------------------------------------------------------------------
+
+Plays a provided list of tasks in order for a prespecified number of episodes.
+It can be used to manually design curricula by providing tasks in an order that you feel will result in the best final performance.
+*Coming Soon*: functional stopping criteria instead of a fixed number of episodes.
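A minimal sketch of a fixed-length sequential schedule, assuming one call to a hypothetical `record_episode()` hook per finished episode. It is not Syllabus's `SequentialCurriculum`, whose constructor and bookkeeping may differ.

```python
from typing import List


class FixedScheduleCurriculum:
    """Hypothetical sketch of a sequential curriculum with a fixed episode budget per task."""

    def __init__(self, tasks: List[str], episodes_per_task: int):
        self.tasks = list(tasks)
        self.episodes_per_task = episodes_per_task
        self.completed_episodes = 0

    def record_episode(self) -> None:
        # Hypothetical hook, called once at the end of every episode
        self.completed_episodes += 1

    def current_task(self) -> str:
        # Advance every `episodes_per_task` episodes, then stay on the last task
        stage = min(self.completed_episodes // self.episodes_per_task, len(self.tasks) - 1)
        return self.tasks[stage]

    def sample(self, k: int = 1) -> List[str]:
        return [self.current_task()] * k


curriculum = FixedScheduleCurriculum(["easy", "medium", "hard"], episodes_per_task=2)
schedule = []
for _ in range(7):
    schedule.append(curriculum.sample()[0])
    curriculum.record_episode()
print(schedule)  # ['easy', 'easy', 'medium', 'medium', 'hard', 'hard', 'hard']
```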
+
+--------------------------------------------------------------------------------
+:mod:`Simple Box Curriculum <syllabus.curricula.simple_box.SimpleBoxCurriculum>`
+--------------------------------------------------------------------------------
+
+A simple curriculum that expands a zero-centered range from an initial range to a final range over a number of discrete steps.
+The curriculum increases the range to the next stage when a provided reward threshold is met.
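A sketch of the expanding-range idea, assuming the task handed to the environment is the `(low, high)` range itself and that the training loop reports a mean reward through a hypothetical `update_on_reward` hook. The class and parameter names are illustrative, not the `SimpleBoxCurriculum` API.

```python
from typing import List, Tuple


class ExpandingRangeCurriculum:
    """Hypothetical sketch of an expanding zero-centred range curriculum."""

    def __init__(self, initial: float, final: float, num_stages: int, reward_threshold: float):
        # Half-widths spaced evenly from `initial` to `final` (num_stages + 1 values)
        self.half_widths = [initial + (final - initial) * i / num_stages for i in range(num_stages + 1)]
        self.reward_threshold = reward_threshold
        self.stage = 0

    def update_on_reward(self, mean_reward: float) -> None:
        # Hypothetical hook: advance one stage once the reward threshold is met
        if mean_reward >= self.reward_threshold and self.stage < len(self.half_widths) - 1:
            self.stage += 1

    def sample(self, k: int = 1) -> List[Tuple[float, float]]:
        # Here the task is the current range itself
        width = self.half_widths[self.stage]
        return [(-width, width)] * k


curriculum = ExpandingRangeCurriculum(initial=1.0, final=5.0, num_stages=4, reward_threshold=0.8)
curriculum.update_on_reward(0.5)  # below threshold: range stays (-1.0, 1.0)
curriculum.update_on_reward(0.9)  # threshold met: range grows to (-2.0, 2.0)
print(curriculum.sample())
```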
+
+------------------------------------------------------------------------------------------
+:mod:`Learning Progress <syllabus.curricula.learning_progress.LearningProgressCurriculum>`
+------------------------------------------------------------------------------------------
+
+Uses a heuristic to estimate the learning progress of a task. It maintains fast and slow exponential moving averages (EMAs) of the task
+completion rates for a set of discrete tasks.
+By measuring the difference between the fast and slow EMAs and reweighting it to adjust for the time delay introduced by the EMAs, this method
+estimates the learning progress on each task.
+The curriculum then assigns higher probability to tasks with very high or very low learning progress, which indicates that the agent
+is either learning or forgetting the task. For more information, see the original paper
+`Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft (Kanitscheider et al. 2021) <https://arxiv.org/pdf/2106.14876.pdf>`_.
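A simplified sketch of the fast/slow EMA idea, assuming binary task completions. It uses the absolute difference between the two EMAs as the learning-progress signal and normalizes it into sampling probabilities. The reweighting step from Kanitscheider et al. is deliberately omitted, and all names and rates are hypothetical.

```python
import random
from typing import Dict, List, Union


class LearningProgressSketch:
    """Simplified fast/slow EMA learning-progress curriculum (no reweighting step)."""

    def __init__(self, tasks: List[str], fast_rate: float = 0.1, slow_rate: float = 0.01, seed: int = 0):
        self.tasks = list(tasks)
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate
        self.fast: Dict[str, float] = {t: 0.0 for t in self.tasks}
        self.slow: Dict[str, float] = {t: 0.0 for t in self.tasks}
        self.rng = random.Random(seed)

    def update_task_progress(self, task: str, success: Union[bool, float]) -> None:
        # Update both EMAs of the completion rate for this task
        value = float(success)
        self.fast[task] += self.fast_rate * (value - self.fast[task])
        self.slow[task] += self.slow_rate * (value - self.slow[task])

    def learning_progress(self, task: str) -> float:
        # Absolute gap: large when the agent is improving OR forgetting
        return abs(self.fast[task] - self.slow[task])

    def _sample_distribution(self) -> List[float]:
        progress = [self.learning_progress(t) for t in self.tasks]
        total = sum(progress)
        if total == 0.0:
            return [1.0 / len(self.tasks)] * len(self.tasks)  # no signal yet: uniform
        return [p / total for p in progress]

    def sample(self, k: int = 1) -> List[str]:
        return self.rng.choices(self.tasks, weights=self._sample_distribution(), k=k)


curriculum = LearningProgressSketch(["a", "b"])
for _ in range(20):
    curriculum.update_task_progress("a", True)  # the agent starts solving task "a"
print(curriculum._sample_distribution())  # [1.0, 0.0]: task "b" shows no progress
```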
+
+-------------------------------------------------------------------------------------------
+:mod:`Prioritized Level Replay <syllabus.curricula.plr.plr_wrapper.PrioritizedLevelReplay>`
+-------------------------------------------------------------------------------------------
+
+A curriculum learning method that estimates an agent's regret on particular environment instantiations and uses a prioritized replay buffer to
+replay levels for which the agent has high regret. This implementation is based on the open-source original implementation at
+https://github.com/facebookresearch/level-replay, but has been modified to support Syllabus task spaces instead of just environment seeds.
+PLR has been used in several prominent RL works. For more information, see the original paper
+`Prioritized Level Replay (Jiang et al. 2021) <https://arxiv.org/pdf/2010.03934.pdf>`_.
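A simplified sketch of PLR's replay distribution as described in Jiang et al. 2021: rank-based prioritization of level scores with temperature beta, mixed with a staleness term using coefficient rho, i.e. P = (1 - rho) * P_S + rho * P_C. It assumes a fixed set of already-seen levels with externally supplied scores, and it omits the paper's choice between replaying a seen level and sampling a new one. All names are hypothetical; this is not Syllabus's `PrioritizedLevelReplay`.

```python
import random
from typing import Dict, List


class PLRSketch:
    """Simplified Prioritized Level Replay sampling over a fixed set of seen levels."""

    def __init__(self, levels: List[int], temperature: float = 0.1, staleness_coef: float = 0.1, seed: int = 0):
        self.levels = list(levels)
        self.temperature = temperature        # beta in the paper
        self.staleness_coef = staleness_coef  # rho in the paper
        self.scores: Dict[int, float] = {level: 0.0 for level in self.levels}
        self.last_sampled: Dict[int, int] = {level: 0 for level in self.levels}
        self.num_samples = 0
        self.rng = random.Random(seed)

    def update_score(self, level: int, score: float) -> None:
        # Score = regret estimate for the level, e.g. average positive value loss
        self.scores[level] = score

    def _sample_distribution(self) -> List[float]:
        # Rank-based score prioritization: h(S_i) = (1 / rank(S_i)) ** (1 / beta)
        ranked = sorted(self.levels, key=lambda level: self.scores[level], reverse=True)
        rank = {level: i + 1 for i, level in enumerate(ranked)}
        weights = [(1.0 / rank[level]) ** (1.0 / self.temperature) for level in self.levels]
        total_weight = sum(weights)
        p_score = [w / total_weight for w in weights]
        # Staleness prioritization: P_C(l_i) = (c - C_i) / sum_j (c - C_j)
        staleness = [self.num_samples - self.last_sampled[level] for level in self.levels]
        total_staleness = sum(staleness)
        if total_staleness > 0:
            p_stale = [s / total_staleness for s in staleness]
        else:
            p_stale = [1.0 / len(self.levels)] * len(self.levels)
        rho = self.staleness_coef
        return [(1 - rho) * ps + rho * pc for ps, pc in zip(p_score, p_stale)]

    def sample(self, k: int = 1) -> List[int]:
        chosen = []
        for _ in range(k):
            level = self.rng.choices(self.levels, weights=self._sample_distribution(), k=1)[0]
            self.num_samples += 1
            self.last_sampled[level] = self.num_samples
            chosen.append(level)
        return chosen


plr = PLRSketch(levels=[0, 1, 2])
plr.update_score(0, 0.9)
plr.update_score(1, 0.1)
plr.update_score(2, 0.5)
print(plr._sample_distribution())  # level 0 (highest score) gets most of the probability
print(plr.sample(k=5))
```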
diff --git a/docs-source/curricula/implemented_curricula.rst:Zone.Identifier b/docs-source/curricula/implemented_curricula.rst:Zone.Identifier
diff --git a/docs-source/environments.rst b/docs-source/environments.rst
@@ -0,0 +1,15 @@
+Environment Support
+===================
+
+Syllabus is implemented with the new Gymnasium API, which is different from the old OpenAI Gym API.
+However, it is possible to use environments implemented with the Gym API in Syllabus.
+We recommend using the `Shimmy <https://github.com/Farama-Foundation/Shimmy>`_ package to convert Gym environments to Gymnasium environments.
+
+.. code-block:: python
+
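+        import gym
+        from shimmy.openai_gym_compatibility import GymV21CompatibilityV0
+
+        # Create an environment with the old Gym API
+        env = gym.make('CartPole-v0')
+        # Wrap it to expose the Gymnasium API. Pass it by keyword, because the
+        # first positional argument of the wrapper is an environment ID.
+        env = GymV21CompatibilityV0(env=env)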
+
diff --git a/docs-source/environments.rst:Zone.Identifier b/docs-source/environments.rst:Zone.Identifier
diff --git a/docs-source/evaluation/evaluation.rst b/docs-source/evaluation/evaluation.rst
@@ -0,0 +1,9 @@
+Evaluation
+==========
+
+Evaluating RL agents trained with curriculum learning requires special consideration. Typically, training tasks are assumed to be drawn from the same distribution as the test tasks. However, curriculum learning methods deliberately modify the training task distribution to improve test performance, so training returns are not a good measure of performance. Agents should be evaluated periodically during training on uniformly sampled tasks, ideally from a held-out test set. You can see an example of this approach in our `procgen script <https://github.com/RyanNavillus/Syllabus/tree/main/syllabus/examples>`_.
+
+Correctly implementing this evaluation code can be surprisingly challenging, so here are a few guidelines to keep in mind:
+
+* Make sure not to bias evaluation results towards shorter episodes. This is easy to do by accident when you parallelize evaluations. For example, if you run a vectorized environment and save the first 10 completed episodes, your test returns will be biased toward shorter episodes, which likely earned lower returns.
+* Reset the environments before each evaluation. This may seem obvious, but some vectorized environments don't allow you to reset the environments directly, so it can be tempting to skip this step.
+* Use the same environment wrappers for the evaluation environments as for training. This is important because some wrappers, such as the `TimeLimit` wrapper, can change the dynamics of the environment, so different wrappers may produce different results.
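A toy simulation of the first guideline, assuming episode return equals episode length so that short episodes earn low returns. `first_k_finished` keeps the first `k` episodes to finish across environments stepped in lockstep, while `per_env_quota` records a fixed number of episodes from each environment and ignores any extras. Both functions and the deterministic `make_sampler` stand-in for a task distribution are hypothetical.

```python
import itertools
import statistics
from typing import Callable, List


def make_sampler() -> Callable[[], int]:
    # Deterministic stand-in for a task distribution in which half of all episodes
    # are short (1 step, return 1) and half are long (100 steps, return 100)
    lengths = itertools.cycle([1, 100])
    return lambda: next(lengths)


def first_k_finished(sample_length: Callable[[], int], n_envs: int, k: int) -> List[int]:
    """Biased: step all environments in lockstep and keep the first k finished episodes."""
    lengths = [sample_length() for _ in range(n_envs)]
    remaining = list(lengths)
    results: List[int] = []
    while True:
        for i in range(n_envs):
            remaining[i] -= 1
            if remaining[i] == 0:
                results.append(lengths[i])  # in this toy model, return == episode length
                if len(results) == k:
                    return results
                lengths[i] = sample_length()  # auto-reset into a new episode
                remaining[i] = lengths[i]


def per_env_quota(sample_length: Callable[[], int], n_envs: int, episodes_per_env: int) -> List[int]:
    """Unbiased: record exactly `episodes_per_env` episodes from every environment."""
    lengths = [sample_length() for _ in range(n_envs)]
    remaining = list(lengths)
    counts = [0] * n_envs
    results: List[int] = []
    while min(counts) < episodes_per_env:
        for i in range(n_envs):
            remaining[i] -= 1
            if remaining[i] == 0:
                if counts[i] < episodes_per_env:  # ignore episodes beyond the quota
                    results.append(lengths[i])
                    counts[i] += 1
                lengths[i] = sample_length()
                remaining[i] = lengths[i]
    return results


biased = first_k_finished(make_sampler(), n_envs=4, k=4)
unbiased = per_env_quota(make_sampler(), n_envs=4, episodes_per_env=2)
print(statistics.mean(biased), statistics.mean(unbiased))
```

With this sampler, `first_k_finished` only ever records the 1-step episodes, while `per_env_quota` recovers the true mean return of 50.5.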
diff --git a/docs-source/evaluation/evaluation.rst:Zone.Identifier b/docs-source/evaluation/evaluation.rst:Zone.Identifier
diff --git a/docs-source/index.rst b/docs-source/index.rst
@@ -0,0 +1,73 @@
+.. Syllabus documentation master file, created by
+   sphinx-quickstart on Mon Jul 10 07:05:19 2023.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Syllabus Documentation
+======================
+
+Syllabus is a library for using curriculum learning to train reinforcement learning agents. It provides a Curriculum API for
+defining curriculum learning algorithms, implementations of popular curriculum learning methods, and a framework for synchronizing
+those curricula across environments running in multiple processes. Syllabus makes it easy to implement curriculum learning methods
+and add them to existing training code. It takes only a few lines of code to add a curriculum to an existing training script, and
+because of the shared Curriculum API, you can swap out different curriculum learning methods by changing a single line of code.
+
+It currently supports environments run with Python native multiprocessing or Ray actors, which includes RLlib, CleanRL,
+Stable Baselines 3, and Monobeast (Torchbeast). We have working examples with CleanRL, RLlib, Stable Baselines 3, and Monobeast (Torchbeast).
+We also have preliminary support and examples for multi-agent PettingZoo environments.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Getting Started:
+
+   self
+   installation
+   quickstart
+   environments
+   evaluation/evaluation
+   logging
+   benchmarks
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Curriculum Learning Background:
+
+   background/curriculum_learning
+   background/ued
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Curriculum API:
+
+   modules/syllabus.core.curriculum
+   curricula/custom_curricula
+   curricula/implemented_curricula
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Curriculum Methods:
+
+   modules/syllabus.curricula.plr
+   modules/syllabus.curricula.domain_randomization
+   modules/syllabus.curricula.learning_progress
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Task Spaces:
+
+   modules/syllabus.task_space
+   modules/syllabus.core.task_interface
+   modules/syllabus.examples.task_wrappers
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Synchronization:
+
+   modules/syllabus.core
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Development:
+
+   Github <https://github.com/RyanNavillus/Syllabus>
+   modules/modules
diff --git a/docs-source/index.rst:Zone.Identifier b/docs-source/index.rst:Zone.Identifier
diff --git a/docs-source/installation.rst b/docs-source/installation.rst
@@ -0,0 +1,17 @@
+Installation
+============
+
+You can install Syllabus from PyPI with the following command:
+
+.. code-block:: bash
+    
+        pip install syllabus-rl
+
+To install a development branch of Syllabus, clone the repository, check out the branch, and install it with pip:
+
+.. code-block:: bash
+
+        git clone git@github.com:RyanNavillus/Syllabus.git
+        cd Syllabus
+        git checkout <branch-name>
+        pip install -e ".[all]"
diff --git a/docs-source/installation.rst:Zone.Identifier b/docs-source/installation.rst:Zone.Identifier
diff --git a/docs-source/logging.rst b/docs-source/logging.rst
@@ -0,0 +1,4 @@
+Logging
+=======
+
+Syllabus currently has preliminary support for Weights and Biases and TensorBoard logging through the curriculum's :meth:`log_metrics <syllabus.core.curriculum_base.Curriculum.log_metrics>` method.
diff --git a/docs-source/logging.rst:Zone.Identifier b/docs-source/logging.rst:Zone.Identifier
diff --git a/docs-source/modules/modules.rst b/docs-source/modules/modules.rst
@@ -0,0 +1,7 @@
+syllabus
+========
+
+.. toctree::
+   :maxdepth: 4
+
+   syllabus
diff --git a/docs-source/modules/modules.rst:Zone.Identifier b/docs-source/modules/modules.rst:Zone.Identifier
diff --git a/docs-source/modules/syllabus.core.curriculum.rst b/docs-source/modules/syllabus.core.curriculum.rst
@@ -0,0 +1,42 @@
+.. _Curriculum API:
+
+Curriculum
+==========
+
+Syllabus's Curriculum API is a unified interface for curriculum learning methods. Curricula following this API
+can be used with all of Syllabus's infrastructure. We hope that future curriculum learning research will provide
+implementations following this API to encourage reproducibility and ease of use.
+
+The full documentation for the Curriculum class can be found in :doc:`../modules/syllabus.core`.
+
+The Curriculum class has three main jobs:
+
+- Maintain a sampling distribution over the task space.
+
+- Incorporate feedback from the environments or training process to update the sampling distribution.
+
+- Provide a sampling interface for the environment to draw tasks from.
+
+
+In practice, the sampling distribution can be whatever you want, such as a uniform distribution,
+a deterministic sequence of tasks, or a single constant task, depending on the curriculum learning method.
+
+To incorporate feedback from the environments or the training process, the API provides multiple methods:
+
+- :meth:`update_on_step <syllabus.core.curriculum_base.Curriculum.update_on_step>`
+
+- :meth:`update_task_progress <syllabus.core.curriculum_base.Curriculum.update_task_progress>`
+
+- :meth:`update_on_episode <syllabus.core.curriculum_base.Curriculum.update_on_episode>`
+
+- :meth:`update_on_demand <syllabus.core.curriculum_base.Curriculum.update_on_demand>`
+
+- :meth:`update_on_step_batch <syllabus.core.curriculum_base.Curriculum.update_on_step_batch>`
+
+- :meth:`update_curriculum_batch <syllabus.core.curriculum_base.Curriculum.update_curriculum_batch>`
+
+syllabus.core.curriculum\_base module
+-------------------------------------
+
+.. automodule:: syllabus.core.curriculum_base
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/modules/syllabus.core.curriculum.rst:Zone.Identifier b/docs-source/modules/syllabus.core.curriculum.rst:Zone.Identifier
diff --git a/docs/source/modules/syllabus.core.rst → docs-source/modules/syllabus.core.rst b/docs/source/modules/syllabus.core.rst → docs-source/modules/syllabus.core.rst
diff --git a/docs-source/modules/syllabus.core.rst:Zone.Identifier b/docs-source/modules/syllabus.core.rst:Zone.Identifier
diff --git a/.../modules/syllabus.core.task_interface.rst → .../modules/syllabus.core.task_interface.rst b/.../modules/syllabus.core.task_interface.rst → .../modules/syllabus.core.task_interface.rst
diff --git a/docs-source/modules/syllabus.core.task_interface.rst:Zone.Identifier b/docs-source/modules/syllabus.core.task_interface.rst:Zone.Identifier
diff --git a/docs-source/modules/syllabus.curricula.domain_randomization.rst b/docs-source/modules/syllabus.curricula.domain_randomization.rst
@@ -0,0 +1,9 @@
+Domain Randomization
+=====================
+
+Domain Randomization samples tasks uniformly at random from the task space. It can be a strong baseline in environments with relatively small task spaces.
+
+.. automodule:: syllabus.curricula.domain_randomization
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs-source/modules/syllabus.curricula.domain_randomization.rst:Zone.Identifier b/docs-source/modules/syllabus.curricula.domain_randomization.rst:Zone.Identifier