Update docs
RyanNavillus committed Nov 26, 2024
1 parent e949b26 commit 6ebb920
Showing 21 changed files with 1,857 additions and 115 deletions.
12 changes: 6 additions & 6 deletions docs/curricula/custom_curricula.html
Expand Up @@ -254,7 +254,7 @@ <h1>Creating Your Own Curriculum<a class="headerlink" href="#creating-your-own-c
<h2>Required Methods<a class="headerlink" href="#required-methods" title="Link to this heading"></a></h2>
<p>Your curriculum method is REQUIRED to implement the following methods:</p>
<ul class="simple">
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.sample" title="syllabus.core.curriculum_base.Curriculum.sample"><code class="xref py py-mod docutils literal notranslate"><span class="pre">sample(k:</span> <span class="pre">int</span> <span class="pre">=</span> <span class="pre">1)</span></code></a> - Returns a list of <cite>k</cite> tasks sampled from the curriculum.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">sample(k:</span> <span class="pre">int</span> <span class="pre">=</span> <span class="pre">1)</span></code> - Returns a list of <cite>k</cite> tasks sampled from the curriculum.</p></li>
</ul>
<p>The <cite>sample</cite> method is how your curriculum decides which task the environments will play.
Most methods use some combination of logic and probability distributions to choose tasks, but there are no restrictions on how you choose tasks.</p>
Expand All @@ -264,9 +264,9 @@ <h2>Curriculum Dependent Methods<a class="headerlink" href="#curriculum-dependen
<p>Your curriculum will likely require some feedback from the RL training loop to guide its task selection. These might be rewards from the environment, error values from the agent, or some other metric that you define.
Depending on which type of information your curriculum requires, you will need to implement one or more of the following methods:</p>
<ul class="simple">
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.update_task_progress" title="syllabus.core.curriculum_base.Curriculum.update_task_progress"><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_task_progress(task,</span> <span class="pre">progress)</span></code></a> - is called either after each step or each episode <sup>1</sup> . It receives a task name and a boolean or float value indicating the current progress on the provided task. Values of True or 1.0 typically indicate a completed task.</p></li>
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.update_on_step" title="syllabus.core.curriculum_base.Curriculum.update_on_step"><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_step(obs,</span> <span class="pre">rew,</span> <span class="pre">term,</span> <span class="pre">trunc,</span> <span class="pre">info)</span></code></a> - is called once for each environment step.</p></li>
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.update_on_episode" title="syllabus.core.curriculum_base.Curriculum.update_on_episode"><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_episode</span></code></a> - (<strong>Not yet implemented</strong>) will be called once for each completed episode by the environment synchronization wrapper.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_task_progress(task,</span> <span class="pre">progress)</span></code> - is called either after each step or each episode <sup>1</sup> . It receives a task name and a boolean or float value indicating the current progress on the provided task. Values of True or 1.0 typically indicate a completed task.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_step(obs,</span> <span class="pre">rew,</span> <span class="pre">term,</span> <span class="pre">trunc,</span> <span class="pre">info)</span></code> - is called once for each environment step.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_episode</span></code> - (<strong>Not yet implemented</strong>) will be called once for each completed episode by the environment synchronization wrapper.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_demand(metrics)</span></code> - is meant to be called by the main learner process to update a curriculum with information from the training process, such as TD errors or gradient norms. It is never used by the individual environments. It receives a dictionary of metrics of arbitrary types.</p></li>
</ul>
<p>Your curriculum will probably only use one of these methods, so you can choose to only override the one that you need. For example, the Learning Progress Curriculum
Expand All @@ -279,7 +279,7 @@ <h2>Recommended Methods<a class="headerlink" href="#recommended-methods" title="
<p>For most curricula, we recommend implementing these methods to support convenience features in Syllabus:</p>
<ul class="simple">
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">_sample_distribution()</span></code> - Returns a probability distribution over tasks</p></li>
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.log_metrics" title="syllabus.core.curriculum_base.Curriculum.log_metrics"><code class="xref py py-mod docutils literal notranslate"><span class="pre">log_metrics(writer)</span></code></a> - Logs curriculum-specific metrics to the provided tensorboard or weights and biases logger.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">log_metrics(writer)</span></code> - Logs curriculum-specific metrics to the provided tensorboard or weights and biases logger.</p></li>
</ul>
<p>If your curriculum uses a probability distribution to sample tasks, you should implement <cite>_sample_distribution()</cite>. The default implementation of <cite>log_metrics</cite> will log the probabilities from <cite>_sample_distribution()</cite>
for each task in a discrete task space to tensorboard or weights and biases. You can also override <cite>log_metrics</cite> to log other values for your specific curriculum.</p>
Expand All @@ -288,7 +288,7 @@ <h2>Recommended Methods<a class="headerlink" href="#recommended-methods" title="
<h2>Optional Methods<a class="headerlink" href="#optional-methods" title="Link to this heading"></a></h2>
<p>You can optionally choose to implement these additional methods:</p>
<ul class="simple">
<li><p><a class="reference internal" href="../modules/syllabus.core.curriculum.html#syllabus.core.curriculum_base.Curriculum.update_on_step_batch" title="syllabus.core.curriculum_base.Curriculum.update_on_step_batch"><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_step_batch(update_list)</span></code></a> - Updates the curriculum with a batch of step updates.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_on_step_batch(update_list)</span></code> - Updates the curriculum with a batch of step updates.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">update_curriculum_batch(update_data)</span></code> - Updates the curriculum with a batch of data.</p></li>
</ul>
<p><cite>update_curriculum_batch</cite> and <cite>update_on_step_batch</cite> can be overridden to provide a more efficient curriculum-specific implementation. The default implementation simply iterates over the updates.</p>
Expand Down
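Note on the page above: the documented interface (`sample(k)`, `update_task_progress(task, progress)`, `_sample_distribution()`) can be illustrated with a minimal stand-alone sketch. This is not Syllabus code. The class name, the constructor arguments, the 0.9/0.1 smoothing, and the 0.05 weight floor are all assumptions made for illustration, and the class deliberately does not subclass the real `Curriculum` base class, whose constructor is not shown in this diff.

```python
import random


class SuccessWeightedCurriculum:
    """Hypothetical curriculum sketch; NOT part of Syllabus and not a Curriculum subclass."""

    def __init__(self, tasks, seed=0):
        self.tasks = list(tasks)
        self.rng = random.Random(seed)
        # Running estimate of each task's completion rate, starting at 0.
        self.completion = {task: 0.0 for task in self.tasks}

    def update_task_progress(self, task, progress):
        # progress is a bool or float; True or 1.0 indicates a completed task.
        value = float(progress)
        # Assumed smoothing factor: 90% old estimate, 10% new observation.
        self.completion[task] = 0.9 * self.completion[task] + 0.1 * value

    def _sample_distribution(self):
        # Favor tasks with low completion; the small floor keeps every task reachable.
        weights = [1.0 - self.completion[t] + 0.05 for t in self.tasks]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, k=1):
        # Returns a list of k tasks drawn from the current distribution.
        return self.rng.choices(self.tasks, weights=self._sample_distribution(), k=k)
```

Only `update_task_progress` is overridden here, matching the page's remark that a curriculum will probably use just one of the update methods.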
4 changes: 2 additions & 2 deletions docs/curricula/implemented_curricula.html
Expand Up @@ -252,7 +252,7 @@ <h1>Curriculum Methods<a class="headerlink" href="#curriculum-methods" title="Li
have several popular curriculum learning baselines; Domain Randomization, Prioritized Level Replay (Jiang et al. 2021), and the learning
progress curriculum introduced in Kanitscheider et al. 2021.</p>
<section id="domain-randomization">
<h2><a class="reference internal" href="../modules/syllabus.curricula.domain_randomization.html#syllabus.curricula.domain_randomization.DomainRandomization" title="syllabus.curricula.domain_randomization.DomainRandomization"><code class="xref py py-mod docutils literal notranslate"><span class="pre">Domain</span> <span class="pre">Randomization</span></code></a><a class="headerlink" href="#domain-randomization" title="Link to this heading"></a></h2>
<h2><a class="reference internal" href="../modules/syllabus.curricula.html#syllabus.curricula.domain_randomization.DomainRandomization" title="syllabus.curricula.domain_randomization.DomainRandomization"><code class="xref py py-mod docutils literal notranslate"><span class="pre">Domain</span> <span class="pre">Randomization</span></code></a><a class="headerlink" href="#domain-randomization" title="Link to this heading"></a></h2>
<p>A simple but strong baseline for curriculum learning that uniformly samples a task from the task space.</p>
</section>
<section id="sequential-curriculum">
Expand All @@ -267,7 +267,7 @@ <h2><a class="reference internal" href="../modules/syllabus.curricula.html#sylla
The curriculum increases the range to the next stage when a provided reward threshold is met.</p>
</section>
<section id="learning-progress">
<h2><a class="reference internal" href="../modules/syllabus.curricula.learning_progress.html#syllabus.curricula.learning_progress.LearningProgressCurriculum" title="syllabus.curricula.learning_progress.LearningProgressCurriculum"><code class="xref py py-mod docutils literal notranslate"><span class="pre">Learning</span> <span class="pre">Progress</span></code></a><a class="headerlink" href="#learning-progress" title="Link to this heading"></a></h2>
<h2><a class="reference internal" href="../modules/syllabus.curricula.html#syllabus.curricula.learning_progress.LearningProgressCurriculum" title="syllabus.curricula.learning_progress.LearningProgressCurriculum"><code class="xref py py-mod docutils literal notranslate"><span class="pre">Learning</span> <span class="pre">Progress</span></code></a><a class="headerlink" href="#learning-progress" title="Link to this heading"></a></h2>
<p>Uses a heuristic to estimate the learning progress of a task. It maintains a fast and slow exponential moving average (EMA) of the task
completion rates for a set of discrete tasks.
By measuring the difference between the fast and slow EMAs and reweighting it to adjust for the time delay created by the EMA, this method can
Expand Down
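Note on the Learning Progress description above: the fast and slow EMA idea can be sketched roughly as follows. This is an illustrative sketch only, not the Syllabus implementation. The class name, the two smoothing factors, and the use of the absolute difference as the progress signal are assumptions, and the reweighting step mentioned in the text is omitted because the diff does not show how it is defined.

```python
class FastSlowEMATracker:
    """Hypothetical tracker of per-task completion rates; NOT Syllabus's implementation."""

    def __init__(self, tasks, fast_alpha=0.3, slow_alpha=0.03):
        # Assumed smoothing factors: the fast EMA reacts quickly, the slow EMA lags.
        self.fast_alpha = fast_alpha
        self.slow_alpha = slow_alpha
        self.fast = {task: 0.0 for task in tasks}
        self.slow = {task: 0.0 for task in tasks}

    def update(self, task, completed):
        # completed is a bool or float completion signal for one episode.
        x = float(completed)
        self.fast[task] += self.fast_alpha * (x - self.fast[task])
        self.slow[task] += self.slow_alpha * (x - self.slow[task])

    def learning_progress(self, task):
        # Assumption: the gap between the two averages is taken as an absolute value.
        return abs(self.fast[task] - self.slow[task])


tracker = FastSlowEMATracker(["easy", "hard"])
for _ in range(5):
    tracker.update("easy", True)
```

After these updates the "easy" task shows a nonzero gap between its two averages, while the untouched "hard" task stays at zero.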