doc(nyz): add ch4 value rescale doc and ch6/ch7 translation

opendilab · Aug 14, 2023 · da51b63
1 parent 8708959

Showing 4 changed files with 184 additions and 0 deletions.
diff --git a/grad_clip_value_zh.html b/grad_clip_value_zh.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html>
+<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section-0"><div class="docs doc-strings"><p><p><a href="index.html"><b>HOME<br></b></a></p></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a>  <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a>  <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily/tree/main/chapter7_tricks/grad_clip_value_zh.py" 
target="_blank">View code on GitHub</a><br><br>This file is a PyTorch implementation of the gradient clipping utility <span style="color:#00cbf694;font-family:Monaco,IBMPlexMono;">torch.nn.utils.clip_grad_value_</span>.</div></div><div class="section" id="section-1"><div class="docs doc-strings"><p>    <b>Overview</b><br>        Implementation of gradient clipping by value, i.e. grad_clip_value  <a href="https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html#clip_grad_value_">Related Link</a><br>        Call this function after the loss has been backpropagated. It clips every gradient of the network parameters to the fixed range [-clip_value, clip_value].<br>        Note that this function works in place: it modifies the gradients and returns nothing.</p></div><div class="code"><pre><code id="code_1" name="py_code">from typing import Union, Iterable
+import torch
+
+_tensor_or_tensors = Union[torch.Tensor, Iterable[torch.Tensor]]
+
+
+def grad_clip_value(parameters: _tensor_or_tensors, clip_value: float) -> None:</code></pre></div></div><div class="section" id="section-3"><div class="docs doc-strings"><p>    Collect the non-None gradients of the trainable parameters into a list.</p></div><div class="code"><pre><code id="code_3" name="py_code">    if isinstance(parameters, torch.Tensor):
+        parameters = [parameters]
+    grads = [p.grad for p in parameters if p.grad is not None]</code></pre></div></div><div class="section" id="section-4"><div class="docs doc-strings"><p>    Convert the original clip_value to float.</p></div><div class="code"><pre><code id="code_4" name="py_code">    clip_value = float(clip_value)</code></pre></div></div><div class="section" id="section-5"><div class="docs doc-strings"><p>    Clip the gradients in place to [-clip_value, clip_value].</p></div><div class="code"><pre><code id="code_5" name="py_code">    for grad in grads:
+        grad.data.clamp_(min=-clip_value, max=clip_value)
+
+</code></pre></div></div><div class="section" id="section-6"><div class="docs doc-strings"><p>    <b>Overview</b><br>        Test function for gradient clipping with a fixed value.</p></div><div class="code"><pre><code id="code_6" name="py_code">def test_grad_clip_value():</code></pre></div></div><div class="section" id="section-8"><div class="docs doc-strings"><p>    Prepare hyper-parameters: batch size = 4, action = 32.</p></div><div class="code"><pre><code id="code_8" name="py_code">    B, N = 4, 32</code></pre></div></div><div class="section" id="section-9"><div class="docs doc-strings"><p>    Set clip_value to 1e-3.</p></div><div class="code"><pre><code id="code_9" name="py_code">    clip_value = 1e-3</code></pre></div></div><div class="section" id="section-10"><div class="docs doc-strings"><p>    Generate regression logits and labels. In practice, the logit is the output of the whole network and requires gradient computation.</p></div><div class="code"><pre><code id="code_10" name="py_code">    logit = torch.randn(B, N).requires_grad_(True)
+    label = torch.randn(B, N)</code></pre></div></div><div class="section" id="section-11"><div class="docs doc-strings"><p>    Define the criterion and compute the loss.</p></div><div class="code"><pre><code id="code_11" name="py_code">    criterion = torch.nn.MSELoss()
+    output = criterion(logit, label)</code></pre></div></div><div class="section" id="section-12"><div class="docs doc-strings"><p>    Backpropagate the loss to compute the gradients.</p></div><div class="code"><pre><code id="code_12" name="py_code">    output.backward()</code></pre></div></div><div class="section" id="section-13"><div class="docs doc-strings"><p>    Clip the gradients with the fixed value.</p></div><div class="code"><pre><code id="code_13" name="py_code">    grad_clip_value(logit, clip_value)</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p>    After clipping, assert that every gradient value lies within the clipping range.</p></div><div class="code"><pre><code id="code_14" name="py_code">    assert isinstance(logit.grad, torch.Tensor)
+    for g in logit.grad:
+        assert (g <= clip_value).all()
+        assert (g >= -clip_value).all()
+
+</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p><i>If you have any questions or suggestions about this documentation, you can open an issue on GitHub or email us directly ([email protected]).</i></p></div></div></body><script type="text/javascript">
+window.onload = function(){
+    var codeElement = document.getElementsByName('py_code');
+    var lineCount = 1;
+    for (var i = 0; i < codeElement.length; i++) {
+        var code = codeElement[i].innerText;
+        if (code.length <= 1) {
+            continue;
+        }
+
+        codeElement[i].innerHTML = "";
+
+        var codeMirror = CodeMirror(
+          codeElement[i],
+          {
+            value: code,
+            mode: "python",
+            theme: "solarized dark",
+            lineNumbers: true,
+            firstLineNumber: lineCount,
+            readOnly: false,
+            lineWrapping: true,
+          }
+        );
+        var noNewLineCode = code.replace(/[\r\n]/g, "");
+        lineCount += code.length - noNewLineCode.length + 1;
+    }
+};
+</script></html>
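
The clipping rule documented in grad_clip_value_zh.html can be illustrated without PyTorch. The following is only a minimal sketch under stated assumptions: the name `clip_values_` is hypothetical and not part of the repository, and nested Python lists stand in for the gradient tensors. It mirrors the documented behaviour: each entry is clamped to [-clip_value, clip_value], the update happens in place, and nothing is returned.

```python
from typing import List


def clip_values_(grads: List[List[float]], clip_value: float) -> None:
    """Clamp every entry of every gradient list to [-clip_value, clip_value], in place.

    Hypothetical helper for illustration only; plain lists stand in for tensors.
    """
    # Same conversion step as in the documented function.
    clip_value = float(clip_value)
    for grad in grads:
        for i, g in enumerate(grad):
            # Element-wise clamp: entries already inside the range are left unchanged.
            grad[i] = max(-clip_value, min(clip_value, g))


# Two "parameter gradients" with entries both inside and outside the range.
grads = [[0.5, -2.0, 0.0005], [-0.0002, 3.0]]
result = clip_values_(grads, 1e-3)

# The function mutates its input and returns None.
print(result)
print(grads)
```

Because each entry is clamped independently, entries already inside the range keep their values while larger ones are cut to the boundary. This can change the direction of the overall gradient vector, unlike the norm-based clipping listed separately in the index.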
diff --git a/index.html b/index.html
@@ -0,0 +1,30 @@
+<!DOCTYPE html>
+<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section0"><div class="docs doc-strings"><p><a href="index.html"><b>HOME</b></a></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a>  <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a>  <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily" target="_blank">View code on GitHub</a></div></div><div 
class="section" id="section1"><div class="docs doc-strings"><h1><a href="https://github.com/opendilab/PPOxFamily">PPO × Family PyTorch Annotated Documentation</a></h1><img alt="logo" src="./imgs/ppof_logo.png"></img><p>This is the “algorithm-code” annotated documentation for PPO × Family, an open introductory course on decision intelligence. It aims to uncover every detail of the PPO algorithm and to help readers quickly grasp the master key to designing decision AI.</p></div></div><div class="section" id="section1"><div class="docs doc-strings"><h2>Annotated code examples by chapter</h2><h4>Starting the journey of decision AI exploration</h4><li><a href="./pg_zh.html">Core code of the policy gradient (PG) algorithm (Chinese)</a>  |  <a href="./pg.html">Policy Gradient core loss function</a></li><li><a href="./a2c_zh.html">Core code of the A2C algorithm (Chinese)</a>  |  <a href="./a2c.html">A2C core loss function</a></li><li><a href="./ppo_zh.html">Core code of the PPO algorithm (Chinese)</a>  |  <a href="./ppo.html">PPO core loss function</a></li><br><h4>Deconstructing complex action spaces</h4><li><a href="./discrete_zh.html">Modeling discrete action spaces with PPO (Chinese)</a>  |  <a href="./discrete.html">PPO in discrete action space</a></li><li><a href="./continuous_zh.html">Modeling continuous action spaces with PPO (Chinese)</a>  |  <a href="./continuous.html">PPO in continuous action space</a></li><li><a href="./hybrid_zh.html">Modeling hybrid action spaces with PPO (Chinese)</a>  |  <a href="./hybrid.html">PPO in hybrid action space</a></li><br><h4>Representing multi-modal observation spaces</h4><li><a href="./encoding_zh.html">Feature encoding techniques (Chinese)</a>  |  <a href="./encoding.html">Encoding methods for vector obs space</a></li><li><a href="./mario_wrapper_zh.html">Environment wrappers for image observation spaces (Chinese)</a>  |  <a href="./mario_wrapper.html">Env wrappers for image obs space</a></li><li><a href="./gradient_zh.html">Code walkthrough of neural network gradient computation (Chinese)</a>  |  <a href="./gradient.html">Automatic gradient mechanism</a></li><br><h4>Decoding sparse reward spaces</h4><li><a href="./popart.html">Pop-Art normalization trick used in PPO</a></li><li><a href="./value_rescale.html">Value rescale trick used in PPO</a></li><br><h4>Exploring temporal modeling</h4><li><a href="./lstm.html">PPO + LSTM</a></li><li><a href="./gtrxl.html">PPO + Gated Transformer-XL</a></li><br><h4>Coordinating multiple agents</h4><li><a href="./marl_network_zh.html">Classic neural network architectures for multi-agent cooperation (Chinese)</a>  |  <a href="./marl_network.html">Multi-Agent cooperation network</a></li><li><a href="./independentpg_zh.html">Policy gradient training pipeline for independent multi-agent decision-making (Chinese)</a>  |  <a 
href="./independentpg.html">Independent policy gradient training</a></li><li><a href="./mapg_zh.html">Policy gradient training pipeline for cooperative multi-agent decision-making (Chinese)</a>  |  <a href="./mapg.html">Multi-Agent policy gradient training</a></li><li><a href="./mappo_zh.html">PPO training pipeline for cooperative multi-agent decision-making (Chinese)</a>  |  <a href="./mappo.html">Multi-Agent PPO training</a></li><br><h4>Digging into advanced tricks</h4><li><a href="./gae.html">GAE technique used in PPO</a></li><li><a href="./recompute.html">Recompute adv trick used in PPO</a></li><li><a href="./grad_clip_norm_zh.html">Gradient norm clipping used in PPO (Chinese)</a>  |  <a href="./grad_clip_norm.html">Gradient norm clip trick used in PPO</a></li><li><a href="./grad_clip_value_zh.html">Gradient value clipping used in PPO (Chinese)</a>  |  <a href="./grad_clip_value.html">Gradient value clip trick used in PPO</a></li><li><a href="./grad_ignore.html">Gradient ignore trick used in PPO</a></li><li><a href="./orthogonal_init.html">Orthogonal initialization of networks used in PPO</a></li><li><a href="./dual_clip.html">Dual clip trick used in PPO</a></li><li><a href="./value_clip.html">Value clip trick used in PPO</a></li></div></div><div class="section" id="section-final"><div class="docs doc-strings"><p><i>If you have any questions or suggestions about this documentation, you can open an issue on GitHub or email us directly ([email protected]).</i></p></div></div></body><script type="text/javascript">
+window.onload = function(){
+    var codeElement = document.getElementsByName('py_code');
+    var lineCount = 1;
+    for (var i = 0; i < codeElement.length; i++) {
+        var code = codeElement[i].innerText;
+        if (code.length <= 1) {
+            continue;
+        }
+
+        codeElement[i].innerHTML = "";
+
+        var codeMirror = CodeMirror(
+          codeElement[i],
+          {
+            value: code,
+            mode: "python",
+            theme: "solarized dark",
+            lineNumbers: true,
+            firstLineNumber: lineCount,
+            readOnly: true,
+            lineWrapping: true,
+          }
+        );
+        var noNewLineCode = code.replace(/[\r\n]/g, "");
+        lineCount += code.length - noNewLineCode.length + 1;
+    }
+};
+</script></html>