doc(nyz): add ch4 value rescale doc and ch6/ch7 translation
PaParaZz1 committed Aug 14, 2023
1 parent 8708959 commit da51b63
Showing 4 changed files with 184 additions and 0 deletions.
48 changes: 48 additions & 0 deletions grad_clip_value_zh.html
@@ -0,0 +1,48 @@
<!DOCTYPE html>
<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section-0"><div class="docs doc-strings"><p><p><a href="index.html"><b>HOME<br></b></a></p></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily/tree/main/chapter7_tricks/grad_clip_value_zh.py" 
target="_blank">View code on GitHub</a><br><br>This file is a PyTorch implementation of the gradient value clipping function <span style="color:#00cbf694;font-family:Monaco,IBMPlexMono;">torch.nn.utils.clip_grad_value_</span>.</div></div><div class="section" id="section-1"><div class="docs doc-strings"><p> <b>Overview</b><br> An implementation of gradient value clipping, named grad_clip_value in this file. <a href="https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html#clip_grad_value_">Related Link</a><br> This function is called after the loss has been backpropagated. It clips every gradient element of the network parameters to the fixed range [-clip_value, clip_value].<br> Note that this function operates in place: it modifies the gradients and returns nothing.</p></div><div class="code"><pre><code id="code_1" name="py_code">from typing import Union, Iterable
import torch

_tensor_or_tensors = Union[torch.Tensor, Iterable[torch.Tensor]]


def grad_clip_value(parameters: _tensor_or_tensors, clip_value: float) -> None:</code></pre></div></div><div class="section" id="section-3"><div class="docs doc-strings"><p> Collect the non-None gradients of the trainable parameters into a list.</p></div><div class="code"><pre><code id="code_3" name="py_code">    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    grads = [p.grad for p in parameters if p.grad is not None]</code></pre></div></div><div class="section" id="section-4"><div class="docs doc-strings"><p> Convert the original clip_value to float.</p></div><div class="code"><pre><code id="code_4" name="py_code">    clip_value = float(clip_value)</code></pre></div></div><div class="section" id="section-5"><div class="docs doc-strings"><p> Clip the gradients in place to [-clip_value, clip_value].</p></div><div class="code"><pre><code id="code_5" name="py_code">    for grad in grads:
        grad.data.clamp_(min=-clip_value, max=clip_value)

</code></pre></div></div><div class="section" id="section-6"><div class="docs doc-strings"><p> <b>概述</b><br> 对于使用固定值做梯度裁剪的测试函数。</p></div><div class="code"><pre><code id="code_6" name="py_code">def test_grad_clip_value():</code></pre></div></div><div class="section" id="section-8"><div class="docs doc-strings"><p> 准备超参数, batch size=4, action=32</p></div><div class="code"><pre><code id="code_8" name="py_code"> B, N = 4, 32</code></pre></div></div><div class="section" id="section-9"><div class="docs doc-strings"><p> 设置 clip_value 为 1e-3</p></div><div class="code"><pre><code id="code_9" name="py_code"> clip_value = 1e-3</code></pre></div></div><div class="section" id="section-10"><div class="docs doc-strings"><p> 生成回归的 logit 值和标签,在实际应用中, logit 值是整个网络的输出,并需要梯度计算。</p></div><div class="code"><pre><code id="code_10" name="py_code"> logit = torch.randn(B, N).requires_grad_(True)
label = torch.randn(B, N)</code></pre></div></div><div class="section" id="section-11"><div class="docs doc-strings"><p> 定义标准并计算 loss。</p></div><div class="code"><pre><code id="code_11" name="py_code"> criterion = torch.nn.MSELoss()
output = criterion(logit, label)</code></pre></div></div><div class="section" id="section-12"><div class="docs doc-strings"><p> 进行 loss 的反向传播并计算梯度。</p></div><div class="code"><pre><code id="code_12" name="py_code"> output.backward()</code></pre></div></div><div class="section" id="section-13"><div class="docs doc-strings"><p> 使用固定值对梯度进行剪裁(clip)。</p></div><div class="code"><pre><code id="code_13" name="py_code"> grad_clip_value(logit, clip_value)</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p> 在剪裁后,断言(assert)剪裁后的梯度值是否合理。</p></div><div class="code"><pre><code id="code_14" name="py_code"> assert isinstance(logit.grad, torch.Tensor)
for g in logit.grad:
assert (g <= clip_value).all()
assert (g >= -clip_value).all()

</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p><i>如果读者关于本文档有任何问题和建议,可以在 GitHub 提 issue 或是直接发邮件给我们 ([email protected]) 。</i></p></div></div></body><script type="text/javascript">
window.onload = function(){
    var codeElement = document.getElementsByName('py_code');
    var lineCount = 1;
    for (var i = 0; i < codeElement.length; i++) {
        var code = codeElement[i].innerText;
        if (code.length <= 1) {
            continue;
        }

        codeElement[i].innerHTML = "";

        var codeMirror = CodeMirror(
            codeElement[i],
            {
                value: code,
                mode: "python",
                theme: "solarized dark",
                lineNumbers: true,
                firstLineNumber: lineCount,
                readOnly: false,
                lineWrapping: true,
            }
        );
        var noNewLineCode = code.replace(/[\r\n]/g, "");
        lineCount += code.length - noNewLineCode.length + 1;
    }
};
</script></html>
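
The annotated file above only exercises `grad_clip_value` on a bare tensor. As a rough sketch of where the call would sit in an ordinary training step, the snippet below clips between `backward()` and `optimizer.step()`. The toy `torch.nn.Linear` model, the SGD optimizer, and the random data are assumptions made for illustration and are not part of this commit; the function body is a condensed copy of the logic in the diff so the snippet runs on its own.

```python
import torch


def grad_clip_value(parameters, clip_value):
    # Condensed copy of the function from the diff, repeated so this sketch is self-contained.
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    clip_value = float(clip_value)
    for p in parameters:
        if p.grad is not None:
            p.grad.data.clamp_(min=-clip_value, max=clip_value)


torch.manual_seed(0)
clip_value = 1e-3
# Hypothetical toy setup: a single linear layer trained on random regression data.
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randn(16, 2)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# Clip after backward() has filled .grad and before the optimizer consumes it.
grad_clip_value(model.parameters(), clip_value)
# Compare tensor against scalar (as the test in the diff does) so the check runs in float32.
clipped_ok = all(bool((p.grad.abs() <= clip_value).all()) for p in model.parameters())
optimizer.step()
```

Value clipping bounds each gradient element independently, so it can change the direction of the overall gradient vector; the norm-based variant listed in the index (`grad_clip_norm`) rescales the whole vector instead.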
30 changes: 30 additions & 0 deletions index.html
@@ -0,0 +1,30 @@
<!DOCTYPE html>
<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section0"><div class="docs doc-strings"><p><a href="index.html"><b>HOME</b></a></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily" target="_blank">View code on GitHub</a></div></div><div 
class="section" id="section1"><div class="docs doc-strings"><h1><a href="https://github.com/opendilab/PPOxFamily">PPO × Family PyTorch Annotated Documentation</a></h1><img alt="logo" src="./imgs/ppof_logo.png"></img><p>This is the "algorithm-code" annotated documentation for the PPO × Family introductory open course on decision intelligence. It aims to uncover every detail of the PPO algorithm and to help readers quickly master the master key to designing decision-making AI.</p></div></div><div class="section" id="section1"><div class="docs doc-strings"><h2>Annotated code examples by chapter</h2><h4>Starting the journey of decision AI exploration</h4><li><a href="./pg_zh.html">Policy gradient (PG) core code (Chinese)</a> | <a href="./pg.html">Policy Gradient core loss function</a></li><li><a href="./a2c_zh.html">A2C core code (Chinese)</a> | <a href="./a2c.html">A2C core loss function</a></li><li><a href="./ppo_zh.html">PPO core code (Chinese)</a> | <a href="./ppo.html">PPO core loss function</a></li><br><h4>Deconstructing complex action spaces</h4><li><a href="./discrete_zh.html">PPO for discrete action spaces (Chinese)</a> | <a href="./discrete.html">PPO in discrete action space</a></li><li><a href="./continuous_zh.html">PPO for continuous action spaces (Chinese)</a> | <a href="./continuous.html">PPO in continuous action space</a></li><li><a href="./hybrid_zh.html">PPO for hybrid action spaces (Chinese)</a> | <a href="./hybrid.html">PPO in hybrid action space</a></li><br><h4>Representing multi-modal observation spaces</h4><li><a href="./encoding_zh.html">Feature encoding techniques (Chinese)</a> | <a href="./encoding.html">Encoding methods for vector obs space</a></li><li><a href="./mario_wrapper_zh.html">Env wrappers for image observation spaces (Chinese)</a> | <a href="./mario_wrapper.html">Env wrappers for image obs space</a></li><li><a href="./gradient_zh.html">Code walkthrough of neural network gradient computation (Chinese)</a> | <a href="./gradient.html">Automatic gradient mechanism</a></li><br><h4>Decoding sparse reward spaces</h4><li><a href="./popart.html">Pop-Art normalization trick used in PPO</a></li><li><a href="./value_rescale.html">Value rescale trick used in PPO</a></li><br><h4>Exploring temporal modeling</h4><li><a href="./lstm.html">PPO + LSTM</a></li><li><a href="./gtrxl.html">PPO + Gated Transformer-XL</a></li><br><h4>Coordinating multiple agents</h4><li><a href="./marl_network_zh.html">Classic neural network architectures for multi-agent cooperation (Chinese)</a> | <a href="./marl_network.html">Multi-Agent cooperation network</a></li><li><a href="./independentpg_zh.html">Policy gradient training for independent multi-agent decision-making (Chinese)</a> | <a 
href="./independentpg.html">Independent policy gradient training</a></li><li><a href="./mapg_zh.html">Policy gradient training for cooperative multi-agent decision-making (Chinese)</a> | <a href="./mapg.html">Multi-Agent policy gradient training</a></li><li><a href="./mappo_zh.html">PPO training for cooperative multi-agent decision-making (Chinese)</a> | <a href="./mappo.html">Multi-Agent PPO training</a></li><br><h4>Digging into advanced tricks</h4><li><a href="./gae.html">GAE technique used in PPO</a></li><li><a href="./recompute.html">Recompute adv trick used in PPO</a></li><li><a href="./grad_clip_norm_zh.html">Gradient norm clipping used in PPO (Chinese)</a> | <a href="./grad_clip_norm.html">Gradient norm clip trick used in PPO</a></li><li><a href="./grad_clip_value_zh.html">Gradient value clipping used in PPO (Chinese)</a> | <a href="./grad_clip_value.html">Gradient value clip trick used in PPO</a></li><li><a href="./grad_ignore.html">Gradient ignore trick used in PPO</a></li><li><a href="./orthogonal_init.html">Orthogonal initialization of networks used in PPO</a></li><li><a href="./dual_clip.html">Dual clip trick used in PPO</a></li><li><a href="./value_clip.html">Value clip trick used in PPO</a></li></div></div><div class="section" id="section-final"><div class="docs doc-strings"><p><i>If you have any questions or suggestions about this documentation, you can open an issue on GitHub or email us directly ([email protected]).</i></p></div></div></body><script type="text/javascript">
window.onload = function(){
    var codeElement = document.getElementsByName('py_code');
    var lineCount = 1;
    for (var i = 0; i < codeElement.length; i++) {
        var code = codeElement[i].innerText;
        if (code.length <= 1) {
            continue;
        }

        codeElement[i].innerHTML = "";

        var codeMirror = CodeMirror(
            codeElement[i],
            {
                value: code,
                mode: "python",
                theme: "solarized dark",
                lineNumbers: true,
                firstLineNumber: lineCount,
                readOnly: true,
                lineWrapping: true,
            }
        );
        var noNewLineCode = code.replace(/[\r\n]/g, "");
        lineCount += code.length - noNewLineCode.length + 1;
    }
};
</script></html>