Commit
doc(nyz): add ch4 value rescale doc and ch6/ch7 translation
Showing 4 changed files with 184 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
<!DOCTYPE html> | ||
<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section-0"><div class="docs doc-strings"><p><p><a href="index.html"><b>HOME<br></b></a></p></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily/tree/main/chapter7_tricks/grad_clip_value_zh.py" 
target="_blank">View code on GitHub</a><br><br>This file is a PyTorch implementation of gradient value clipping, mirroring <span style="color:#00cbf694;font-family:Monaco,IBMPlexMono;">torch.nn.utils.clip_grad_value_</span>.</div></div><div class="section" id="section-1"><div class="docs doc-strings"><p> <b>Overview</b><br> Implementation of the gradient clipping function grad_clip_value. <a href="https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html#clip_grad_value_">Related Link</a><br> Call this function after the loss has been backpropagated; it clips every gradient of the network parameters to the fixed range [-clip_value, clip_value].<br> Note that this function operates in place: it modifies the gradients and returns nothing.</p></div><div class="code"><pre><code id="code_1" name="py_code">from typing import Union, Iterable | ||
import torch | ||
|
||
_tensor_or_tensors = Union[torch.Tensor, Iterable[torch.Tensor]] | ||
|
||
|
||
def grad_clip_value(parameters: _tensor_or_tensors, clip_value: float) -> None:</code></pre></div></div><div class="section" id="section-3"><div class="docs doc-strings"><p> Collect the non-None gradients of the trainable parameters into a list.</p></div><div class="code"><pre><code id="code_3" name="py_code">    if isinstance(parameters, torch.Tensor): | ||
        parameters = [parameters] | ||
    grads = [p.grad for p in parameters if p.grad is not None]</code></pre></div></div><div class="section" id="section-4"><div class="docs doc-strings"><p> Convert the given clip_value to float.</p></div><div class="code"><pre><code id="code_4" name="py_code">    clip_value = float(clip_value)</code></pre></div></div><div class="section" id="section-5"><div class="docs doc-strings"><p> Clip the gradients in place to [-clip_value, clip_value].</p></div><div class="code"><pre><code id="code_5" name="py_code">    for grad in grads: | ||
        grad.data.clamp_(min=-clip_value, max=clip_value) | ||
|
||
</code></pre></div></div><div class="section" id="section-6"><div class="docs doc-strings"><p> <b>Overview</b><br> Test function for gradient clipping with a fixed value.</p></div><div class="code"><pre><code id="code_6" name="py_code">def test_grad_clip_value():</code></pre></div></div><div class="section" id="section-8"><div class="docs doc-strings"><p> Prepare the hyperparameters: batch size B = 4 and action size N = 32.</p></div><div class="code"><pre><code id="code_8" name="py_code">    B, N = 4, 32</code></pre></div></div><div class="section" id="section-9"><div class="docs doc-strings"><p> Set clip_value to 1e-3.</p></div><div class="code"><pre><code id="code_9" name="py_code">    clip_value = 1e-3</code></pre></div></div><div class="section" id="section-10"><div class="docs doc-strings"><p> Generate the regression logit and label. In practice, logit is the output of the whole network and requires gradient computation.</p></div><div class="code"><pre><code id="code_10" name="py_code">    logit = torch.randn(B, N).requires_grad_(True) | ||
    label = torch.randn(B, N)</code></pre></div></div><div class="section" id="section-11"><div class="docs doc-strings"><p> Define the criterion and compute the loss.</p></div><div class="code"><pre><code id="code_11" name="py_code">    criterion = torch.nn.MSELoss() | ||
    output = criterion(logit, label)</code></pre></div></div><div class="section" id="section-12"><div class="docs doc-strings"><p> Backpropagate the loss to compute the gradients.</p></div><div class="code"><pre><code id="code_12" name="py_code">    output.backward()</code></pre></div></div><div class="section" id="section-13"><div class="docs doc-strings"><p> Clip the gradients with the fixed value.</p></div><div class="code"><pre><code id="code_13" name="py_code">    grad_clip_value(logit, clip_value)</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p> After clipping, assert that every clipped gradient value lies within [-clip_value, clip_value].</p></div><div class="code"><pre><code id="code_14" name="py_code">    assert isinstance(logit.grad, torch.Tensor) | ||
    for g in logit.grad: | ||
        assert (g <= clip_value).all() | ||
        assert (g >= -clip_value).all() | ||
|
||
</code></pre></div></div><div class="section" id="section-14"><div class="docs doc-strings"><p><i>If you have any questions or suggestions about this documentation, you can open an issue on GitHub or email us directly ([email protected]).</i></p></div></div></body><script type="text/javascript"> | ||
window.onload = function(){ | ||
var codeElement = document.getElementsByName('py_code'); | ||
var lineCount = 1; | ||
for (var i = 0; i < codeElement.length; i++) { | ||
var code = codeElement[i].innerText; | ||
if (code.length <= 1) { | ||
continue; | ||
} | ||
|
||
codeElement[i].innerHTML = ""; | ||
|
||
var codeMirror = CodeMirror( | ||
codeElement[i], | ||
{ | ||
value: code, | ||
mode: "python", | ||
theme: "solarized dark", | ||
lineNumbers: true, | ||
firstLineNumber: lineCount, | ||
readOnly: false, | ||
lineWrapping: true, | ||
} | ||
); | ||
var noNewLineCode = code.replace(/[\r\n]/g, ""); | ||
lineCount += code.length - noNewLineCode.length + 1; | ||
} | ||
}; | ||
</script></html> |
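The first file in this commit annotates `grad_clip_value` and links to PyTorch's `clip_grad_value_`. As a minimal sketch, assuming a hypothetical toy `torch.nn.Linear` model and random data (neither appears in the commit), the snippet below restates the function compactly and checks that it yields the same gradients as the built-in `torch.nn.utils.clip_grad_value_`:

```python
import torch


def grad_clip_value(parameters, clip_value: float) -> None:
    """Clamp every existing gradient in place to [-clip_value, clip_value]."""
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    clip_value = float(clip_value)
    for p in parameters:
        if p.grad is not None:
            p.grad.data.clamp_(min=-clip_value, max=clip_value)


torch.manual_seed(0)
model = torch.nn.Linear(8, 2)  # hypothetical toy model, for illustration only
x, y = torch.randn(16, 8), torch.randn(16, 2)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# Keep a copy of the raw gradients so both routines start from the same input.
raw = [p.grad.clone() for p in model.parameters()]

# Clip with the re-stated function from the annotated file.
grad_clip_value(model.parameters(), clip_value=1e-3)
ours = [p.grad.clone() for p in model.parameters()]

# Restore the raw gradients, then clip with the PyTorch built-in.
for p, g in zip(model.parameters(), raw):
    p.grad.copy_(g)
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1e-3)
builtin = [p.grad.clone() for p in model.parameters()]

same = all(torch.equal(a, b) for a, b in zip(ours, builtin))
print("matches torch.nn.utils.clip_grad_value_:", same)
```

In a training loop the clipping call would sit between `loss.backward()` and `optimizer.step()`, since the gradients must exist before they can be clipped and must be clipped before they are applied.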
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
<!DOCTYPE html> | ||
<html><head><meta charset="utf-8"></meta><title>Annotated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section0"><div class="docs doc-strings"><p><a href="index.html"><b>HOME</b></a></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily" target="_blank">View code on GitHub</a></div></div><div 
class="section" id="section1"><div class="docs doc-strings"><h1><a href="https://github.com/opendilab/PPOxFamily">PPO × Family PyTorch Annotated Documentation</a></h1><img alt="logo" src="./imgs/ppof_logo.png"></img><p>As the "algorithm-to-code" annotated documentation for the PPO × Family introductory open course on decision intelligence, these pages aim to examine every detail of the PPO algorithm and help readers quickly grasp the master key to designing decision AI.</p></div></div><div class="section" id="section1"><div class="docs doc-strings"><h2>Index of annotated code examples by chapter</h2><h4>Starting the journey into decision AI</h4><li><a href="./pg_zh.html">Policy gradient (PG) core code (Chinese)</a> | <a href="./pg.html">Policy Gradient core loss function</a></li><li><a href="./a2c_zh.html">A2C core code (Chinese)</a> | <a href="./a2c.html">A2C core loss function</a></li><li><a href="./ppo_zh.html">PPO core code (Chinese)</a> | <a href="./ppo.html">PPO core loss function</a></li><br><h4>Deconstructing complex action spaces</h4><li><a href="./discrete_zh.html">Modeling discrete action spaces with PPO (Chinese)</a> | <a href="./discrete.html">PPO in discrete action space</a></li><li><a href="./continuous_zh.html">Modeling continuous action spaces with PPO (Chinese)</a> | <a href="./continuous.html">PPO in continuous action space</a></li><li><a href="./hybrid_zh.html">Modeling hybrid action spaces with PPO (Chinese)</a> | <a href="./hybrid.html">PPO in hybrid action space</a></li><br><h4>Representing multimodal observation spaces</h4><li><a href="./encoding_zh.html">Feature encoding techniques (Chinese)</a> | <a href="./encoding.html">Encoding methods for vector obs space</a></li><li><a href="./mario_wrapper_zh.html">Environment wrappers for image observation spaces (Chinese)</a> | <a href="./mario_wrapper.html">Env wrappers for image obs space</a></li><li><a href="./gradient_zh.html">Code walkthrough of neural network gradient computation (Chinese)</a> | <a href="./gradient.html">Automatic gradient mechanism</a></li><br><h4>Decoding sparse reward spaces</h4><li><a href="./popart.html">Pop-Art normalization trick used in PPO</a></li><li><a href="./value_rescale.html">Value rescale trick used in PPO</a></li><br><h4>Exploring temporal modeling</h4><li><a href="./lstm.html">PPO + LSTM</a></li><li><a href="./gtrxl.html">PPO + Gated Transformer-XL</a></li><br><h4>Coordinating multiple agents</h4><li><a href="./marl_network_zh.html">Classic neural network architectures for multi-agent cooperation (Chinese)</a> | <a href="./marl_network.html">Multi-Agent cooperation network</a></li><li><a href="./independentpg_zh.html">Policy gradient training for independent multi-agent decision making (Chinese)</a> | <a 
href="./independentpg.html">Independent policy gradient training</a></li><li><a href="./mapg_zh.html">Policy gradient training for cooperative multi-agent decision making (Chinese)</a> | <a href="./mapg.html">Multi-Agent policy gradient training</a></li><li><a href="./mappo_zh.html">PPO training for cooperative multi-agent decision making (Chinese)</a> | <a href="./mappo.html">Multi-Agent PPO training</a></li><br><h4>Digging into advanced tricks</h4><li><a href="./gae.html">GAE technique used in PPO</a></li><li><a href="./recompute.html">Recompute adv trick used in PPO</a></li><li><a href="./grad_clip_norm_zh.html">Gradient norm clipping used in PPO (Chinese)</a> | <a href="./grad_clip_norm.html">Gradient norm clip trick used in PPO</a></li><li><a href="./grad_clip_value_zh.html">Gradient value clipping used in PPO (Chinese)</a> | <a href="./grad_clip_value.html">Gradient value clip trick used in PPO</a></li><li><a href="./grad_ignore.html">Gradient ignore trick used in PPO</a></li><li><a href="./orthogonal_init.html">Orthogonal initialization of networks used in PPO</a></li><li><a href="./dual_clip.html">Dual clip trick used in PPO</a></li><li><a href="./value_clip.html">Value clip trick used in PPO</a></li></div></div><div class="section" id="section-final"><div class="docs doc-strings"><p><i>If you have any questions or suggestions about this documentation, you can open an issue on GitHub or email us directly ([email protected]).</i></p></div></div></body><script type="text/javascript"> | ||
window.onload = function(){ | ||
var codeElement = document.getElementsByName('py_code'); | ||
var lineCount = 1; | ||
for (var i = 0; i < codeElement.length; i++) { | ||
var code = codeElement[i].innerText; | ||
if (code.length <= 1) { | ||
continue; | ||
} | ||
|
||
codeElement[i].innerHTML = ""; | ||
|
||
var codeMirror = CodeMirror( | ||
codeElement[i], | ||
{ | ||
value: code, | ||
mode: "python", | ||
theme: "solarized dark", | ||
lineNumbers: true, | ||
firstLineNumber: lineCount, | ||
readOnly: true, | ||
lineWrapping: true, | ||
} | ||
); | ||
var noNewLineCode = code.replace(/[\r\n]/g, ""); | ||
lineCount += code.length - noNewLineCode.length + 1; | ||
} | ||
}; | ||
</script></html> |
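The index in this commit lists gradient norm clipping and gradient value clipping as separate tricks. As an illustration only (the gradient vector below is made up, and the parameter names are hypothetical), this sketch contrasts the two using the PyTorch built-ins `torch.nn.utils.clip_grad_value_` and `torch.nn.utils.clip_grad_norm_`:

```python
import torch
import torch.nn.functional as F

# A made-up gradient with two large components and one small one.
grad = torch.tensor([3.0, -4.0, 0.5])

# Value clipping: each element is clamped independently to [-1, 1].
p_value = torch.nn.Parameter(torch.zeros(3))
p_value.grad = grad.clone()
torch.nn.utils.clip_grad_value_([p_value], clip_value=1.0)
print(p_value.grad)  # tensor([ 1.0000, -1.0000,  0.5000])

# Norm clipping: the whole vector is rescaled so its L2 norm is at most 1.
p_norm = torch.nn.Parameter(torch.zeros(3))
p_norm.grad = grad.clone()
total_norm = torch.nn.utils.clip_grad_norm_([p_norm], max_norm=1.0)

# Cosine similarity with the original gradient shows how much the
# direction of the update was changed by each method.
cos_value = F.cosine_similarity(p_value.grad, grad, dim=0)
cos_norm = F.cosine_similarity(p_norm.grad, grad, dim=0)
print("norm before clipping:", float(total_norm))
print("direction kept (value clip):", float(cos_value))
print("direction kept (norm clip):", float(cos_norm))
```

Value clipping bounds each component but can change the direction of the gradient, while norm clipping keeps the direction and only shrinks the length; which one suits a given PPO setup is a tuning decision that this sketch does not settle.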