Commit
Showing 2 changed files with 99 additions and 1 deletion.
@@ -1,5 +1,5 @@
 <!DOCTYPE html>
-<html><head><meta charset="utf-8"></meta><title>Annonated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section0"><div class="docs doc-strings"><p><a href="index.html"><b>HOME</b></a></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily" target="_blank">View code on GitHub</a></div></div><div class="section" id="section1"><div class="docs doc-strings"><h1><a href="https://github.com/opendilab/PPOxFamily">PPO × Family PyTorch 注解文档</a></h1><img alt="logo" src="./imgs/ppof_logo.png"></img><p>作为 PPO × Family 决策智能入门公开课的“算法-代码”注解文档,力求发掘 PPO 算法的每一个细节,帮助读者快速掌握设计决策人工智能的万能钥匙。</p></div></div><div class="section" id="section1"><div class="docs doc-strings"><h2>各章节代码解读示例目录</h2><h4>开启决策 AI 探索之旅</h4><li><a href="./pg_zh.html">策略梯度(PG)算法核心代码</a> | <a href="./pg.html">Policy Gradient core loss function</a></li><li><a href="./a2c_zh.html">A2C 算法核心代码</a> | <a href="./a2c.html">A2C core loss function</a></li><li><a href="./ppo_zh.html">PPO 算法核心代码</a> | <a href="./ppo.html">PPO core loss function</a></li><br><h4>解构复杂动作空间</h4><li><a href="./discrete_zh.html">PPO 建模离散动作空间</a> | <a href="./discrete.html">PPO in discrete action space</a></li><li><a href="./continuous_zh.html">PPO 建模连续动作空间</a> | <a href="./continuous.html">PPO in continuous action space</a></li><li><a href="./hybrid_zh.html">PPO 建模混合动作空间</a> | <a href="./hybrid.html">PPO in hybrid action space</a></li><br><h4>表征多模态观察空间</h4><li><a href="./encoding_zh.html">特征编码的各种技巧</a> | <a href="./encoding.html">Encoding methods for vector obs space</a></li><li><a href="./mario_wrapper_zh.html">图片动作空间的各类环境包装器</a> | <a href="./mario_wrapper.html">Env wrappers for image obs space</a></li><li><a href="./gradient_zh.html">神经网络梯度计算的代码解析</a> | <a href="./gradient.html">Automatic gradient mechanism</a></li><br><h4>统筹多智能体</h4><li><a href="./marl_network.html">Multi-Agent cooperation network</a></li><li><a href="./independentpg.html">Independent policy gradient training</a></li><li><a href="./mapg.html">Multi-Agent policy gradient training</a></li><li><a href="./mappo.html">Multi-Agent PPO training</a></li><br><h4>挖掘黑科技</h4><li><a href="./gae.html">GAE technique used in PPO</a></li><li><a href="./recompute.html">Recompute adv trick used in PPO</a></li><li><a href="./grad_clip_norm_zh.html">PPO 中使用的梯度范数裁剪</a> | <a href="./grad_clip_norm.html">Gradient norm clip trick used in PPO</a></li><li><a href="./grad_clip_value.html">Gradient value clip trick used in PPO</a></li><li><a href="./grad_ignore.html">Gradient ignore trick used in PPO</a></li><li><a href="./orthogonal_init.html">Orthogonal initialization of networks used in PPO</a></li><li><a href="./dual_clip.html">Dual clip trick used in PPO</a></li><li><a href="./value_clip.html">Value clip trick used in PPO</a></li></div></div><div class="section" id="section-final"><div class="docs doc-strings"><p><i>如果读者关于本文档有任何问题和建议,可以在 GitHub 提 issue 或是直接发邮件给我们 ([email protected]) 。</i></p></div></div></body><script type="text/javascript">
+<html><head><meta charset="utf-8"></meta><title>Annonated Algorithm Visualization</title><link rel="stylesheet" href="pylit.css?v=1"></link><link rel="stylesheet" href="solarized.css"></link><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-Juol1FqnotbkyZUT5Z7gUPjQ9gzlwCENvUZTpQBAPxtusdwFLRy382PSDx5UUJ4/" crossorigin="anonymous"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-97gW6UIJxnlKemYavrqDHSX3SiygeOwIZhwyOKRfSaf0JWKRVj9hLASHgFTzT+0O" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous" onload="renderMathInElement(document.body);" defer="True"></script><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css"></link><script src="https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/mode/python/python.min.js"></script></head><body><div class="section" id="section0"><div class="docs doc-strings"><p><a href="index.html"><b>HOME</b></a></p><a href="https://github.com/opendilab/PPOxFamily" target="_blank"><img alt="GitHub" style="max-width:100%;" src="https://img.shields.io/github/stars/opendilab/PPOxFamily?style=social"></img></a> <a href="https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0" target="_blank"><img alt="bilibili" style="max-width:100%;" src="https://img.shields.io/badge/bilibili-video%20course-blue"></img></a> <a href="https://twitter.com/OpenDILab" rel="nofollow" target="_blank"><img alt="twitter" style="max-width:100%;" src="https://img.shields.io/twitter/follow/opendilab?style=social"></img></a><br><a href="https://github.com/opendilab/PPOxFamily" target="_blank">View code on GitHub</a></div></div><div class="section" id="section1"><div class="docs doc-strings"><h1><a href="https://github.com/opendilab/PPOxFamily">PPO × Family PyTorch 注解文档</a></h1><img alt="logo" src="./imgs/ppof_logo.png"></img><p>作为 PPO × Family 决策智能入门公开课的“算法-代码”注解文档,力求发掘 PPO 算法的每一个细节,帮助读者快速掌握设计决策人工智能的万能钥匙。</p></div></div><div class="section" id="section1"><div class="docs doc-strings"><h2>各章节代码解读示例目录</h2><h4>开启决策 AI 探索之旅</h4><li><a href="./pg_zh.html">策略梯度(PG)算法核心代码</a> | <a href="./pg.html">Policy Gradient core loss function</a></li><li><a href="./a2c_zh.html">A2C 算法核心代码</a> | <a href="./a2c.html">A2C core loss function</a></li><li><a href="./ppo_zh.html">PPO 算法核心代码</a> | <a href="./ppo.html">PPO core loss function</a></li><br><h4>解构复杂动作空间</h4><li><a href="./discrete_zh.html">PPO 建模离散动作空间</a> | <a href="./discrete.html">PPO in discrete action space</a></li><li><a href="./continuous_zh.html">PPO 建模连续动作空间</a> | <a href="./continuous.html">PPO in continuous action space</a></li><li><a href="./hybrid_zh.html">PPO 建模混合动作空间</a> | <a href="./hybrid.html">PPO in hybrid action space</a></li><br><h4>表征多模态观察空间</h4><li><a href="./encoding_zh.html">特征编码的各种技巧</a> | <a href="./encoding.html">Encoding methods for vector obs space</a></li><li><a href="./mario_wrapper_zh.html">图片动作空间的各类环境包装器</a> | <a href="./mario_wrapper.html">Env wrappers for image obs space</a></li><li><a href="./gradient_zh.html">神经网络梯度计算的代码解析</a> | <a href="./gradient.html">Automatic gradient mechanism</a></li><br><h4>解密稀疏奖励空间</h4><li><a href="./popart.html">Pop-Art normalization trick used in PPO</a></li><br><h4>统筹多智能体</h4><li><a href="./marl_network.html">Multi-Agent cooperation network</a></li><li><a href="./independentpg.html">Independent policy gradient training</a></li><li><a href="./mapg.html">Multi-Agent policy gradient training</a></li><li><a href="./mappo.html">Multi-Agent PPO training</a></li><br><h4>挖掘黑科技</h4><li><a href="./gae.html">GAE technique used in PPO</a></li><li><a href="./recompute.html">Recompute adv trick used in PPO</a></li><li><a href="./grad_clip_norm_zh.html">PPO 中使用的梯度范数裁剪</a> | <a href="./grad_clip_norm.html">Gradient norm clip trick used in PPO</a></li><li><a href="./grad_clip_value.html">Gradient value clip trick used in PPO</a></li><li><a href="./grad_ignore.html">Gradient ignore trick used in PPO</a></li><li><a href="./orthogonal_init.html">Orthogonal initialization of networks used in PPO</a></li><li><a href="./dual_clip.html">Dual clip trick used in PPO</a></li><li><a href="./value_clip.html">Value clip trick used in PPO</a></li></div></div><div class="section" id="section-final"><div class="docs doc-strings"><p><i>如果读者关于本文档有任何问题和建议,可以在 GitHub 提 issue 或是直接发邮件给我们 ([email protected]) 。</i></p></div></div></body><script type="text/javascript">
 window.onload = function(){
 var codeElement = document.getElementsByName('py_code');
 var lineCount = 1;
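The only change to the index page in this commit is the new "解密稀疏奖励空间" (Demystifying sparse reward spaces) chapter entry, which links to popart.html, a walkthrough of the Pop-Art normalization trick used in PPO. For orientation, the sketch below is a rough, hypothetical illustration of what that chapter covers, not the repository's actual implementation; the class name PopArtValueHead and its parameters are assumptions made for this example.

# Minimal Pop-Art sketch (hypothetical helper, not the PPOxFamily code).
import torch
import torch.nn as nn


class PopArtValueHead(nn.Module):
    """Value head that adaptively rescales targets while preserving outputs (Pop-Art)."""

    def __init__(self, in_features: int, beta: float = 1e-4):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)
        self.beta = beta  # update rate of the running return statistics
        self.register_buffer("mu", torch.zeros(1))    # running mean of returns
        self.register_buffer("nu", torch.ones(1))     # running second moment of returns
        self.register_buffer("sigma", torch.ones(1))  # running std, derived from mu and nu

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The network predicts values in *normalized* space.
        return self.linear(x)

    def normalize(self, returns: torch.Tensor) -> torch.Tensor:
        # Value-loss targets live in the same normalized space as the predictions.
        return (returns - self.mu) / self.sigma

    @torch.no_grad()
    def update_stats(self, returns: torch.Tensor) -> None:
        # 1. Update running moments of the unnormalized returns.
        old_mu, old_sigma = self.mu.clone(), self.sigma.clone()
        self.mu = (1 - self.beta) * self.mu + self.beta * returns.mean()
        self.nu = (1 - self.beta) * self.nu + self.beta * returns.pow(2).mean()
        self.sigma = (self.nu - self.mu.pow(2)).clamp(min=1e-4).sqrt()
        # 2. Rescale the output layer so unnormalized predictions are unchanged
        #    ("Preserving Outputs Precisely, while Adaptively Rescaling Targets").
        self.linear.weight.mul_(old_sigma / self.sigma)
        self.linear.bias.mul_(old_sigma).add_(old_mu - self.mu).div_(self.sigma)

In a PPO-style training loop, the critic would regress head(x) against head.normalize(returns), call head.update_stats(returns) after each batch, and recover unnormalized value estimates as head(x) * head.sigma + head.mu. For the exact formulation used in the course, see the popart.html chapter added by this commit.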