Commit dce1792
Site updated: 2024-02-22 23:20:36
lihaibineric committed Feb 22, 2024
1 parent 3d05a20 commit dce1792
Showing 13 changed files with 156 additions and 24 deletions.
4 changes: 2 additions & 2 deletions 2024/01/23/develop_go_kit/index.html
@@ -494,9 +494,9 @@ <h2 id="实际运行结果">实际运行结果</h2>
<article class="post-prev col-6">


<a href="/2024/01/30/dl_summary/" title="【深度学习】知识汇总">
<a href="/2024/01/30/dl_summary/" title="【深度学习】DeepL知识汇总">
<i class="iconfont icon-arrowleft"></i>
<span class="hidden-mobile">【深度学习】知识汇总</span>
<span class="hidden-mobile">【深度学习】DeepL知识汇总</span>
<span class="visible-mobile">Previous</span>
</a>

83 changes: 75 additions & 8 deletions 2024/01/30/dl_summary/index.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions 2024/02/02/develop_consul/index.html
@@ -597,8 +597,8 @@ <h1 id="consul持久性">Consul持久性</h1>
<article class="post-next col-6">


<a href="/2024/01/30/dl_summary/" title="【深度学习】知识汇总">
<span class="hidden-mobile">【深度学习】知识汇总</span>
<a href="/2024/01/30/dl_summary/" title="【深度学习】DeepL知识汇总">
<span class="hidden-mobile">【深度学习】DeepL知识汇总</span>
<span class="visible-mobile">Next</span>
<i class="iconfont icon-arrowright"></i>
</a>
2 changes: 1 addition & 1 deletion archives/2024/01/index.html
@@ -212,7 +212,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


2 changes: 1 addition & 1 deletion archives/2024/index.html
@@ -224,7 +224,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


2 changes: 1 addition & 1 deletion archives/index.html
@@ -224,7 +224,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


4 changes: 2 additions & 2 deletions categories/index.html
@@ -246,10 +246,10 @@



<a href="/2024/01/30/dl_summary/" title="【深度学习】知识汇总"
<a href="/2024/01/30/dl_summary/" title="【深度学习】DeepL知识汇总"
class="list-group-item list-group-item-action
">
<span class="category-post">【深度学习】知识汇总</span>
<span class="category-post">【深度学习】DeepL知识汇总</span>
</a>


2 changes: 1 addition & 1 deletion categories/深度学习/index.html
@@ -218,7 +218,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


2 changes: 1 addition & 1 deletion index.html
@@ -337,7 +337,7 @@ <h2 class="index-header">
<h2 class="index-header">

<a href="/2024/01/30/dl_summary/" target="_self">
【深度学习】知识汇总
【深度学习】DeepL知识汇总
</a>
</h2>

4 changes: 2 additions & 2 deletions local-search.xml

Large diffs are not rendered by default.

67 changes: 66 additions & 1 deletion search.xml
@@ -3604,7 +3604,7 @@ system,能够产生极高的精度;直接测量速度和距离,这进一
</tags>
</entry>
<entry>
<title>【深度学习】知识汇总</title>
<title>【深度学习】DeepL知识汇总</title>
<url>/2024/01/30/dl_summary/</url>
<content><![CDATA[<h1 id="深度学习知识汇总">Deep Learning Knowledge Summary</h1>
<p>Deep-learning interview notes: this post collects the fundamental concepts and frequently asked questions of deep learning.</p>
@@ -3624,6 +3624,10 @@ href="https://zhuanlan.zhihu.com/p/560482252">https://zhuanlan.zhihu.com/p/56048
href="https://link.zhihu.com/?target=https%3A//blog.csdn.net/xys430381_1/article/details/80680167">https://link.zhihu.com/?target=https%3A//blog.csdn.net/xys430381_1/article/details/80680167</a></p>
<p>Optimizers <a
href="https://zhuanlan.zhihu.com/p/78622301">https://zhuanlan.zhihu.com/p/78622301</a></p>
<p>BN <a
href="https://zhuanlan.zhihu.com/p/93643523">https://zhuanlan.zhihu.com/p/93643523</a></p>
<p>Neural-network weight initialization <a
href="https://blog.csdn.net/kebu12345678/article/details/103084851">https://blog.csdn.net/kebu12345678/article/details/103084851</a></p>
<p><a
href="https://zhuanlan.zhihu.com/p/667048896">https://zhuanlan.zhihu.com/p/667048896</a></p>
<h3 id="逻辑回归和线性回归"><strong>逻辑回归和线性回归</strong></h3>
@@ -3662,6 +3666,67 @@ id="batch_size的大小对学习率的影响">Batch_size的大小对学习率的
Residual networks were introduced to solve the degradation problem that arises when building deep networks, namely vanishing/exploding gradients. The residual structure relies on two main design elements: the shortcut connection and the identity mapping. The shortcut connection is what makes the residual formulation possible, while the identity mapping is what allows the network to grow deep; the identity mapping itself involves two parts, the skip connection and the activation function.</p>
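<p>A minimal PyTorch sketch of a residual block may make the shortcut / identity-mapping idea concrete (the class name and layer sizes here are illustrative assumptions, not code from the post):</p>
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Illustrative residual block: out = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                      # add the identity mapping back
        return self.relu(out)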
<p><strong>Differences between Adam and SGD</strong></p>
<p>The drawback of SGD is that its update direction depends entirely on the gradient computed from the current batch, which makes it very unstable.</p>
<p>Adam's main advantages are:</p>
<ul>
<li>It takes the gradient information of previous steps into account, which reduces the noise in gradient updates.</li>
<li>After bias correction, the effective step size at every iteration stays within a definite range, so the parameters evolve smoothly.</li>
</ul>
<p>Adam has problems of its own, however: it can overfit features that appear early in training, and features that only appear later have difficulty correcting that early fit. Neither optimizer really avoids the problem of local optima.</p>
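<p>To make the description above concrete, here is a minimal sketch of a single Adam update in plain NumPy (the function name adam_step and the hyperparameter values are illustrative assumptions):</p>
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction keeps the effective step size in a sensible range early on (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v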
<p><strong>How does softmax avoid exponent overflow?</strong></p>
<p>Exponent overflow is a common problem when computing softmax: if the inputs are very large, the exponentials can overflow. Two standard remedies are:</p>
<ol type="1">
<li><p><strong>Numerical-stability trick</strong>: subtract a constant from the inputs so that they become relatively small, which keeps the exponentials small. In practice, find the maximum element of the input vector and subtract it from every element.</p>
<p><img src="https://gitee.com/lihaibineric/picgo/raw/master/pic/image-20240222173542613.png" alt="image-20240222173542613" style="zoom:67%;" /></p>
<p>This keeps the values in a stable range and prevents the exponential from overflowing.</p></li>
<li><p><strong>Using a property of softmax</strong>: dividing both the numerator and the denominator of softmax by the same constant does not change its value, so we can subtract the maximum of the input vector from every element and then compute softmax as usual.</p></li>
</ol>
<p>Both approaches avoid exponent overflow and keep the softmax computation numerically stable; in practice these tricks are used routinely to preserve model stability and numerical precision.</p>
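<p>A minimal NumPy sketch of the max-subtraction trick described above (the function name stable_softmax is an illustrative assumption):</p>
import numpy as np

def stable_softmax(x):
    # Subtracting the row-wise maximum leaves softmax unchanged
    # but keeps every exponent <= 0, so np.exp cannot overflow.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Example: works even for inputs that would overflow a naive softmax.
print(stable_softmax(np.array([[1000.0, 1001.0, 1002.0]])))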
<p><strong>If the loss suddenly increases rapidly during training, what should you check?</strong></p>
<ol type="1">
<li>The learning rate is too large.</li>
<li>There are bad samples in the training data.</li>
</ol>
<p><strong>model.eval() vs. torch.no_grad()</strong></p>
<ul>
<li>model.eval(): gradients are still computed (they are simply not back-propagated); dropout layers keep units with probability 1; batch-norm layers use the running (global) mean and variance.</li>
<li>with torch.no_grad(): gradients are not computed at all.</li>
</ul>
<p><strong>Can Dropout and Batch Norm be used together?</strong></p>
<p>Yes, but Dropout must be placed after Batch Norm. During training, Dropout changes the variance of the input X, which distorts the running variance that Batch Norm accumulates; at test time there is no Dropout, so the variance of X differs from what was seen during training, and the variance Batch Norm expects at test time no longer matches the statistics it collected.</p>
<p><strong>Vanishing and exploding gradients</strong></p>
<p><strong>Causes of vanishing gradients and how to fix them</strong></p>
<p>(1) Too many hidden layers: with the chain rule of back-propagation, if some factors of the product are smaller than 1, multiplying them across many layers makes the gradient vanish.</p>
<p>(2) An unsuitable activation function: for example, the maximum gradient of sigmoid is 1/4, so the gradient contributed by each hidden layer is below 1 (when the weights are below 1) and the gradient vanishes.</p>
<p>Fixes: 1. use ReLU, whose derivative is 1 in the positive region; 2. batch norm; 3. residual structures.</p>
<p><strong>Causes of exploding gradients and how to fix them</strong></p>
<p>(1) Too many hidden layers: if some factors of the product are larger than 1, the gradient grows exponentially over the layers and explodes.</p>
<p>(2) The initial weights are too large, since the weights are multiplied in when taking derivatives.</p>
<p>Fixes: 1. gradient clipping (a minimal sketch follows below); 2. L1/L2 regularization of the weights; 3. residual structures; 4. batch norm.</p>
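<p>As a concrete illustration of fix 1 for exploding gradients, here is a minimal, assumed PyTorch training step with gradient clipping (model, loss_fn, optimizer and max_norm=1.0 are placeholders, not taken from the post):</p>
import torch

def training_step(model, loss_fn, optimizer, x, y, max_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale gradients so their global norm does not exceed max_norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()
    return loss.item()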
<h3 id="pytorch实现自注意力和多头注意力">Implementing self-attention and multi-head attention in PyTorch</h3>
<p>Self-attention</p>
from math import sqrt
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim_in, dim_k, dim_v):
        super(SelfAttention, self).__init__()
        self.dim_in = dim_in
        self.dim_k = dim_k
        self.dim_v = dim_v
        self.linear_q = nn.Linear(dim_in, dim_k, bias=False)
        self.linear_k = nn.Linear(dim_in, dim_k, bias=False)
        self.linear_v = nn.Linear(dim_in, dim_v, bias=False)
        self._norm_fact = 1 / sqrt(dim_k)

    def forward(self, x):
        # x: (batch, n, dim_in)
        batch, n, dim_in = x.shape
        assert dim_in == self.dim_in

        q = self.linear_q(x)  # (batch, n, dim_k)
        k = self.linear_k(x)  # (batch, n, dim_k)
        v = self.linear_v(x)  # (batch, n, dim_v)

        dist = torch.bmm(q, k.transpose(1, 2)) * self._norm_fact  # (batch, n, n)
        dist = torch.softmax(dist, dim=-1)

        att = torch.bmm(dist, v)  # (batch, n, dim_v)
        return att
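<p>A quick shape check of the class above (the tensor sizes are illustrative):</p>
x = torch.randn(2, 10, 64)                       # (batch, n, dim_in)
attn = SelfAttention(dim_in=64, dim_k=32, dim_v=48)
print(attn(x).shape)                             # torch.Size([2, 10, 48])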
<p>Multi-head attention</p>
from math import sqrt
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # dim_in:    input dimension
    # dim_k:     total query/key dimension across heads
    # dim_v:     total value dimension across heads
    # num_heads: number of attention heads

    def __init__(self, dim_in, dim_k, dim_v, num_heads=8):
        super(MultiHeadAttention, self).__init__()
        assert dim_k % num_heads == 0 and dim_v % num_heads == 0

        self.dim_in = dim_in
        self.dim_k = dim_k
        self.dim_v = dim_v
        self.num_heads = num_heads
        self.linear_q = nn.Linear(dim_in, dim_k, bias=False)
        self.linear_k = nn.Linear(dim_in, dim_k, bias=False)
        self.linear_v = nn.Linear(dim_in, dim_v, bias=False)
        self._norm_fact = 1 / sqrt(dim_k // num_heads)

    def forward(self, x):
        # x: tensor of shape (batch, n, dim_in)
        batch, n, dim_in = x.shape
        assert dim_in == self.dim_in

        nh = self.num_heads
        dk = self.dim_k // nh
        dv = self.dim_v // nh

        q = self.linear_q(x).reshape(batch, n, nh, dk).transpose(1, 2)  # (batch, nh, n, dk)
        k = self.linear_k(x).reshape(batch, n, nh, dk).transpose(1, 2)  # (batch, nh, n, dk)
        v = self.linear_v(x).reshape(batch, n, nh, dv).transpose(1, 2)  # (batch, nh, n, dv)

        dist = torch.matmul(q, k.transpose(2, 3)) * self._norm_fact  # (batch, nh, n, n)
        dist = torch.softmax(dist, dim=-1)

        att = torch.matmul(dist, v)                              # (batch, nh, n, dv)
        att = att.transpose(1, 2).reshape(batch, n, self.dim_v)  # (batch, n, dim_v)
        return att
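<p>And a corresponding shape check for the multi-head version (again with illustrative sizes; dim_k and dim_v must be divisible by num_heads):</p>
x = torch.randn(2, 10, 64)                       # (batch, n, dim_in)
mha = MultiHeadAttention(dim_in=64, dim_k=32, dim_v=64, num_heads=8)
print(mha(x).shape)                              # torch.Size([2, 10, 64])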
<h3 id="batch-normalization">Batch Normalization</h3>
import numpy as np

class MyBN:
    def __init__(self, momentum=0.01, eps=1e-5, feat_dim=2):
        # Running statistics, used at inference time
        self._running_mean = np.zeros(shape=(feat_dim,))
        self._running_var = np.ones(shape=(feat_dim,))
        self._momentum = momentum
        # eps keeps the denominator away from zero
        self._eps = eps

        # Learnable affine parameters beta and gamma for batch norm,
        # initialized as in the PyTorch docs
        self._beta = np.zeros(shape=(feat_dim,))
        self._gamma = np.ones(shape=(feat_dim,))

        self.training = True  # set to False for inference

    def batch_norm(self, x):
        if self.training:
            x_mean = x.mean(axis=0)
            x_var = x.var(axis=0)
            # Exponential moving average update of the running statistics
            self._running_mean = (1 - self._momentum) * self._running_mean + self._momentum * x_mean
            self._running_var = (1 - self._momentum) * self._running_var + self._momentum * x_var
            # Normalization as in the BN paper
            x_hat = (x - x_mean) / np.sqrt(x_var + self._eps)
        else:
            x_hat = (x - self._running_mean) / np.sqrt(self._running_var + self._eps)
        return self._gamma * x_hat + self._beta
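<p>A small usage sketch for MyBN (batch shape and seed are illustrative); in training mode each feature of the output has roughly zero mean and unit variance, while eval mode falls back to the running statistics:</p>
np.random.seed(0)
bn = MyBN(feat_dim=3)
x = np.random.randn(8, 3)                 # batch of 8 samples, 3 features
y = bn.batch_norm(x)
print(y.mean(axis=0), y.var(axis=0))      # approximately 0 and 1 per feature

bn.training = False                       # switch to inference mode
y_eval = bn.batch_norm(x)                 # normalizes with the running statistics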
]]></content>
<categories>
<category>深度学习</category>
2 changes: 1 addition & 1 deletion tags/人工智能/index.html
@@ -218,7 +218,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


2 changes: 1 addition & 1 deletion tags/深度学习/index.html
@@ -218,7 +218,7 @@

<a href="/2024/01/30/dl_summary/" class="list-group-item list-group-item-action">
<time>01-30</time>
<div class="list-group-item-title">【深度学习】知识汇总</div>
<div class="list-group-item-title">【深度学习】DeepL知识汇总</div>
</a>


