Built site for gh-pages
alexchen4ai committed Feb 22, 2024
1 parent 0ad974b commit e83e386
Showing 6 changed files with 6 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-878a7746
+8136becc
2 changes: 1 addition & 1 deletion notes.html
@@ -224,7 +224,7 @@ <h1 class="title">Research notes</h1>

<div class="quarto-listing quarto-listing-container-default" id="listing-listing">
<div class="list quarto-listing-default">
<div class="quarto-post image-right" data-index="0" data-categories="Large Language Models" data-listing-date-sort="1708502400000" data-listing-file-modified-sort="1708635048651" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="731">
<div class="quarto-post image-right" data-index="0" data-categories="Large Language Models" data-listing-date-sort="1708502400000" data-listing-file-modified-sort="1708635208669" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="737">
<div class="thumbnail">
<p><a href="./notes/Large Language Model/moe.html" class="no-external"></a></p><a href="./notes/Large Language Model/moe.html" class="no-external">
<p class="card-img-top"><img src="images/llama2.png" class="thumbnail-image card-img"/></p>
1 change: 1 addition & 0 deletions notes.xml
@@ -204,6 +204,7 @@ Tip
<h2 class="anchored" data-anchor-id="load-balancing-loss">Load balancing loss</h2>
<p>Since different portions of the total tokens are routed to different experts, as in the unbalanced dataset problem, we need to add a load balancing loss. Given \(N\) experts indexed by \(i = 1\) to \(N\) and a batch \(\mathcal{B}\) with \(T\) tokens, the auxiliary loss is computed as the scaled dot product between the vectors \(f\) and \(P\), \[
\text{loss} = \alpha \cdot N \cdot \sum_{i=1}^{N} f_i \cdot P_i
\] where \(f_i\) is the fraction of tokens dispatched to expert \(i\), \[
f_i = \frac{1}{T} \sum_{x \in \mathcal{B}} \mathbb{1}\{\operatorname{argmax} p(x) = i\}
\] and \(P_i\) is the fraction of the router probability allocated to expert \(i\), \[
P_i = \frac{1}{T} \sum_{x \in \mathcal{B}} p_i(x)
\]</p>
<p>We add this loss to encourage uniform routing, since the loss is minimized when \[
f_i = P_i = \frac{1}{N}.
\]</p>
+<p>You can prove this with the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality">Cauchy–Schwarz inequality</a>.</p>
</section>
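To make the hunk above concrete, here is a minimal Python (PyTorch) sketch of this auxiliary loss. The function name, the [T tokens, N experts] shape convention for router_logits, and the default alpha are illustrative assumptions, not code from this commit.

import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, alpha: float = 0.01) -> torch.Tensor:
    # Assumed shape: router_logits is [T, N], T tokens routed over N experts.
    T, N = router_logits.shape
    probs = F.softmax(router_logits, dim=-1)          # p(x) for every token
    # f_i: fraction of tokens whose argmax (top-1) expert is i; non-differentiable.
    top1 = probs.argmax(dim=-1)
    f = F.one_hot(top1, num_classes=N).float().mean(dim=0)
    # P_i: fraction of the total router probability allocated to expert i.
    P = probs.mean(dim=0)
    # loss = alpha * N * sum_i f_i * P_i; gradients flow only through P.
    return alpha * N * torch.sum(f * P)

# Sanity check: a uniform router (P_i = 1/N) gives loss = alpha regardless of f.
logits = torch.zeros(128, 8)                          # 128 tokens, 8 experts
print(load_balancing_loss(logits))                    # tensor(0.0100)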
1 change: 1 addition & 0 deletions notes/Large Language Model/moe.html
@@ -409,6 +409,7 @@ <h2 class="anchored" data-anchor-id="load-balancing-loss">Load balancing loss</h2>
<p>We add this loss to encourage uniform routing, since the loss is minimized when \[
f_i = P_i = \frac{1}{N}.
\]</p>
+<p>You can prove this with the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality">Cauchy–Schwarz inequality</a>.</p>


</section>
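The Cauchy–Schwarz claim added in this commit can be spelled out. A minimal sketch of the argument, under the assumption that the differentiable probability fractions \(P_i\) track the hard dispatch fractions \(f_i\): both vectors sum to 1 over the \(N\) experts, and Cauchy–Schwarz gives
\[
1 = \Big(\sum_{i=1}^{N} P_i \cdot 1\Big)^{2} \le \Big(\sum_{i=1}^{N} P_i^{2}\Big)\Big(\sum_{i=1}^{N} 1^{2}\Big) = N \sum_{i=1}^{N} P_i^{2},
\]
so with \(f_i \approx P_i\),
\[
\text{loss} = \alpha \, N \sum_{i=1}^{N} f_i P_i \approx \alpha \, N \sum_{i=1}^{N} P_i^{2} \ge \alpha,
\]
with equality exactly when \(P_1 = \dots = P_N = \frac{1}{N}\), i.e. when \(f_i = P_i = \frac{1}{N}\).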
2 changes: 1 addition & 1 deletion search.json
@@ -47,7 +47,7 @@
"href": "notes/Large Language Model/moe.html#load-balancing-loss",
"title": "Mixture of expert",
"section": "Load balancing loss",
"text": "Load balancing loss\nSince different portion of total tokens will enter different experts, like the unbalanced dataset problem, we need to add a load balancing loss. Given \\(N\\) experts indexed by \\(i=1\\) to \\(N\\) and a batch \\(\\mathcal{B}\\) with \\(T\\) tokens, the auxiliary loss is computed as the scaled dot-product between vectors \\(f\\) and \\(P\\), \\[\n\\text { loss }=\\alpha \\cdot N \\cdot \\sum_{i=1}^N f_i \\cdot P_i\n\\] where \\(f_i\\) is the fraction of tokens dispatched to expert \\(i\\), \\[\nf_i=\\frac{1}{T} \\sum_{x \\in \\mathcal{B}} \\mathbb{1}\\{\\operatorname{argmax} p(x)=i\\}\n\\] and \\(P_i\\) is the fraction of the router probability allocated for expert \\(i,{ }^2\\) \\[\nP_i=\\frac{1}{T} \\sum_{x \\in \\mathcal{B}} p_i(x)\n\\]\nWe add this loss since we want to encourages uniform routing since the loss is minimized when \\[\nf_i = P_i = \\frac{1}{N}.\n\\]",
"text": "Load balancing loss\nSince different portion of total tokens will enter different experts, like the unbalanced dataset problem, we need to add a load balancing loss. Given \\(N\\) experts indexed by \\(i=1\\) to \\(N\\) and a batch \\(\\mathcal{B}\\) with \\(T\\) tokens, the auxiliary loss is computed as the scaled dot-product between vectors \\(f\\) and \\(P\\), \\[\n\\text { loss }=\\alpha \\cdot N \\cdot \\sum_{i=1}^N f_i \\cdot P_i\n\\] where \\(f_i\\) is the fraction of tokens dispatched to expert \\(i\\), \\[\nf_i=\\frac{1}{T} \\sum_{x \\in \\mathcal{B}} \\mathbb{1}\\{\\operatorname{argmax} p(x)=i\\}\n\\] and \\(P_i\\) is the fraction of the router probability allocated for expert \\(i,{ }^2\\) \\[\nP_i=\\frac{1}{T} \\sum_{x \\in \\mathcal{B}} p_i(x)\n\\]\nWe add this loss since we want to encourages uniform routing since the loss is minimized when \\[\nf_i = P_i = \\frac{1}{N}.\n\\]\nYou can prove it by Cauchy-Schwarz inequality.",
"crumbs": [
"Home",
"🗣️ **Large language models**",
2 changes: 1 addition & 1 deletion sitemap.xml
@@ -6,7 +6,7 @@
</url>
<url>
<loc>https://alexchen4ai.github.io/blog/notes/Large Language Model/moe.html</loc>
-<lastmod>2024-02-22T20:50:48.651Z</lastmod>
+<lastmod>2024-02-22T20:53:28.669Z</lastmod>
</url>
<url>
<loc>https://alexchen4ai.github.io/blog/about.html</loc>
