Built site for gh-pages
alexchen4ai committed Mar 11, 2024
1 parent 36758ef commit 0e634e4
Showing 6 changed files with 20 additions and 17 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
300494ba
a64115be
4 changes: 2 additions & 2 deletions notes.html
@@ -250,7 +250,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
<div class="quarto-post image-right" data-index="1" data-categories="Math Theories" data-listing-date-sort="1708848000000" data-listing-file-modified-sort="1710136781487" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="669">
<div class="quarto-post image-right" data-index="1" data-categories="Math Theories" data-listing-date-sort="1708848000000" data-listing-file-modified-sort="1710141085495" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="849">
<div class="thumbnail">
<p><a href="./notes/Math Theories/complexanalysis.html" class="no-external"></a></p><a href="./notes/Math Theories/complexanalysis.html" class="no-external">
<div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -275,7 +275,7 @@ <h3 class="no-anchor listing-title">
<div class="metadata">
<p><a href="./notes/Math Theories/complexanalysis.html" class="no-external"></a></p><a href="./notes/Math Theories/complexanalysis.html" class="no-external">
<div class="listing-reading-time">
4 min
5 min
</div>
</a>
</div>
10 changes: 6 additions & 4 deletions notes.xml
@@ -410,10 +410,12 @@ Tip
<p>Thus, using this information, we have</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AR_q%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20m%5Cright)%20R_k%5Cleft(%5Cboldsymbol%7Bx%7D_k,%20n%5Cright)%20&amp;%20=R_g%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20%5Cboldsymbol%7Bx%7D_k,%20n-m%5Cright),%20%5C%5C%0A%5CTheta_k%5Cleft(%5Cboldsymbol%7Bx%7D_k,%20n%5Cright)-%5CTheta_q%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20m%5Cright)%20&amp;%20=%5CTheta_g%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20%5Cboldsymbol%7Bx%7D_k,%20n-m%5Cright),%0A%5Cend%7Baligned%7D%0A"></p>
<p>After derivation, we found that if we choose the following expression, we can satisfy the condition above:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Af_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5C%5C%0Af_k%5Cleft(%5Cboldsymbol%7Bx%7D_n,%20n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0Ag%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20%5Cboldsymbol%7Bx%7D_n,%20m-n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%5ET%20e%5E%7Bi(m-n)%20%5Ctheta%7D%0A%5Cend%7Baligned%7D%0A"></p>
<p>The derivation is as the following:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clangle%20f_q,%20f_k%5Crangle%20&amp;=%20f_q%5E*%20f_k%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5E*%20e%5E%7B-i%20m%20%5Ctheta%7D%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5E*%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi(n-m)%20%5Ctheta%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5ET%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi(n-m)%20%5Ctheta%7D.%0A%5Cend%7Baligned%7D%0A"></p>
<p>From the expression of <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5ET%20e%5E%7B-i%20m%20%5Ctheta%7D">, we can design the rotary embedding setup in the llama2.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Af_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5C%5C%0Af_k%5Cleft(%5Cboldsymbol%7Bx%7D_n,%20n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0Ag%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20%5Cboldsymbol%7Bx%7D_n,%20m-n%5Cright)%20&amp;%20=%5Coperatorname%7BRe%7D%5Cleft%5B%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%5E*%20e%5E%7Bi(m-n)%20%5Ctheta%7D%5Cright%5D%0A%5Cend%7Baligned%7D%0A"></p>
<p>The derivation is as follows (<strong>this is not shown in the paper; you can use the derivation below to understand it better</strong>):</p>
<p>Note that if we express two vectors as <img src="https://latex.codecogs.com/png.latex?z_1%20=%20a%20+%20bi"> and <img src="https://latex.codecogs.com/png.latex?z_2%20=%20c%20+%20di">, their inner product is <img src="https://latex.codecogs.com/png.latex?ac%20+%20bd">. How is this related to multiplying the two complex numbers? We actually have <img src="https://latex.codecogs.com/png.latex?ac%20+%20bd%20=%20%5Coperatorname%7BRe%7D(z_1%20%5Coverline%7Bz_2%7D)">.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clangle%20f_q,%20f_k%5Crangle%20&amp;=%20%5Coperatorname%7BRe%7D(f_q%20*%20%5Coverline%7Bf_k%7D)%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5Coverline%7B%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20%5Coverline%7B%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%7D%20e%5E%7Bi(n-m)%20%5Ctheta%7D%0A%5Cend%7Baligned%7D%0A"></p>
<p>From the expression of <img src="https://latex.codecogs.com/png.latex?f_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D">, we can design the rotary embedding setup used in Llama 2. It is important to note that we introduce complex numbers because we want to combine magnitude and angle: the embedding itself carries the magnitude, while the angle comes from the position. Since real matrix multiplication can only perform real-valued calculations, we need the mapping above.</p>
<p>To put it another way, the real and imaginary parts of a complex number both carry useful information. Expressing the vectors as complex numbers lets us incorporate the angle (phase) information that encodes position, and at the end we map back to real operations to continue the computation. The complex representation is an auxiliary device, an intermediate state that helps us process the positional information.</p>
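<p>As a small illustration of that last point (a sketch of my own, assuming a single 2-D pair and one frequency &theta;; it is not the actual Llama 2 code), multiplying by e^{i m &theta;} in the complex picture is the same as applying a real 2&times;2 rotation matrix to the pair (a, b):</p>

```python
import numpy as np

theta, m = 0.3, 4        # one frequency and one position, chosen only for illustration
a, b = 0.7, -1.2         # one 2-D slice of W_q x_m, viewed as the complex number a + b i

# Complex form: rotate by e^{i m theta}.
rotated_complex = complex(a, b) * np.exp(1j * m * theta)

# Real form: the equivalent 2x2 rotation matrix acting on (a, b).
c, s = np.cos(m * theta), np.sin(m * theta)
R = np.array([[c, -s],
              [s,  c]])
rotated_real = R @ np.array([a, b])

print(rotated_complex)   # a' + b' i
print(rotated_real)      # [a', b'] -- the same two numbers
```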
</section>
15 changes: 8 additions & 7 deletions notes/Math Theories/complexanalysis.html
@@ -360,19 +360,20 @@ <h2 class="anchored" data-anchor-id="consider-the-rotary-embedding-using-complex
\begin{aligned}
f_q\left(\boldsymbol{x}_m, m\right) &amp; =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta} \\
f_k\left(\boldsymbol{x}_n, n\right) &amp; =\left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta} \\
g\left(\boldsymbol{x}_m, \boldsymbol{x}_n, m-n\right) &amp; =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)^T e^{i(m-n) \theta}
g\left(\boldsymbol{x}_m, \boldsymbol{x}_n, m-n\right) &amp; =\operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)^* e^{i(m-n) \theta}\right]
\end{aligned}
\]</span></p>
<p>The derivation is as the following:</p>
<p>The derivation is as follows (<strong>this is not shown in the paper; you can use the derivation below to understand it better</strong>):</p>
<p>Note that if we express two vectors as <span class="math inline">\(z_1 = a + bi\)</span> and <span class="math inline">\(z_2 = c + di\)</span>, their inner product is <span class="math inline">\(ac + bd\)</span>. How is this related to multiplying the two complex numbers? We actually have <span class="math inline">\(ac + bd = \operatorname{Re}(z_1 \overline{z_2})\)</span>.</p>
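<p>Expanding the product term by term makes this identity explicit (this intermediate step is added here for clarity; it is not spelled out in the original note):</p>
<p><span class="math display">\[
z_1 \overline{z_2} = (a + bi)(c - di) = (ac + bd) + i(bc - ad), \qquad \operatorname{Re}(z_1 \overline{z_2}) = ac + bd.
\]</span></p>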
<p><span class="math display">\[
\begin{aligned}
\langle f_q, f_k\rangle &amp;= f_q^* f_k \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^* e^{-i m \theta} \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta} \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^* \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i(n-m) \theta} \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^T \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i(n-m) \theta}.
\langle f_q, f_k\rangle &amp;= \operatorname{Re}(f_q \overline{f_k}) \\
&amp;= \operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta} \overline{\left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta}}\right] \\
&amp;= \operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) \overline{\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)} e^{i(m-n) \theta}\right]
\end{aligned}
\]</span></p>
<p>From the expression of <span class="math inline">\(\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^T e^{-i m \theta}\)</span>, we can design the rotary embedding setup in the llama2.</p>
<p>From the expression of <span class="math inline">\(f_q\left(\boldsymbol{x}_m, m\right) =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta}\)</span>, we can design the rotary embedding setup used in Llama 2. It is important to note that we introduce complex numbers because we want to combine magnitude and angle: the embedding itself carries the magnitude, while the angle comes from the position. Since real matrix multiplication can only perform real-valued calculations, we need the mapping above.</p>
<p>To put it another way, the real and imaginary parts of a complex number both carry useful information. Expressing the vectors as complex numbers lets us incorporate the angle (phase) information that encodes position, and at the end we map back to real operations to continue the computation. The complex representation is an auxiliary device, an intermediate state that helps us process the positional information.</p>
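<p>To make the relative-position property concrete, here is a minimal numerical sketch (my own illustration, assuming a single 2-D head dimension with one frequency &theta;; it is not code from the RoPE paper or from Llama 2). The score Re(f_q conj(f_k)) depends only on the offset m - n:</p>

```python
import numpy as np

theta = 0.3                          # a single rotation frequency, chosen for illustration
rng = np.random.default_rng(0)
q = complex(*rng.normal(size=2))     # stands in for W_q x_m (one 2-D slice as a + b i)
k = complex(*rng.normal(size=2))     # stands in for W_k x_n

def rope_score(q, k, m, n):
    """Score between a query at position m and a key at position n."""
    f_q = q * np.exp(1j * m * theta)          # f_q(x_m, m) = (W_q x_m) e^{i m theta}
    f_k = k * np.exp(1j * n * theta)          # f_k(x_n, n) = (W_k x_n) e^{i n theta}
    return (f_q * np.conj(f_k)).real          # <f_q, f_k> = Re(f_q * conj(f_k))

print(rope_score(q, k, m=5, n=2))    # offset 3
print(rope_score(q, k, m=9, n=6))    # offset 3 again -> identical score
print(rope_score(q, k, m=9, n=2))    # offset 7 -> a different score
```

<p>In the full method, each 2-D pair of channels gets its own frequency &theta;_i, but the relative-offset property above holds per pair in the same way.</p>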


</section>
