Built site for gh-pages
alexchen4ai committed Mar 11, 2024
1 parent 36758ef commit 0e634e4
Showing 6 changed files with 20 additions and 17 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
300494ba
a64115be
4 changes: 2 additions & 2 deletions notes.html
@@ -250,7 +250,7 @@ <h3 class="no-anchor listing-title">
</a>
</div>
</div>
<div class="quarto-post image-right" data-index="1" data-categories="Math Theories" data-listing-date-sort="1708848000000" data-listing-file-modified-sort="1710136781487" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="669">
<div class="quarto-post image-right" data-index="1" data-categories="Math Theories" data-listing-date-sort="1708848000000" data-listing-file-modified-sort="1710141085495" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="849">
<div class="thumbnail">
<p><a href="./notes/Math Theories/complexanalysis.html" class="no-external"></a></p><a href="./notes/Math Theories/complexanalysis.html" class="no-external">
<div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -275,7 +275,7 @@ <h3 class="no-anchor listing-title">
<div class="metadata">
<p><a href="./notes/Math Theories/complexanalysis.html" class="no-external"></a></p><a href="./notes/Math Theories/complexanalysis.html" class="no-external">
<div class="listing-reading-time">
4 min
5 min
</div>
</a>
</div>
10 changes: 6 additions & 4 deletions notes.xml
@@ -410,10 +410,12 @@ Tip
<p>Thus, using this information, we have</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0AR_q%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20m%5Cright)%20R_k%5Cleft(%5Cboldsymbol%7Bx%7D_k,%20n%5Cright)%20&amp;%20=R_g%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20%5Cboldsymbol%7Bx%7D_k,%20n-m%5Cright),%20%5C%5C%0A%5CTheta_k%5Cleft(%5Cboldsymbol%7Bx%7D_k,%20n%5Cright)-%5CTheta_q%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20m%5Cright)%20&amp;%20=%5CTheta_g%5Cleft(%5Cboldsymbol%7Bx%7D_q,%20%5Cboldsymbol%7Bx%7D_k,%20n-m%5Cright),%0A%5Cend%7Baligned%7D%0A"></p>
<p>After derivation, we found that if we choose the following expression, we can satisfy the condition above:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Af_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5C%5C%0Af_k%5Cleft(%5Cboldsymbol%7Bx%7D_n,%20n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0Ag%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20%5Cboldsymbol%7Bx%7D_n,%20m-n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%5ET%20e%5E%7Bi(m-n)%20%5Ctheta%7D%0A%5Cend%7Baligned%7D%0A"></p>
<p>The derivation is as the following:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clangle%20f_q,%20f_k%5Crangle%20&amp;=%20f_q%5E*%20f_k%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5E*%20e%5E%7B-i%20m%20%5Ctheta%7D%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5E*%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi(n-m)%20%5Ctheta%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5ET%20%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi(n-m)%20%5Ctheta%7D.%0A%5Cend%7Baligned%7D%0A"></p>
<p>From the expression of <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5ET%20e%5E%7B-i%20m%20%5Ctheta%7D">, we can design the rotary embedding setup in the llama2.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0Af_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5C%5C%0Af_k%5Cleft(%5Cboldsymbol%7Bx%7D_n,%20n%5Cright)%20&amp;%20=%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%20%5C%5C%0Ag%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20%5Cboldsymbol%7Bx%7D_n,%20m-n%5Cright)%20&amp;%20=%5Coperatorname%7BRe%7D%5Cleft%5B%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%5E*%20e%5E%7Bi(m-n)%20%5Ctheta%7D%5Cright%5D%0A%5Cend%7Baligned%7D%0A"></p>
<p>The derivation is as follows (<strong>this is not shown in the paper; you can use the derivation below to understand it better</strong>):</p>
<p>Note that if we express two vectors as <img src="https://latex.codecogs.com/png.latex?z_1%20=%20a%20+%20bi"> and <img src="https://latex.codecogs.com/png.latex?z_2%20=%20c%20+%20di">, their inner product is <img src="https://latex.codecogs.com/png.latex?ac%20+%20bd">. How is this related to multiplying the two complex numbers? We actually have <img src="https://latex.codecogs.com/png.latex?ac%20+%20bd%20=%20%5Coperatorname%7BRe%7D(z_1%20%5Coverline%7Bz_2%7D)">.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Clangle%20f_q,%20f_k%5Crangle%20&amp;=%20%5Coperatorname%7BRe%7D(f_q%20*%20%5Coverline%7Bf_k%7D)%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D%20%5Coverline%7B%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%20e%5E%7Bi%20n%20%5Ctheta%7D%7D%20%5C%5C%0A%20%20%20%20%20%20&amp;=%20%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20%5Coverline%7B%5Cleft(%5Cboldsymbol%7BW%7D_k%20%5Cboldsymbol%7Bx%7D_n%5Cright)%7D%20e%5E%7Bi(n-m)%20%5Ctheta%7D%0A%5Cend%7Baligned%7D%0A"></p>
<p>From the expression of <img src="https://latex.codecogs.com/png.latex?f_q%5Cleft(%5Cboldsymbol%7Bx%7D_m,%20m%5Cright)%20=%5Cleft(%5Cboldsymbol%7BW%7D_q%20%5Cboldsymbol%7Bx%7D_m%5Cright)%20e%5E%7Bi%20m%20%5Ctheta%7D">, we can design the rotary embedding setup used in Llama 2. It is important to note that we introduce complex numbers because we want to combine magnitude and angle: the embedding itself carries the magnitude, while the angle comes from the position. Since real matrix multiplication can only perform real-valued calculations, we need the mapping above.</p>
<p>To put it another way, the real and imaginary parts of a complex number both carry useful information. Expressing the vectors as complex numbers lets us incorporate the angle (phase) information that encodes position, and at the end we map back to real operations to continue the computation. The complex representation is an auxiliary device, an intermediate state that helps us process the positional information.</p>
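<p>As a small illustration of that last point (a sketch of my own, assuming a single 2-D pair and one frequency &theta;; it is not the actual Llama 2 code), multiplying by e^{i m &theta;} in the complex picture is the same as applying a real 2&times;2 rotation matrix to the pair (a, b):</p>

```python
import numpy as np

theta, m = 0.3, 4        # one frequency and one position, chosen only for illustration
a, b = 0.7, -1.2         # one 2-D slice of W_q x_m, viewed as the complex number a + b i

# Complex form: rotate by e^{i m theta}.
rotated_complex = complex(a, b) * np.exp(1j * m * theta)

# Real form: the equivalent 2x2 rotation matrix acting on (a, b).
c, s = np.cos(m * theta), np.sin(m * theta)
R = np.array([[c, -s],
              [s,  c]])
rotated_real = R @ np.array([a, b])

print(rotated_complex)   # a' + b' i
print(rotated_real)      # [a', b'] -- the same two numbers
```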
</section>
15 changes: 8 additions & 7 deletions notes/Math Theories/complexanalysis.html
@@ -360,19 +360,20 @@ <h2 class="anchored" data-anchor-id="consider-the-rotary-embedding-using-complex
\begin{aligned}
f_q\left(\boldsymbol{x}_m, m\right) &amp; =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta} \\
f_k\left(\boldsymbol{x}_n, n\right) &amp; =\left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta} \\
g\left(\boldsymbol{x}_m, \boldsymbol{x}_n, m-n\right) &amp; =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)^T e^{i(m-n) \theta}
g\left(\boldsymbol{x}_m, \boldsymbol{x}_n, m-n\right) &amp; =\operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)^* e^{i(m-n) \theta}\right]
\end{aligned}
\]</span></p>
<p>The derivation is as the following:</p>
<p>The derivation is as follows (<strong>this is not shown in the paper; you can use the derivation below to understand it better</strong>):</p>
<p>Note that if we express two vectors as <span class="math inline">\(z_1 = a + bi\)</span> and <span class="math inline">\(z_2 = c + di\)</span>, their inner product is <span class="math inline">\(ac + bd\)</span>. How is this related to multiplying the two complex numbers? We actually have <span class="math inline">\(ac + bd = \operatorname{Re}(z_1 \overline{z_2})\)</span>.</p>
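<p>Expanding the product term by term makes this identity explicit (this intermediate step is added here for clarity; it is not spelled out in the original note):</p>
<p><span class="math display">\[
z_1 \overline{z_2} = (a + bi)(c - di) = (ac + bd) + i(bc - ad), \qquad \operatorname{Re}(z_1 \overline{z_2}) = ac + bd.
\]</span></p>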
<p><span class="math display">\[
\begin{aligned}
\langle f_q, f_k\rangle &amp;= f_q^* f_k \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^* e^{-i m \theta} \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta} \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^* \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i(n-m) \theta} \\
&amp;= \left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^T \left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i(n-m) \theta}.
\langle f_q, f_k\rangle &amp;= \operatorname{Re}(f_q \overline{f_k}) \\
&amp;= \operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta} \overline{\left(\boldsymbol{W}_k \boldsymbol{x}_n\right) e^{i n \theta}}\right] \\
&amp;= \operatorname{Re}\left[\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) \overline{\left(\boldsymbol{W}_k \boldsymbol{x}_n\right)} e^{i(m-n) \theta}\right]
\end{aligned}
\]</span></p>
<p>From the expression of <span class="math inline">\(\left(\boldsymbol{W}_q \boldsymbol{x}_m\right)^T e^{-i m \theta}\)</span>, we can design the rotary embedding setup in the llama2.</p>
<p>From the expression of <span class="math inline">\(f_q\left(\boldsymbol{x}_m, m\right) =\left(\boldsymbol{W}_q \boldsymbol{x}_m\right) e^{i m \theta}\)</span>, we can design the rotary embedding setup used in Llama 2. It is important to note that we introduce complex numbers because we want to combine magnitude and angle: the embedding itself carries the magnitude, while the angle comes from the position. Since real matrix multiplication can only perform real-valued calculations, we need the mapping above.</p>
<p>To put it another way, the real and imaginary parts of a complex number both carry useful information. Expressing the vectors as complex numbers lets us incorporate the angle (phase) information that encodes position, and at the end we map back to real operations to continue the computation. The complex representation is an auxiliary device, an intermediate state that helps us process the positional information.</p>
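<p>To make the relative-position property concrete, here is a minimal numerical sketch (my own illustration, assuming a single 2-D head dimension with one frequency &theta;; it is not code from the RoPE paper or from Llama 2). The score Re(f_q conj(f_k)) depends only on the offset m - n:</p>

```python
import numpy as np

theta = 0.3                          # a single rotation frequency, chosen for illustration
rng = np.random.default_rng(0)
q = complex(*rng.normal(size=2))     # stands in for W_q x_m (one 2-D slice as a + b i)
k = complex(*rng.normal(size=2))     # stands in for W_k x_n

def rope_score(q, k, m, n):
    """Score between a query at position m and a key at position n."""
    f_q = q * np.exp(1j * m * theta)          # f_q(x_m, m) = (W_q x_m) e^{i m theta}
    f_k = k * np.exp(1j * n * theta)          # f_k(x_n, n) = (W_k x_n) e^{i n theta}
    return (f_q * np.conj(f_k)).real          # <f_q, f_k> = Re(f_q * conj(f_k))

print(rope_score(q, k, m=5, n=2))    # offset 3
print(rope_score(q, k, m=9, n=6))    # offset 3 again -> identical score
print(rope_score(q, k, m=9, n=2))    # offset 7 -> a different score
```

<p>In the full method, each 2-D pair of channels gets its own frequency &theta;_i, but the relative-offset property above holds per pair in the same way.</p>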


</section>
