
Commit

Built site for gh-pages
alexchen4ai committed Nov 12, 2024
1 parent 409c06c commit 702dfe4
Showing 6 changed files with 14 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
e177b579
11a1aa0f
4 changes: 2 additions & 2 deletions notes.html
@@ -220,7 +220,7 @@ <h1 class="title">Research notes</h1>

<div class="quarto-listing quarto-listing-container-default" id="listing-listing">
<div class="list quarto-listing-default">
<div class="quarto-post image-right" data-index="0" data-categories="Large Language Models" data-listing-date-sort="1713596400000" data-listing-file-modified-sort="1731398578483" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4" data-listing-word-count-sort="734">
<div class="quarto-post image-right" data-index="0" data-categories="Large Language Models" data-listing-date-sort="1713596400000" data-listing-file-modified-sort="1731433677233" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="5" data-listing-word-count-sort="810">
<div class="thumbnail">
<p><a href="./notes/Large Language Model/inference_optimize.html" class="no-external"></a></p><a href="./notes/Large Language Model/inference_optimize.html" class="no-external">
<div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -245,7 +245,7 @@ <h3 class="no-anchor listing-title">
<div class="metadata">
<p><a href="./notes/Large Language Model/inference_optimize.html" class="no-external"></a></p><a href="./notes/Large Language Model/inference_optimize.html" class="no-external">
<div class="listing-reading-time">
4 min
5 min
</div>
</a>
</div>
8 changes: 4 additions & 4 deletions notes.xml
@@ -57,14 +57,14 @@ Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>Quantization is a model compression technique that converts the weights and activations within an LLM from a high-precision data representation to a lower-precision data representation, i.e., from a data type that can hold more information to one that holds less. A typical example of this is the conversion of data from a 32-bit floating-point number (FP32) to an 8-bit or 4-bit integer (INT4 or INT8). A good blog from internet is <a href="https://symbl.ai/developers/blog/a-guide-to-quantization-in-llms/">here</a>.</p>
<p>Quantization is a model compression technique that converts the weights and activations within an LLM from a high-precision data representation to a lower-precision one, i.e., from a data type that can hold more information to one that holds less. A typical example is the conversion of data from a <code>32-bit</code> floating-point number (<code>FP32</code>) to an <code>8-bit</code> or <code>4-bit</code> integer (<code>INT8</code> or <code>INT4</code>). A good blog post on this topic is available <a href="https://symbl.ai/developers/blog/a-guide-to-quantization-in-llms/">here</a>. This conversion reduces memory and disk usage considerably. Note that for the actual computation we still need to <strong>dequantize</strong> the data back to the original data type, such as <code>float32</code> or <code>bfloat16</code>; the trick is to dequantize values only when they are needed for a calculation, while keeping most of the data in the quantized format, so the memory and disk savings are largely preserved.</p>
</div>
</div>
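To make the quantize-then-dequantize idea above concrete, here is a minimal NumPy sketch of a symmetric INT8 round trip; the helper names `quantize_int8` and `dequantize` are illustrative, not taken from the note or from any particular library.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of float32 values to int8."""
    scale = np.max(np.abs(x)) / 127.0                 # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values for computation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)          # pretend these are model weights
q, scale = quantize_int8(w)                           # stored at 1 byte per weight
w_hat = dequantize(q, scale)                          # materialized only when needed
print(np.max(np.abs(w - w_hat)))                      # small reconstruction error
```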
<p>Let’s first revisit how data is represented in a computer. We mainly study the <code>float32</code>, <code>float16</code> and <code>bfloat16</code> types.</p>
<ul>
<li><strong>float32</strong>: 32 bits. We have 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa. To form a float number in computer, we need the sign, the number before the exponent and the exponent number over 2. For example, we have <img src="https://latex.codecogs.com/png.latex?6.75=+1.1011%5Ctimes%202%5E2">. Thus, we can conclude that the range of the representation is between <img src="https://latex.codecogs.com/png.latex?1e%5E%7B-38%7D"> and <img src="https://latex.codecogs.com/png.latex?3e%5E%7B38%7D"> (you can add sign freely, though).</li>
<li><strong>float16</strong>: 16 bits. We have 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. The range of the representation is between <img src="https://latex.codecogs.com/png.latex?6e%5E%7B-8%7D"> and <img src="https://latex.codecogs.com/png.latex?6e%5E%7B4%7D">.</li>
<li><strong>bfloat16</strong>: 16 bits. We have 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. The range of the representation is between <img src="https://latex.codecogs.com/png.latex?1e%5E%7B-38%7D"> and <img src="https://latex.codecogs.com/png.latex?3e%5E%7B38%7D">.</li>
<li><strong>float32</strong>: 32 bits. We have 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa. To form a floating-point number, the computer combines the sign, the mantissa (the binary significand in front of the power of two) and the exponent of two. For example, we have <img src="https://latex.codecogs.com/png.latex?6.75=+1.1011%5Ctimes%202%5E2">. With 8 exponent bits, the range of representable magnitudes is roughly between <img src="https://latex.codecogs.com/png.latex?1%5Ctimes%2010%5E%7B-38%7D"> and <img src="https://latex.codecogs.com/png.latex?3%5Ctimes%2010%5E%7B38%7D"> (negative values are covered by the sign bit).</li>
<li><strong>float16</strong>: 16 bits. We have 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. The range of the representation is between <img src="https://latex.codecogs.com/png.latex?6%5Ctimes%2010%5E%7B-8%7D"> and <img src="https://latex.codecogs.com/png.latex?6%5Ctimes%2010%5E%7B4%7D">.</li>
<li><strong>bfloat16</strong>: 16 bits. We have 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. The range of the representation is between <img src="https://latex.codecogs.com/png.latex?1%5Ctimes%2010%5E%7B-38%7D"> and <img src="https://latex.codecogs.com/png.latex?3%5Ctimes%2010%5E%7B38%7D">.</li>
</ul>
<p>We can see that <code>float16</code> and <code>bfloat16</code> take up the same amount of memory, but they allocate their bits differently: <code>float16</code> has better precision, while <code>bfloat16</code> has a wider range. For deep neural networks, <code>bfloat16</code> is often the better choice, since range matters more than precision there. The common quantization types are <code>INT8</code> and <code>INT4</code>. Note that <code>INT8</code> and <code>INT4</code> can only represent integers, not floating-point numbers: <code>INT8</code> can only represent the integers between <img src="https://latex.codecogs.com/png.latex?-128"> and <img src="https://latex.codecogs.com/png.latex?127">, and <code>INT4</code> the integers between <img src="https://latex.codecogs.com/png.latex?-8"> and <img src="https://latex.codecogs.com/png.latex?7">.</p>
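As a quick check on the bit layouts and ranges listed above, the sketch below inspects the float32 encoding of 6.75 and queries NumPy for the representable ranges. It assumes NumPy >= 1.22 (for <code>smallest_subnormal</code>); bfloat16 is not a NumPy dtype, so its range is only noted in a comment.

```python
import struct
import numpy as np

# float32 encoding of 6.75: 1 sign bit | 8 exponent bits | 23 mantissa bits.
bits = struct.unpack(">I", struct.pack(">f", 6.75))[0]
print(f"{bits:032b}")
# -> 01000000110110000000000000000000
#    sign 0, exponent 10000001 (= 129, i.e. 2^(129-127) = 2^2), fraction 1011...,
#    which is exactly +1.1011 (binary) * 2^2 = 6.75.

# Representable ranges reported by NumPy (smallest_subnormal needs NumPy >= 1.22).
print(np.finfo(np.float32).max, np.finfo(np.float32).tiny)                # ~3.4e38, ~1.2e-38
print(np.finfo(np.float16).max, np.finfo(np.float16).smallest_subnormal)  # 65504.0, ~6.0e-08
# bfloat16 is not a NumPy dtype; it keeps float32's 8 exponent bits, so its
# range is roughly 1e-38 to 3e38, with only 7 mantissa bits of precision.

# Integer quantization targets are far narrower.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)                       # -128, 127
```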
<p>We use the <em>affine quantization scheme</em> to convert the model:</p>
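The scheme itself is not visible in this hunk, so purely as a rough illustration: a commonly used affine (asymmetric) formulation maps floats to integers via a scale and a zero point, as in the sketch below. The exact formulation used in the note may differ.

```python
import numpy as np

def affine_quantize(x: np.ndarray, n_bits: int = 8):
    """Common affine (asymmetric) scheme: q = round(x / scale) + zero_point."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1   # -128..127 for INT8
    scale = (x.max() - x.min()) / (qmax - qmin)                # float step per integer level
    zero_point = int(round(qmin - x.min() / scale))            # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def affine_dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(8).astype(np.float32)
q, s, z = affine_quantize(x)
print(np.max(np.abs(x - affine_dequantize(q, s, z))))          # error bounded by ~scale / 2
```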
8 changes: 4 additions & 4 deletions notes/Large Language Model/inference_optimize.html
@@ -329,14 +329,14 @@ <h2 class="anchored" data-anchor-id="quantization">Quantization</h2>
</div>
</div>
<div class="callout-body-container callout-body">
<p>Quantization is a model compression technique that converts the weights and activations within an LLM from a high-precision data representation to a lower-precision data representation, i.e., from a data type that can hold more information to one that holds less. A typical example of this is the conversion of data from a 32-bit floating-point number (FP32) to an 8-bit or 4-bit integer (INT4 or INT8). A good blog from internet is <a href="https://symbl.ai/developers/blog/a-guide-to-quantization-in-llms/">here</a>.</p>
<p>Quantization is a model compression technique that converts the weights and activations within an LLM from a high-precision data representation to a lower-precision one, i.e., from a data type that can hold more information to one that holds less. A typical example is the conversion of data from a <code>32-bit</code> floating-point number (<code>FP32</code>) to an <code>8-bit</code> or <code>4-bit</code> integer (<code>INT8</code> or <code>INT4</code>). A good blog post on this topic is available <a href="https://symbl.ai/developers/blog/a-guide-to-quantization-in-llms/">here</a>. This conversion reduces memory and disk usage considerably. Note that for the actual computation we still need to <strong>dequantize</strong> the data back to the original data type, such as <code>float32</code> or <code>bfloat16</code>; the trick is to dequantize values only when they are needed for a calculation, while keeping most of the data in the quantized format, so the memory and disk savings are largely preserved.</p>
</div>
</div>
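To illustrate the "dequantize only when needed" point from the callout above, here is a minimal sketch of a linear layer whose weights stay in INT8 storage and are expanded to float32 only inside the forward call. The setup (a single per-tensor scale and the name <code>int8_linear</code>) is hypothetical, not from the note.

```python
import numpy as np

def int8_linear(x: np.ndarray, q_weight: np.ndarray, scale: float) -> np.ndarray:
    """Matrix multiply against int8-quantized weights, dequantizing on the fly.

    The weights stay in int8 in memory; a float32 copy exists only for the
    duration of this call, which is the "dequantize only when needed" trick.
    """
    w = q_weight.astype(np.float32) * scale      # temporary float32 copy for compute
    return x @ w.T

# Hypothetical setup: a float32 weight matrix stored as int8 plus one scale.
w_fp32 = np.random.randn(16, 64).astype(np.float32)
scale = np.abs(w_fp32).max() / 127.0
q_weight = np.clip(np.round(w_fp32 / scale), -128, 127).astype(np.int8)

x = np.random.randn(2, 64).astype(np.float32)
y = int8_linear(x, q_weight, scale)              # weights use ~4x less storage than float32
print(y.shape)                                   # (2, 16)
```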
<p>Let’s first revisit how data is represented in a computer. We mainly study the <code>float32</code>, <code>float16</code> and <code>bfloat16</code> types.</p>
<ul>
<li><strong>float32</strong>: 32 bits. We have 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa. To form a float number in computer, we need the sign, the number before the exponent and the exponent number over 2. For example, we have <span class="math inline">\(6.75=+1.1011\times 2^2\)</span>. Thus, we can conclude that the range of the representation is between <span class="math inline">\(1e^{-38}\)</span> and <span class="math inline">\(3e^{38}\)</span> (you can add sign freely, though).</li>
<li><strong>float16</strong>: 16 bits. We have 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. The range of the representation is between <span class="math inline">\(6e^{-8}\)</span> and <span class="math inline">\(6e^{4}\)</span>.</li>
<li><strong>bfloat16</strong>: 16 bits. We have 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. The range of the representation is between <span class="math inline">\(1e^{-38}\)</span> and <span class="math inline">\(3e^{38}\)</span>.</li>
<li><strong>float32</strong>: 32 bits. We have 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa. To form a floating-point number, the computer combines the sign, the mantissa (the binary significand in front of the power of two) and the exponent of two. For example, we have <span class="math inline">\(6.75=+1.1011\times 2^2\)</span>. With 8 exponent bits, the range of representable magnitudes is roughly between <span class="math inline">\(1\times 10^{-38}\)</span> and <span class="math inline">\(3\times 10^{38}\)</span> (negative values are covered by the sign bit).</li>
<li><strong>float16</strong>: 16 bits. We have 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. The range of the representation is between <span class="math inline">\(6\times 10^{-8}\)</span> and <span class="math inline">\(6\times 10^{4}\)</span>.</li>
<li><strong>bfloat16</strong>: 16 bits. We have 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. The range of the representation is between <span class="math inline">\(1\times 10^{-38}\)</span> and <span class="math inline">\(3\times 10^{38}\)</span>.</li>
</ul>
<p>We can see that <code>float16</code> and <code>bfloat16</code> take up the same amount of memory, but they allocate their bits differently: <code>float16</code> has better precision, while <code>bfloat16</code> has a wider range. For deep neural networks, <code>bfloat16</code> is often the better choice, since range matters more than precision there. The common quantization types are <code>INT8</code> and <code>INT4</code>. Note that <code>INT8</code> and <code>INT4</code> can only represent integers, not floating-point numbers: <code>INT8</code> can only represent the integers between <span class="math inline">\(-128\)</span> and <span class="math inline">\(127\)</span>, and <code>INT4</code> the integers between <span class="math inline">\(-8\)</span> and <span class="math inline">\(7\)</span>.</p>
<p>We use the <em>affine quantization scheme</em> to convert the model:</p>
