
Commit 3ed344a
update docs for llm feature (#1801)
jingxu10 authored Jul 19, 2023
1 parent 3b9d8f2 commit 3ed344a
Showing 2 changed files with 4 additions and 4 deletions.
llm/cpu/_sources/index.rst.txt (4 changes: 2 additions & 2 deletions)
@@ -203,7 +203,7 @@ Single Instance Performance

    ## GPT-NEOX quantization
    python run_gpt-neox_int8.py --ipex-weight-only-quantization --lambada --output-dir "saved_results" --jit --int8 -m <GPT-NEOX MODEL_ID>

-   ## (2) Run int8 performance test
+   ## (2) Run int8 performance test (note that GPT-NEOX uses --int8 instead of --int8-bf16-mixed)
    OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <cpu list> python run_<MODEL>_int8.py -m <MODEL_ID> --quantized-model-path "./saved_results/best_model.pt" --benchmark --jit --int8-bf16-mixed
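
For reference, the GPT-NEOX variant of the int8 benchmark command above would then look roughly as follows. This is a sketch only: the core count, NUMA node, CPU list, and the EleutherAI/gpt-neox-20b model ID are hypothetical placeholders, not values from this commit.

    # Sketch with assumed values: 56 physical cores on NUMA node 0, hypothetical model ID.
    # Per the note in the hunk above, GPT-NEOX takes --int8 here rather than --int8-bf16-mixed.
    OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run_gpt-neox_int8.py -m EleutherAI/gpt-neox-20b --quantized-model-path "./saved_results/best_model.pt" --benchmark --jit --int8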
@@ -212,7 +212,7 @@ Single Instance Accuracy

    .. code:: shell

       # bfloat16
-      OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run_generation.py --accuracy-only -m <MODEL_ID> --dtype bfloat16 --ipex --jit
+      OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run_generation.py --accuracy-only -m <MODEL_ID> --dtype bfloat16 --ipex --jit --lambada

       # Quantization as a performance part
       # (1) Do quantization to get the quantized model as mentioned above
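
Filled in with concrete values, the updated accuracy command would read roughly as below. Again a sketch: the core count, NUMA node, CPU list, and the EleutherAI/gpt-j-6b model ID are hypothetical placeholders; --lambada is the flag this change adds.

    # Sketch with assumed values: 56 physical cores on NUMA node 0, hypothetical model ID.
    # --lambada is the accuracy-task flag added by this change.
    OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run_generation.py --accuracy-only -m EleutherAI/gpt-j-6b --dtype bfloat16 --ipex --jit --lambada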
llm/cpu/index.html (4 changes: 2 additions & 2 deletions)
@@ -283,15 +283,15 @@ Single Instance Performance

    ## GPT-NEOX quantization
    python run_gpt-neox_int8.py --ipex-weight-only-quantization --lambada --output-dir "saved_results" --jit --int8 -m <GPT-NEOX MODEL_ID>

-   ## (2) Run int8 performance test
+   ## (2) Run int8 performance test (note that GPT-NEOX uses --int8 instead of --int8-bf16-mixed)
    OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <cpu list> python run_<MODEL>_int8.py -m <MODEL_ID> --quantized-model-path "./saved_results/best_model.pt" --benchmark --jit --int8-bf16-mixed

 Single Instance Accuracy

    # bfloat16
-   OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run_generation.py --accuracy-only -m <MODEL_ID> --dtype bfloat16 --ipex --jit
+   OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run_generation.py --accuracy-only -m <MODEL_ID> --dtype bfloat16 --ipex --jit --lambada

    # Quantization as a performance part
    # (1) Do quantization to get the quantized model as mentioned above
