Cleaned up code, added info to table.
MaanavD committed Apr 23, 2024
1 parent b22a021 commit db4a42c
Showing 1 changed file with 13 additions and 7 deletions.
20 changes: 13 additions & 7 deletions src/routes/blogs/accelerating-phi-3/+page.svx
@@ -17,6 +17,8 @@ You can now run Microsoft's latest home-grown [Phi-3 models](https://aka.ms/phi3

Many language models are too large to run locally on most devices, but Phi-3 represents a significant exception to this rule: this small but mighty suite of models achieves comparable performance to models 10 times larger! Phi-3-mini is also the first model in its weight class to support long contexts of up to 128K tokens. To learn more about how Microsoft's strategic data curation and innovative scaling achieved these remarkable results, see [here](https://aka.ms/phi3-tech-report).

+You can easily get started with Phi-3 using our newly introduced ONNX Runtime Generate() API, found [here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md)!

## DirectML and ONNX Runtime scale Phi-3 Mini on Windows

By itself, Phi-3 is already small enough to run on many Windows devices, but why stop there? Making Phi-3 even smaller with quantization would dramatically expand the model's reach on Windows, but not all quantization techniques are created equal. We wanted to ensure scalability while also maintaining model accuracy.
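Not the post's own method, but as a generic illustration of why quantization schemes differ in accuracy: in block-wise round-to-nearest int4 quantization, each small block of weights shares one scale, so the model shrinks roughly 4x versus fp16 while every weight absorbs a small rounding error. A minimal sketch (the function names and block size of 32 are arbitrary choices for illustration):

```python
# Illustrative sketch of generic block-wise round-to-nearest int4 quantization;
# NOT necessarily the technique chosen for Phi-3 in this post.
import numpy as np

def quantize_int4_blockwise(w: np.ndarray, block_size: int = 32):
    """Map each block of weights to the int4 range [-8, 7] with one scale per block."""
    pad = (-w.size) % block_size
    blocks = np.pad(w, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # largest |w| -> 7
    scales[scales == 0] = 1.0  # guard all-zero blocks against divide-by-zero
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).ravel()

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_blockwise(w)
err = np.abs(dequantize(q, s)[: w.size] - w)
print("mean abs rounding error:", err.mean())  # the accuracy cost being managed
```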
@@ -50,18 +52,21 @@ We are pleased to announce our new Generate() API, which makes it easier to run
This API makes it easy to drag and drop LLMs straight into your app. To run the early version of these models with ONNX, follow the steps [here](http://aka.ms/generate-tutorial).

Example:
+<pre>
+<code>
+python model-qa.py -m /***YourModelPath***/onnx/cpu_and_mobile/phi-3-mini-4k-instruct-int4-cpu -k 40 -p 0.95 -t 0.8 -r 1.0

-python model-qa.py -m /***YourModelPath***/onnx/cpu_and_mobile/phi-3-mini-4k-instruct-int4-cpu -k 40 -p 0.95 -t 0.8 -r 1.0

-Input: <user>Tell me a joke<end><assistant>
+Input: <user> Tell me a joke <end>

-Output: Why don't scientists trust atoms?
+Output: <assistant> Why don't scientists trust atoms?

Because they make up everything!

-This joke plays on the double meaning of "make up." In science, atoms are the fundamental building blocks of matter, literally making up everything. However, in a colloquial sense, "to make up" can mean to fabricate or lie, hence the humor.

-Please watch this space for more updates on AMD, and additional optimization with ORT 1.18. Also, Check out our [Build Talk](https://build.microsoft.com/en-US/sessions/e6d21a49-2efb-4a39-8c26-f6eef1410c7a?source=sessions) in late May to learn more about this API!
+This joke plays on the double meaning of "make up." In science, atoms are the fundamental building blocks of matter,
+literally making up everything. However, in a colloquial sense, "to make up" can mean to fabricate or lie, hence the humor. <end>
+</code>
+</pre>
+Please watch this space for more updates on AMD, and additional optimization with ORT 1.18. Also, check out our [Build Talk](https://build.microsoft.com/en-US/sessions/e6d21a49-2efb-4a39-8c26-f6eef1410c7a?source=sessions) in late May to learn more about this API!
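A note on the flags above: -k and -p appear to be top-k/top-p sampling settings, -t the temperature, and -r a repetition penalty; model-qa.py's own help output is authoritative. To call the Generate() API directly from Python instead of through model-qa.py, a minimal sketch follows, assuming the onnxruntime-genai package from the linked tutorial; the method names, search-option keys, and model path are assumptions drawn from early examples, so defer to the tutorial where they differ:

```python
# A minimal sketch, assuming early onnxruntime-genai Python APIs; names and
# paths below are placeholders, not verified against this exact release.
import onnxruntime_genai as og

model = og.Model("./phi-3-mini-4k-instruct-int4-cpu")  # downloaded model folder
tokenizer = og.Tokenizer(model)

# Same sampling settings as the model-qa.py command above.
params = og.GeneratorParams(model)
params.set_search_options(do_sample=True, top_k=40, top_p=0.95,
                          temperature=0.8, repetition_penalty=1.0,
                          max_length=256)
params.input_ids = tokenizer.encode("<user> Tell me a joke <end>")

output_tokens = model.generate(params)  # runs the whole decoding loop
print(tokenizer.decode(output_tokens[0]))
```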

## Performance Metrics

@@ -91,6 +96,7 @@ Stay tuned for additional performance improvements in the coming weeks thanks to
<tr><td>64</td><td>512</td><td>272.47</td></tr>
<tr><td>64</td><td>1024</td><td>245.67</td></tr>
<tr><td>64</td><td>2048</td><td>220.55</td></tr>
+<i>Results computed with batch size = 1</i>
</table>

</div>