Update src/routes/blogs/accelerating-phi-2/+page.svx
Co-authored-by: Sophie Schoenmeyer <[email protected]>
MaanavD and sophies927 authored Feb 27, 2024
1 parent cef1ea7 commit 1a3b85a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/routes/blogs/accelerating-phi-2/+page.svx
@@ -28,7 +28,7 @@ url: 'https://onnxruntime.ai/blogs/accelerating-phi-2'
 
 In a fast-moving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lower costs. These include state-of-the-art fusion and kernel optimizations to help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.17.0) improves the inference performance of several Gen AI models, including Phi-2, Mistral, CodeLlama, Orca-2, and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.
 
-In this blog we will cover significant optimization speed up for both training and inference for latest GenAI models like Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures ONNX Runtime significantly improves performance across a spectrum of batch size and prompt length when compared against other frameworks like PyTorch, and Llama.cpp. These optimizations using ONNX Runtime is now also available using [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
+In this blog, we will cover significant optimization speed up for both training and inference for the latest GenAI models like Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures, ONNX Runtime significantly improves performance across a spectrum of batch sizes and prompt lengths when compared against other frameworks like PyTorch, and Llama.cpp. These optimizations using ONNX Runtime are now also available using [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
 # Quick Links
 - [Phi-2](#phi-2)
 - [Mistral](#mistral)
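
For context on the inference path the edited paragraph describes, here is a minimal sketch of running an exported ONNX model with ONNX Runtime's Python API. The model filename, execution provider, and toy input below are illustrative assumptions, not part of this commit; a real Phi-2 export typically declares additional inputs such as attention masks and past key/value caches.

```python
# Minimal sketch: single inference step with ONNX Runtime (Python API).
# "phi-2-int4.onnx" is a hypothetical exported model path, not from this repo.
import numpy as np
import onnxruntime as ort

# Create an inference session; ORT applies its fusion/kernel optimizations
# when building the session for the chosen execution provider.
session = ort.InferenceSession(
    "phi-2-int4.onnx",                    # hypothetical int4-quantized export
    providers=["CPUExecutionProvider"],   # e.g. CUDAExecutionProvider on GPU
)

# Bind a toy batch of token IDs to the model's first declared input and run.
input_name = session.get_inputs()[0].name
input_ids = np.array([[1, 2, 3, 4]], dtype=np.int64)  # placeholder prompt tokens
outputs = session.run(None, {input_name: input_ids})
print(outputs[0].shape)  # e.g. logits shaped (batch, sequence, vocab)
```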
