Commit: Added ToC.
MaanavD committed Feb 26, 2024
1 parent e345d37 commit 2736a9b
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions src/routes/blogs/accelerating-phi-2/+page.svx
@@ -5,7 +5,7 @@ description: 'Improvements with ONNX Runtime for inferencing popular Gen AI mode
keywords: 'GenAI , LLM, SLM, ONNXRuntime, ORT, Phi, Mistral, Mixtral, LLama, Gemma, Stable Diffusion, Orca'
authors:
[
- 'Parinita Rah',
+ 'Parinita Rahi',
'Sunghoon Choi',
'Yufeng Li',
'Kshama Pawar',
@@ -25,9 +25,19 @@ image: 'accelerating-phi-2/Phi2_Int4_TokenGenerationTP.png'
url: 'https://onnxruntime.ai/blogs/accelerating-phi-2'
---

- In a fast-moving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lower costs. These include state-of-the-art fusion and kernel optimizations that help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.17.0) improves inference performance of several Gen AI models, including Phi-2, Mistral, CodeLlama, Orca-2, and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. **With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.**

+ In a fast-moving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lower costs. These include state-of-the-art fusion and kernel optimizations that help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.17.0) improves inference performance of several Gen AI models, including Phi-2, Mistral, CodeLlama, Orca-2, and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.
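
To make this concrete, here is a minimal sketch of consuming such a model through ONNX Runtime via the Hugging Face Optimum integration. The checkpoint name and generation settings are illustrative assumptions, not part of the release notes:

```python
# Minimal sketch (assumes `optimum[onnxruntime]` and `transformers` are installed).
# The checkpoint name below is an illustrative example, not a prescribed model.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/phi-2"  # example checkpoint; other causal LMs work too
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the checkpoint to ONNX on the fly; a directory
# containing a pre-optimized (e.g. int4-quantized) ONNX model can be
# passed instead to pick up the optimizations described in this post.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("ONNX Runtime is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```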

In this blog, we cover the significant optimization speedups ONNX Runtime delivers for both training and inference of the latest Gen AI models: Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures, ONNX Runtime significantly improves performance across a spectrum of batch sizes and prompt lengths compared with other frameworks such as PyTorch and Llama.cpp. These ONNX Runtime optimizations are now also available through [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
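
For readers who want to reproduce these optimizations, a typical Olive workflow is driven by a JSON configuration describing the input model, target hardware, and the optimization passes to apply. The snippet below is a sketch of the programmatic entry point; the config file name is a placeholder, and complete model-specific configs live in the Olive examples linked above:

```python
# Sketch of invoking an Olive workflow from Python.
# The config file name is a placeholder for illustration.
from olive.workflows import run as olive_run

# The JSON config lists the input model, target hardware, and passes
# (e.g. ONNX conversion, transformer optimization, int4 quantization).
olive_run("phi2_optimization_config.json")
```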
+ # Quick Links
+ - [Phi-2](#phi-2)
+ - [Mistral](#mistral)
+ - [CodeLlama](#codellama)
+ - [SD-Turbo and SDXL-Turbo](#sd-turbo-and-sdxl-turbo)
+ - [Llama-2](#llama-2)
+ - [Orca-2](#orca-2)
+ - [Gemma](#gemma)
+ - [Conclusion](#conclusion)

# Phi-2
