diff --git a/src/routes/blogs/accelerating-phi-2/+page.svx b/src/routes/blogs/accelerating-phi-2/+page.svx
index e4d408fa3c37f..3adb1361b4c1a 100644
--- a/src/routes/blogs/accelerating-phi-2/+page.svx
+++ b/src/routes/blogs/accelerating-phi-2/+page.svx
@@ -28,7 +28,7 @@ url: 'https://onnxruntime.ai/blogs/accelerating-phi-2'
 In a fastmoving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lowers costs. These include state-of-the-art fusion and kernel optimizations to help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.15.0) improves inference performance of several Gen AI models including Phi-2, Mistral, CodeLlama, Orca-2 and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.
 
-In this blog we will cover significant optimization speed up for both training and inference for latest GenAI models like Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures ONNX Runtime significantly improves performance across a spectrum of batch size and prompt length when compared against other frameworks like PyTorch, and Llama.cpp. These optimizations using ONNX Runtime is now also available using [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
+In this blog, we will cover significant optimization speedups for both training and inference for the latest GenAI models like Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures, ONNX Runtime significantly improves performance across a spectrum of batch sizes and prompt lengths when compared against other frameworks like PyTorch and Llama.cpp. These optimizations using ONNX Runtime are now also available through [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
 
 # Quick Links
 
 - [Phi-2](#phi-2)
 - [Mistral](#mistral)
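
To make the inference claims in the edited paragraph concrete, here is a minimal, hypothetical sketch (not part of this PR or the blog post) of running Phi-2 through ONNX Runtime via Hugging Face Optimum. The package names, the `microsoft/phi-2` model id, and the export flow are assumptions for illustration, not the blog's prescribed setup.

```python
# Illustrative sketch only, not taken from the PR or the blog post.
# Assumes `pip install optimum[onnxruntime] transformers` and access to the
# (assumed) "microsoft/phi-2" checkpoint.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX so ONNX Runtime can serve it;
# ORT applies its default graph optimizations when the session is created.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Explain ONNX Runtime in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the float16/int4 variants and the Olive-driven optimization workflows mentioned in the paragraph, the linked [Olive examples](https://github.com/microsoft/Olive/tree/main/examples/) are the authoritative reference; the snippet above only shows the plain ONNX Runtime path.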