Commit: Added ToC.
MaanavD committed Feb 26, 2024
1 parent e345d37 commit 2736a9b
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions src/routes/blogs/accelerating-phi-2/+page.svx
@@ -5,7 +5,7 @@ description: 'Improvements with ONNX Runtime for inferencing popular Gen AI mode
keywords: 'GenAI , LLM, SLM, ONNXRuntime, ORT, Phi, Mistral, Mixtral, LLama, Gemma, Stable Diffusion, Orca'
authors:
[
- 'Parinita Rah',
+ 'Parinita Rahi',
'Sunghoon Choi',
'Yufeng Li',
'Kshama Pawar',
@@ -25,9 +25,19 @@ image: 'accelerating-phi-2/Phi2_Int4_TokenGenerationTP.png'
url: 'https://onnxruntime.ai/blogs/accelerating-phi-2'
---

- In a fast-moving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lower costs. These include state-of-the-art fusion and kernel optimizations that help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.17.0) improves inference performance of several Gen AI models, including Phi-2, Mistral, CodeLlama, Orca-2, and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. **With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.**

+ In a fast-moving landscape where speed and efficiency are paramount, [ONNX Runtime](https://onnxruntime.ai/blogs/accelerating-llama-2) (ORT) allows users to easily integrate the power of generative AI models into their apps and services with improved optimizations that yield faster inferencing speeds and effectively lower costs. These include state-of-the-art fusion and kernel optimizations that help improve model performance. The recent [ONNX Runtime 1.17 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.17.0) improves inference performance of several Gen AI models, including Phi-2, Mistral, CodeLlama, Orca-2, and more. ONNX Runtime is a complete solution for small language models (SLMs) from training to inference, showing significant speedups compared to other frameworks. With support for float32, float16, and int4, ONNX Runtime's inference enhancements provide maximum flexibility and performance.
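
To make this concrete, here is a minimal sketch of consuming such a model through ONNX Runtime via the Hugging Face Optimum integration. The checkpoint name and generation settings are illustrative assumptions, not part of the release notes:

```python
# Minimal sketch (assumes `optimum[onnxruntime]` and `transformers` are installed).
# The checkpoint name below is an illustrative example, not a prescribed model.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/phi-2"  # example checkpoint; other causal LMs work too
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the checkpoint to ONNX on the fly; a directory
# containing a pre-optimized (e.g. int4-quantized) ONNX model can be
# passed instead to pick up the optimizations described in this post.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("ONNX Runtime is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```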

In this blog, we cover the significant optimization speedups ONNX Runtime delivers for both training and inference of the latest Gen AI models: Phi-2, Mistral, CodeLlama, SD-Turbo, SDXL-Turbo, Llama2, and Orca-2. For these model architectures, ONNX Runtime significantly improves performance across a spectrum of batch sizes and prompt lengths compared with other frameworks such as PyTorch and Llama.cpp. These ONNX Runtime optimizations are now also available through [Olive](https://github.com/microsoft/Olive/tree/main/examples/).
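
For readers who want to reproduce these optimizations, a typical Olive workflow is driven by a JSON configuration describing the input model, target hardware, and the optimization passes to apply. The snippet below is a sketch of the programmatic entry point; the config file name is a placeholder, and complete model-specific configs live in the Olive examples linked above:

```python
# Sketch of invoking an Olive workflow from Python.
# The config file name is a placeholder for illustration.
from olive.workflows import run as olive_run

# The JSON config lists the input model, target hardware, and passes
# (e.g. ONNX conversion, transformer optimization, int4 quantization).
olive_run("phi2_optimization_config.json")
```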
+ # Quick Links
+ - [Phi-2](#phi-2)
+ - [Mistral](#mistral)
+ - [CodeLlama](#codellama)
+ - [SD-Turbo and SDXL-Turbo](#sd-turbo-and-sdxl-turbo)
+ - [Llama-2](#llama-2)
+ - [Orca-2](#orca-2)
+ - [Gemma](#gemma)
+ - [Conclusion](#conclusion)

# Phi-2
