From faf0809e25be8dee45e9f52cb0f000bce767ac9e Mon Sep 17 00:00:00 2001
From: Maanav Dalal
Date: Tue, 23 Apr 2024 15:15:33 -0700
Subject: [PATCH] fixed repeated text. (#20446)

### Description

### Motivation and Context

---------

Co-authored-by: MaanavD
---
 src/routes/blogs/accelerating-phi-3/+page.svx | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/routes/blogs/accelerating-phi-3/+page.svx b/src/routes/blogs/accelerating-phi-3/+page.svx
index 8d65a479741dd..0523dfdc904da 100644
--- a/src/routes/blogs/accelerating-phi-3/+page.svx
+++ b/src/routes/blogs/accelerating-phi-3/+page.svx
@@ -33,7 +33,7 @@ See below for dedicated performance numbers.
 
 ## ONNX Runtime for Mobile
 
-In addition to supporting both Phi-3 Mini models on various GPUs, ONNX Runtime can help run these models on Mobile, Windows, and Mac CPUs, making it a truly cross-platform framework. ONNX Runtime also supports quantization techniques like RTN to enable these models to run across many different hardware.
+In addition to supporting both Phi-3 Mini models on Windows, ONNX Runtime can help run these models on other client devices including Mobile and Mac CPUs, making it a truly cross-platform framework. ONNX Runtime also supports quantization techniques like RTN to enable these models to run across many different types of hardware.
 
 ONNX Runtime Mobile empowers developers to perform on-device inference with AI models on mobile and edge devices. By removing client-server communications, ORT Mobile provides privacy protection and has zero cost. Using RTN INT4 quantization, we significantly reduce the size of the state-of-the-art Phi-3 Mini models and can run both on a Samsung Galaxy S21 at a moderate speed. When applying RTN INT4 quantization, there is a tuning parameter for the int4 accuracy level. This parameter specifies the minimum accuracy level required for the activation of MatMul in int4 quantization, balancing performance and accuracy trade-offs. Two versions of RTN quantized models have been released with int4_accuracy_level=1, optimized for accuracy, and int4_accuracy_level=4, optimized for performance. If you prefer better performance with a slight trade-off in accuracy, we recommend using the model with int4_accuracy_level=4.
 
@@ -46,10 +46,6 @@ For FP16 CUDA and INT4 CUDA, Phi-3 Mini-128K-Instruct with ORT performs up to 5X
 
 For FP16 and INT4 CUDA, Phi-3 Mini-4K-Instruct with ORT performs up to 5X faster and up to 10X faster than PyTorch, respectively. Phi-3 Mini-4K-Instruct is also up to 3X faster than Llama.cpp for large sequence lengths.
 
-In addition to supporting both Phi-3 Mini models on various GPUs, ONNX Runtime can help run these models on mobile, Windows, and Mac CPUs, making it a truly cross-platform framework. ONNX Runtime also supports quantization techniques like RTN to enable these models to run across many different hardware.
-
-ONNX Runtime Mobile empowers developers to perform on-device inference with AI models on mobile and edge devices. By removing client-server communications, ORT Mobile provides privacy protection and has zero cost. Using RTN INT4 quantization, we significantly reduce the size of the state-of-the-art Phi-3 Mini models and can run both on a Samsung Galaxy S21 at a moderate speed. When applying RTN INT4 quantization, there is a tuning parameter for the INT4 accuracy level. This parameter specifies the minimum accuracy level required for the activation of MatMul in INT4 quantization, balancing performance and accuracy trade-offs. Two versions of RTN quantized models have been released: (1) the model optimized for accuracy with int4_accuracy_level=1 and (2) the model optimized for performance with int4_accuracy_level=4. If you prefer better performance with a slight trade-off in accuracy, we recommend using the model with int4_accuracy_level=4.
-
 Whether it's Windows, Linux, Android, or Mac, there's a path to infer models efficiently with ONNX Runtime!
 
 ## Try the ONNX Runtime Generate() API
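For readers who want to reproduce the RTN INT4 quantization and the int4 accuracy level discussed in the blog text this patch touches, the snippet below is a minimal sketch using ONNX Runtime's MatMul4BitsQuantizer. It is illustrative rather than the exact recipe used for the published Phi-3 models: the paths are placeholders, and constructor arguments such as block_size, is_symmetric, and accuracy_level may vary across onnxruntime releases.

```python
# Minimal sketch (not from the patch): RTN-style weight-only INT4 quantization
# of an ONNX model, with the accuracy-level knob described in the blog text.
# Paths are placeholders; argument names and defaults can differ by ORT version.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("phi-3-mini-4k-instruct-fp32.onnx")  # placeholder input model

quantizer = MatMul4BitsQuantizer(
    model,
    block_size=32,      # weights quantized in blocks of 32
    is_symmetric=True,  # symmetric round-to-nearest (RTN) quantization
    accuracy_level=4,   # 4 favors performance; 1 favors accuracy
)
quantizer.process()

# quantizer.model wraps the quantized graph; write it out with external data
# since the Phi-3 weights exceed the 2 GB protobuf limit.
quantizer.model.save_model_to_file(
    "phi-3-mini-4k-instruct-int4.onnx", use_external_data_format=True
)
```

The two released model variants mentioned in the blog text correspond to the two int4 accuracy-level settings shown above.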
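Similarly, for the "Try the ONNX Runtime Generate() API" section that closes the second hunk, a minimal usage sketch with the onnxruntime-genai Python package might look like the following. The model folder name and prompt template are assumptions based on the published Phi-3 Mini ONNX releases, and the exact API surface can differ between onnxruntime-genai versions.

```python
# Minimal sketch (assumptions noted above): text generation with a Phi-3 Mini
# ONNX model through the ONNX Runtime Generate() API.
import onnxruntime_genai as og

# Placeholder: path to a downloaded INT4 RTN build (e.g. the acc-level-4 variant).
model = og.Model("Phi-3-mini-4k-instruct-onnx/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)

# Phi-3 chat prompt template.
prompt = "<|user|>\nWhat is ONNX Runtime?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```

Installing the package (pip install onnxruntime-genai, or the -cuda / -directml variants for GPU) and downloading one of the published Phi-3 ONNX model folders is assumed.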