From a45e81bab4942c3058c6a16f61c53d136c59ea42 Mon Sep 17 00:00:00 2001
From: Maanav Dalal
Date: Mon, 26 Feb 2024 18:07:06 -0800
Subject: [PATCH] Update src/routes/blogs/accelerating-phi-2/+page.svx

Co-authored-by: Sophie Schoenmeyer <107952697+sophies927@users.noreply.github.com>
---
 src/routes/blogs/accelerating-phi-2/+page.svx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/routes/blogs/accelerating-phi-2/+page.svx b/src/routes/blogs/accelerating-phi-2/+page.svx
index 218ec173e5439..802f71991d657 100644
--- a/src/routes/blogs/accelerating-phi-2/+page.svx
+++ b/src/routes/blogs/accelerating-phi-2/+page.svx
@@ -85,7 +85,7 @@ Here is an example of [Phi-2 optimizations with Olive](https://github.com/micros

 ## Training

-In addition to inference, ONNX Runtime also provides training speedup for Phi-2 and other LLMs. ORT Training is part of the PyTorch Ecosystem and is available via the torch-ort python package, as part of the Azure Container for PyTorch (ACPT). It provides flexible and extensible hardware support, where the same model and APIs works with both NVIDIA and AMD GPUs. ORT accelerates training through optimized kernels and memory optimizations which show significant gains in reducing end-to-end training time for large model training. This involves changing a few lines of code in the model to wrap it with the ORTModule API. It is also composable with popular acceleration libraries like DeepSpeed and Megatron for faster and more efficient training.
+In addition to inference, ONNX Runtime also provides training speedup for Phi-2 and other LLMs. ORT training is part of the PyTorch Ecosystem and is available via the torch-ort python package as part of the Azure Container for PyTorch (ACPT). It provides flexible and extensible hardware support, where the same model and APIs works with both NVIDIA and AMD GPUs. ORT accelerates training through optimized kernels and memory optimizations which show significant gains in reducing end-to-end training time for large model training. This involves changing a few lines of code in the model to wrap it with the ORTModule API. It is also composable with popular acceleration libraries like DeepSpeed and Megatron for faster and more efficient training.

 Open AI's Triton is a domain specific language and compiler to write highly efficient custom deep learning primitives. ORT supports Open AI Triton integration (ORT+Triton), where all element wise operators are converted to Triton ops and ORT creates custom fused kernels in Triton.
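
The "few lines of code" needed to wrap a model with the ORTModule API, as described in the Training paragraph, might look roughly like the sketch below. This is an illustration, not code from the blog post. It assumes the `torch-ort` package is installed and exposes `ORTModule`. The helper name `wrap_with_ortmodule` and its fallback to the unwrapped model are hypothetical choices made here so the snippet also runs where `torch-ort` is absent.

```python
def wrap_with_ortmodule(model):
    """Return `model` wrapped in ORTModule when torch-ort is importable.

    Hypothetical helper for illustration only: if torch-ort is missing,
    the original model is returned unchanged so the surrounding training
    script keeps working with plain PyTorch.
    """
    try:
        # ORTModule comes from the torch-ort package (assumed installed).
        from torch_ort import ORTModule
    except ImportError:
        return model  # no torch-ort available: leave the model as-is
    # Wrapping is the only change; the training loop stays the same.
    return ORTModule(model)


# Usage sketch (assumes `model` is a torch.nn.Module defined elsewhere):
#   model = wrap_with_ortmodule(model)
#   ... then run the usual forward / backward / optimizer.step() loop
```

Keeping the wrap in a single helper means the rest of the training script does not need to know whether ORT acceleration is active, which is one way to keep the change to "a few lines".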