From 8e1c0409bbd5b135b4880c62ff84c56e9b76ccdd Mon Sep 17 00:00:00 2001
From: Prasanth Pulavarthi
Date: Thu, 12 Oct 2023 19:00:03 -0700
Subject: [PATCH] Update pytorch-on-the-edge.html

---
 blogs/pytorch-on-the-edge.html | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/blogs/pytorch-on-the-edge.html b/blogs/pytorch-on-the-edge.html
index 0b4b3f3b8c18d..a44d1187fad3b 100644
--- a/blogs/pytorch-on-the-edge.html
+++ b/blogs/pytorch-on-the-edge.html
@@ -161,7 +161,7 @@

Stable Diffusion on Windows

This is the output of the model pipeline, running with 50 inference iterations:

- Two golden retriever puppies playing in the grass
+ Two golden retriever puppies playing in the grass

You can build the application and run it on Windows with the detailed steps shown in this tutorial.

@@ -212,19 +212,19 @@

Text generation in the browser

- You can also embed the call to the transformers pipeline using vanilla JS, or in a web application, with React, or Next.js, or write a browser extension.
+ You can also embed the call to the transformers pipeline using vanilla JavaScript, in a web application with React or Next.js, or in a browser extension.

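To make that concrete, here is a minimal vanilla JavaScript sketch using transformers.js; the model name and generation options are illustrative assumptions, not taken from the article:

    import { pipeline } from '@xenova/transformers';

    // Run inside an ES module (e.g. <script type="module">) so top-level await works.
    // Load a text-generation pipeline; the model is downloaded and cached on first use.
    // Xenova/distilgpt2 is a placeholder model name for illustration.
    const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

    // Generate a continuation of the prompt.
    const output = await generator('Two golden retriever puppies are playing in the grass', {
      max_new_tokens: 30,
    });
    console.log(output[0].generated_text);

The same call drops into a React component or a browser extension script unchanged; only the surrounding plumbing differs.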
- Transformers.js currently uses web assembly to execute the model. Support for WebGPU, which will increase performance significantly, is coming *very* soon.
+ ONNX Runtime Web currently uses WebAssembly to execute the model on the CPU. This is fine for many models, but leveraging the GPU, if one exists on the device, can improve the user experience. ONNX Runtime Web support for WebGPU is coming *very* soon and enables you to tap into the GPU while using the same inference APIs.

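To illustrate what "the same inference APIs" means, here is a hedged sketch with onnxruntime-web; the model path, input name, and tensor shape are placeholders, not values from the article:

    import * as ort from 'onnxruntime-web';

    // Today: run on the WebAssembly (CPU) execution provider.
    // 'model.onnx' is a placeholder path for illustration.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['wasm'],
    });

    // Once WebGPU support ships, the only change is to prefer the GPU and
    // fall back to CPU; the rest of the code stays the same:
    //   executionProviders: ['webgpu', 'wasm']

    // Inference is identical either way; the input name and shape depend on your model.
    const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
    const results = await session.run({ input: input });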
- Text generation in the browser using transformers.js. The prompt is Two golden retriever puppies are playing in the grass, and the response is playing in the grasslands. They are known for their playful nature and they have a playful face.
+ Text generation in the browser using transformers.js. The prompt is Two golden retriever puppies are playing in the grass, and the response is playing in the grasslands. They are known for their playful nature and they have a playful face.

Speech recognition with Whisper on mobile

- Whisper from OpenAI is a transformer-based speech recognition PyTorch model. Whisper has a number of different size variants, the smallest: Whisper Tiny, is suitable to run on mobile. All components of the Whisper Tiny model (audio decoder, encoder, decoder and text sequence generation) can be composed and exported to a single ONNX model using the Olive framework. To run this model as part of a mobile application, you can use ONNX Runtime mobile, which has support for Android, iOS, react-native and MAUI/Xamarin.
+ Whisper from OpenAI is a PyTorch speech recognition model. Whisper comes in a number of different size variants - the smallest, Whisper Tiny, is suitable for running on mobile devices. All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence generation) can be composed and exported to a single ONNX model using the Olive framework. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which supports Android, iOS, React Native, and MAUI/Xamarin.

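If you take the React Native route, a minimal sketch with the onnxruntime-react-native package might look like the following; the model path, input name, and output name are assumptions that depend on how the model was exported, so check session.inputNames and session.outputNames for your model:

    import { InferenceSession, Tensor } from 'onnxruntime-react-native';

    // Transcribe a short audio clip with an Olive-exported Whisper Tiny model.
    // modelPath is a placeholder; audioBytes is a Uint8Array of raw audio.
    async function transcribe(modelPath, audioBytes) {
      const session = await InferenceSession.create(modelPath);
      // 'audio_stream' is an assumed input name for the end-to-end export.
      const feeds = { audio_stream: new Tensor('uint8', audioBytes, [1, audioBytes.length]) };
      const results = await session.run(feeds);
      return results.str.data[0]; // 'str' is an assumed name for the text output
    }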
- ONNX Runtime mobile supports hardware acceleration via NNAPI (on Android) and CoreML (on iOS) and XNNPACK (both iOS and Android).
+ ONNX Runtime Mobile supports hardware acceleration via NNAPI (on Android), CoreML (on iOS), and XNNPACK (both iOS and Android).

- As an example, the relevant snippet of an Android mobile app to perform speech transcription on short samples of audio, is shown below.
+ The relevant snippet of an example Android mobile app that performs speech transcription on short samples of audio is shown below:


 init {
@@ -323,9 +323,9 @@ 

Train a model to recognize your voice on mobile

Where to next?

- In this article we’ve shown why you would run PyTorch models on the edge and what aspects to consider. We also shared several examples with code that you can use for running state-of-the-art PyTorch model on the edge with ONNX Runtime. We also showed how ONNX Runtime was built for performance and cross-platform execution, making it the ideal way to run PyTorch models on the edge. You may have noticed that we didn’t include a Llama2 example even though ONNX Runtime is optimized to run it. That’s because the amazing Llama2 model deserves its own article, so stay tuned for that!
+ In this article, we’ve shown why you would run PyTorch models on the edge and what aspects to consider. We shared several examples, with code, that you can use for running state-of-the-art PyTorch models on the edge with ONNX Runtime, and we showed how ONNX Runtime was built for performance and cross-platform execution, making it the ideal way to run PyTorch models on the edge. You may have noticed that we didn’t include a Llama2 example even though ONNX Runtime is optimized to run it. That’s because the amazing Llama2 model deserves its own article, so stay tuned for that!

- You can read more about how to run your PyTorch model on the edge here: https://onnxruntime.ai/docs/
+ You can read more about how to run your PyTorch models on the edge here: https://onnxruntime.ai/docs/