diff --git a/blogs/pytorch-on-the-edge.html b/blogs/pytorch-on-the-edge.html

Run PyTorch models on the edge

In this article, we'll demystify running PyTorch models on the edge. We define 'edge' as anywhere that is outside of the cloud, ranging from large, well-resourced personal computers to small-footprint devices such as mobile phones. This has been a challenging task to accomplish in the past, but new advances in model optimization and software like ONNX Runtime make it more feasible, even for new generative AI and large language models like Stable Diffusion, Whisper, and Llama2.

Diagram showing the PyTorch logo representing a PyTorch model, fanning out to icons for web, mobile and browser devices running ONNX Runtime

Considerations for PyTorch models on the edge

There are several factors to keep in mind when thinking about running a PyTorch model on the edge:

  • High cost of cloud resources (especially when device capabilities are underutilized)
  • Application requirements to operate without internet connectivity

    Tools for PyTorch models on the edge

We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based engine that has deep integration with PyTorch. By using PyTorch's ONNX APIs, your PyTorch models can run on a spectrum of edge devices with ONNX Runtime.

The first step for running PyTorch models on the edge is to get them into a lightweight format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch has thought about this and includes an API that enables exactly this: torch.onnx. ONNX is an open standard that defines the operators that make up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional graph that captures the operators needed to run the model without Python. As with everything in machine learning, there are some limitations to be aware of: some PyTorch models cannot be represented as a single graph, in which case you may need to output several graphs and stitch them together in your own pipeline.

The popular Hugging Face library also has APIs that build on top of this torch.onnx functionality to export models to the ONNX format. Over 130,000 models are supported, making it very likely that the model you care about is one of them.

In this article, we'll show you several examples involving state-of-the-art PyTorch models (like Whisper and Stable Diffusion) on popular devices (like Windows laptops, mobile phones, and web browsers) via various languages (from C# to JavaScript to Swift).

    PyTorch models on the edge


    Stable Diffusion on Windows

pipeline.save_pretrained("./onnx-stable-diffusion")

You don't have to export the fifth model, ClipTokenizer, as it is available in ONNX Runtime Extensions, a library for pre- and post-processing of PyTorch models.

To run this pipeline of models as a .NET application, we built the pipeline code in C#. This code can be run on CPU, GPU, or NPU, if they are available on your machine, using ONNX Runtime's device-specific hardware accelerators. This is configured with the ExecutionProviderTarget below.

    
     static void Main(string[] args)

    Train a model to recognize your voice on mobile

    )
This set of artifacts is now ready to be loaded by the mobile application, shown here as iOS Swift code. Within the application, a number of samples of the speaker's audio are collected, and the model is trained with these samples.

    
     func trainStep(inputData: [Data], labels: [Int64]) throws  {

    Where to next?

In this article we've shown why you would run PyTorch models on the edge and what aspects to consider. We also shared several examples with code that you can use for running state-of-the-art PyTorch models on the edge with ONNX Runtime, and showed how ONNX Runtime was built for performance and cross-platform execution, making it the ideal way to run PyTorch models on the edge. You may have noticed that we didn't include a Llama2 example even though ONNX Runtime is optimized to run it. That's because the amazing Llama2 model deserves its own article, so stay tuned for that!

You can read more about how to run your PyTorch model on the edge here: https://onnxruntime.ai/docs/