diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000..5c91169b0ca65
--- /dev/null
+++ b/README.md
@@ -0,0 +1,38 @@
+# create-svelte
+
+Everything you need to build a Svelte project, powered by [`create-svelte`](https://github.com/sveltejs/kit/tree/master/packages/create-svelte).
+
+## Creating a project
+
+If you're seeing this, you've probably already done this step. Congrats!
+
+```bash
+# create a new project in the current directory
+npm create svelte@latest
+
+# create a new project in my-app
+npm create svelte@latest my-app
+```
+
+## Developing
+
+Once you've created a project and installed dependencies with `npm install` (or `pnpm install` or `yarn`), start a development server:
+
+```bash
+npm run dev
+
+# or start the server and open the app in a new browser tab
+npm run dev -- --open
+```
+
+## Building
+
+To create a production version of your app:
+
+```bash
+npm run build
+```
+
+You can preview the production build with `npm run preview`.
+
+> To deploy your app, you may need to install an [adapter](https://kit.svelte.dev/docs/adapters) for your target environment.
diff --git a/_config.yml b/_config.yml
index 5084e8272173d..0620f40ac8af4 100644
--- a/_config.yml
+++ b/_config.yml
@@ -10,28 +10,28 @@ plugins:
- jekyll-redirect-from
kramdown:
parse_block_html: true
- toc_levels: "2"
-logo: "/images/ONNX-Runtime-logo.svg"
-aux_links:
- "ONNX Runtime":
- - "/"
- "Install":
- - "/docs/install/"
- "Get Started":
- - "/docs/get-started/"
- "Tutorials":
- - "/docs/tutorials/"
- "API Docs":
- - "/docs/api/"
- "YouTube":
- - "https://www.youtube.com/onnxruntime"
- "GitHub":
- - "https://github.com/microsoft/onnxruntime"
-ga_tracking: UA-156955408-1
+ toc_levels: '2'
+logo: '/images/ONNX-Runtime-logo.svg'
+aux_links:
+ 'ONNX Runtime':
+ - '/'
+ 'Install':
+ - '/docs/install/'
+ 'Get Started':
+ - '/docs/get-started/'
+ 'Tutorials':
+ - '/docs/tutorials/'
+ 'API Docs':
+ - '/docs/api/'
+ 'YouTube':
+ - 'https://www.youtube.com/onnxruntime'
+ 'GitHub':
+ - 'https://github.com/microsoft/onnxruntime'
+ga_tracking: UA-156955408-1
# Footer "Edit this page on GitHub" link text
gh_edit_link: true # show or hide edit this page link
-gh_edit_link_text: "Edit this page on GitHub"
-gh_edit_repository: "https://github.com/microsoft/onnxruntime" # the github URL for your repo
-gh_edit_branch: "gh-pages" # the branch that your docs is served from
+gh_edit_link_text: 'Edit this page on GitHub'
+gh_edit_repository: 'https://github.com/microsoft/onnxruntime' # the github URL for your repo
+gh_edit_branch: 'gh-pages' # the branch that your docs is served from
# gh_edit_source: docs # the source that your files originate from
-gh_edit_view_mode: "tree" # "tree" or "edit" if you want the user to jump into the editor immediately
\ No newline at end of file
+gh_edit_view_mode: 'tree' # "tree" or "edit" if you want the user to jump into the editor immediately
diff --git a/_includes/footer_custom.html b/_includes/footer_custom.html
deleted file mode 100644
index 5b0c6763e6de2..0000000000000
--- a/_includes/footer_custom.html
+++ /dev/null
@@ -1,3 +0,0 @@
-{%- assign url = page.url -%}
-
-
- ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. It enables acceleration of machine learning inferencing across all of your deployment targets using a single set of APIs. ONNX Runtime automatically parses through your model to identify optimization opportunities and provides access to the best hardware acceleration available.
-
-
- ONNX Runtime also offers training acceleration, which incorporates innovations from Microsoft Research and is proven across production workloads like Office 365, Bing and Visual Studio.
-
- At Microsoft, ONNX Runtime is used as the primary Machine Learning inferencing solution for product groups. ONNX Runtime serves over 1 trillion daily inferences across over 150 production models covering all task domains.
-
- Run any ONNX model using a single set of inference APIs that provide access to the best hardware acceleration available. Built-in optimization features trim and consolidate nodes without impacting model accuracy. Additionally, full backwards compatibility for ONNX and ONNX-ML ensures all ONNX models can be inferenced.
-
-
-
-
-
-
-
-
-
API and platform support
-
- Take advantage of the benefits of ONNX Runtime without changing your technology stack. Access ONNX Runtime using your preferred API — C#, C++, C, Python, or Java. Support for Linux, Windows and Mac allows you to build and deploy applications without worry.
-
-
-
-
-
-
-
-
-
Continuous community innovation
-
- Our community of partners and contributors drives constant innovation. Partners provide ONNX compatible compilers and accelerators to ensure models are as efficient as possible. Our contributor community improves ONNX Runtime by contributing code, ideas and feedback. Join us on GitHub.
-
-
-
-
-
-
-
-
-
-
-
-
-
Design principles
-
- ONNX Runtime abstracts custom accelerators and runtimes to maximize their benefits across an ONNX model. To do this, ONNX Runtime partitions the ONNX model graph into subgraphs that align with available custom accelerators and runtimes. When operators are not supported by custom accelerators or runtimes, ONNX Runtime provides a default runtime that is used as the fallback execution — ensuring that any model will run. Learn more.
-
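In practice, this fallback behavior shows up directly in the API: a session is created with an ordered list of execution providers, and any operator not supported by an earlier provider falls back to a later one (ultimately the default CPU provider). A minimal Python sketch, assuming the CUDA build of ONNX Runtime and a placeholder model path:

```python
import onnxruntime as ort

# Providers are tried in priority order; unsupported operators fall back to the CPU provider.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually registered
```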
- Most modern ML models are developed with PyTorch. The agility and flexibility that PyTorch provides for creating and training models has made it the most popular deep learning framework today. The typical workflow is to train these models in the cloud and run them from the cloud as well. However, many scenarios are arising that make it more attractive – or in some cases, required – to run locally on device. These include:
-
-
-
Avoiding network round-trips to the cloud (for example in audio and video processing)
-
Keeping user data on device (for privacy protection or regulatory requirements)
-
High cost of cloud resources (especially when device capabilities are underutilized)
-
Application requirements to operate without internet connectivity
-
-
-
-
-
In this article, we'll demystify running PyTorch models on the edge. We define 'edge' as anywhere that is outside of the cloud, ranging from large, well-resourced personal computers to small footprint devices such as mobile phones. This has been a challenging task to accomplish in the past, but new advances in model optimization and software like ONNX Runtime make it more feasible – even for new generative AI and large language models like Stable Diffusion, Whisper, and Llama2.
-
-
Considerations for PyTorch models on the edge
-
-
There are several factors to keep in mind when thinking about running a PyTorch model on the edge:
-
-
Size: modern models can be several gigabytes (hence the name Large Language Models!). On the cloud, size is usually not a consideration until it becomes too large to fit on a single GPU. At that point there are various well-known solutions for running across multiple GPUs. For edge devices, we need to find models that can fit within the constraints of the device. This sometimes requires a tradeoff with quality. Most modern models come in several sizes (1 billion parameters, 13 billion parameters, 70 billion parameters, etc.), so you can select a variant that fits on your device. Techniques such as quantization are usually applied to reduce the number of bits representing parameters, further reducing the model size (see the quantization sketch after this list). The size of the application is also constrained by the app stores, so bringing in gigabytes of libraries won't work on the edge.
-
API for application integration: on the cloud, models are usually packaged as Docker containers that expose an endpoint that is called by an application or service. On edge devices, Docker containers may take up too many resources or may not even be supported. By using an optimized engine, like ONNX Runtime, the dependency on Python and Docker containers can be eliminated. ONNX Runtime also has APIs in many languages including C, C++, C#, Rust, Java, JavaScript, Objective-C, and Swift, making it easier to integrate natively with the hosting application.
-
Performance: with large amounts of memory, no power restrictions, and hefty compute capabilities, running non-optimized models on the cloud is possible. On edge devices, these luxuries do not exist and optimization is crucial. For example, ONNX Runtime optimizes memory allocations, fuses model operators, reduces kernel launch times, minimizes tensor transfers between processing units, and applies tuned matrix math algorithms. It’s also able to make use of compilers and engines that are device-specific, providing a common interface for your application while harnessing the best approach on each device.
-
Maintainability: on the cloud, updating a model is as simple as deploying a new container image and ramping up traffic. On the edge, you need to consider how you will distribute model updates. Sometimes this involves publishing updates to an app store, sometimes it might be possible to implement a data update mechanism within your app and download new model files or maybe even deltas. There are many possible paths, so we won’t go into much depth on this topic in this article but it’s an aspect to keep in mind as you plan for production.
-
Hybrid: instead of cloud versus device, you can choose to utilize both. There are several hybrid patterns that are used in production today by applications such as Office. One pattern is to dynamically decide whether to run on the device or in the cloud based on network conditions or input characteristics. Another pattern is to run part of the model pipeline on the device and part on the cloud. This is especially useful with modern model pipelines that have separate encoder and decoder stages. Using an engine like ONNX Runtime that works on both cloud and device simplifies development. We’ll discuss hybrid scenarios in more detail in a forthcoming article.
-
Personalization: in many cases, the PyTorch model is simply being run on the device. However, you may also have scenarios where you need to personalize the model on the device without sending data to the cloud. Recommendation and content targeting are example scenarios whose quality can be improved by updating models based on activity on the device. Fine tuning and training with PyTorch on the device may not be feasible (due to performance and size concerns), but using an engine like ONNX Runtime allows PyTorch models to be updated and personalized locally. The same mechanism also enables federated learning, which can help mitigate user data exposure.
-
-
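As a concrete illustration of the size consideration above, here is a minimal sketch of shrinking an already-exported ONNX model with ONNX Runtime's dynamic quantization API. The file names are placeholders and the right quantization settings depend on the model; treat this as a starting point rather than a recipe.

```python
# Minimal sketch: dynamic (weight-only) quantization of an exported ONNX model.
# "model.onnx" and "model.int8.onnx" are placeholder paths.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,  # store weights as 8-bit integers
)
```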
-
Tools for PyTorch models on the edge
-
-
We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based engine that has deep integration with PyTorch. By using PyTorch's ONNX APIs, your PyTorch models can run on a spectrum of edge devices with ONNX Runtime.
-
-
The first step for running PyTorch models on the edge is to get them into a lightweight format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch has thought about this and includes an API that enables exactly this - torch.onnx. ONNX is an open standard that defines the operators that make up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional graph that captures the operators that are needed to run the model without Python. As with everything in machine learning, there are some limitations to be aware of. Some PyTorch models cannot be represented as a single graph – in this case you may need to output several graphs and stitch them together in your own pipeline.
-
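To make the export step concrete, here is a minimal sketch using torch.onnx, with a small torchvision model standing in for your own network; the file name, input shape, and tensor names are illustrative.

```python
import torch
import torchvision

# A small stand-in model; substitute your own trained torch.nn.Module.
model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Capture the model as an ONNX graph that can run without Python or PyTorch.
torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow a variable batch size
)
```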
-
The popular Hugging Face library also has APIs that build on top of this torch.onnx functionality to export models to the ONNX format. Over 130,000 models are supported, making it very likely that the model you care about is one of them.
-
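For instance, Optimum can export a Hugging Face model to ONNX and run it with ONNX Runtime in a few lines. This is a sketch; the model ID is just an example, and the optimum[onnxruntime] extra is assumed to be installed.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX behind the scenes.
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

inputs = tokenizer("ONNX Runtime makes edge deployment easy!", return_tensors="pt")
print(ort_model(**inputs).logits)
```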
-
In this article, we'll show you several examples involving state-of-the-art PyTorch models (like Whisper and Stable Diffusion) on popular devices (like Windows laptops, mobile phones, and web browsers) via various languages (from C# to JavaScript to Swift).
-
-
Examples of PyTorch models on the edge
-
-
Stable Diffusion on Windows
-
-
The Stable Diffusion pipeline consists of five PyTorch models that build an image from a text description. The diffusion process iterates on random pixels until the output image matches the description.
-
-
To run on the edge, four of the models can be exported to ONNX format from HuggingFace.
You don't have to export the fifth model, ClipTokenizer, as it is available in ONNX Runtime extensions, a library for pre- and post-processing PyTorch models.
-
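One way to obtain those ONNX files is through Optimum's Stable Diffusion support, which exports the text encoder, U-Net, and VAE components in one call. This is a sketch, assuming the optimum[onnxruntime] and diffusers packages and an example model ID.

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# export=True converts each pipeline component (text encoder, U-Net, VAE, ...) to ONNX.
pipeline = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True
)
pipeline.save_pretrained("./stable_diffusion_onnx")  # one model.onnx per component
```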
-
To run this pipeline of models as a .NET application, we build the pipeline code in C#. This code can be run on CPU, GPU, or NPU, if they are available on your machine, using ONNX Runtime's device-specific hardware accelerators. This is configured with the ExecutionProviderTarget below.
-
-
-static void Main(string[] args)
-{
- var prompt = "Two golden retriever puppies playing in the grass.";
- var config = new StableDiffusionConfig
- {
- NumInferenceSteps = 50,
- GuidanceScale = 7.5,
- ExecutionProviderTarget = StableDiffusionConfig.ExecutionProvider.Cpu,
- DeviceId = 0,
- TokenizerOnnxPath = @".\models\tokenizer\model.onnx",
- TextEncoderOnnxPath = @".\models\text_encoder\model.onnx",
- UnetOnnxPath = @".\models\unet\model.onnx",
- VaeDecoderOnnxPath = @".\models\vae_decoder\model.onnx",
- SafetyModelPath = @".\models\safety_checker\model.onnx",
- };
-
- var image = UNet.Inference(prompt, config);
-
- if (image == null)
- {
- Console.WriteLine("Unable to create image, please try again.");
- }
-}
-
-
-
This is the output of the model pipeline, running with 50 inference iterations:
-
-
-
-
You can build the application and run it on Windows with the detailed steps shown in this tutorial.
-
-
Text generation in the browser
-
-
Running a PyTorch model locally in the browser is not only possible but super simple with the transformers.js library. Transformers.js uses ONNX Runtime Web as its backend. Many models are already converted to ONNX and served by the transformers.js CDN, making inference in the browser a matter of writing a few lines of HTML and JavaScript.
You can also embed the call to the transformers pipeline using vanilla JavaScript, or in a web application, with React or Next.js, or write a browser extension.
-
-
ONNX Runtime Web currently uses WebAssembly to execute the model on the CPU. This is fine for many models, but leveraging the GPU, if one exists on the device, can improve the user experience. ONNX Runtime Web support for WebGPU is coming *very* soon and will enable you to tap into the GPU while using the same inference APIs.
-
-
-
-
Speech recognition with Whisper on mobile
-
-
Whisper from OpenAI is a PyTorch speech recognition model. Whisper comes in a number of different size variants - the smallest, Whisper Tiny, is suitable to run on mobile devices. All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence generation) can be composed and exported to a single ONNX model using the Olive framework. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which supports Android, iOS, react-native, and MAUI/Xamarin.
-
-
ONNX Runtime Mobile supports hardware acceleration via NNAPI (on Android), CoreML (on iOS), and XNNPACK (both iOS and Android).
-
-
The relevant snippet of an example Android mobile app that performs speech transcription on short samples of audio is shown below:
-
-
-init {
- val env = OrtEnvironment.getEnvironment()
- val sessionOptions = OrtSession.SessionOptions()
- sessionOptions.registerCustomOpLibrary(OrtxPackage.getLibraryPath())
-
- session = env.createSession(modelBytes, sessionOptions)
-
- val nMels: Long = 80
- val nFrames: Long = 3000
-
- baseInputs = mapOf(
- "min_length" to createIntTensor(env, intArrayOf(1), tensorShape(1)),
- "max_length" to createIntTensor(env, intArrayOf(200), tensorShape(1)),
- "num_beams" to createIntTensor(env, intArrayOf(1), tensorShape(1)),
- "num_return_sequences" to createIntTensor(env, intArrayOf(1), tensorShape(1)),
- "length_penalty" to createFloatTensor(env, floatArrayOf(1.0f), tensorShape(1)),
- "repetition_penalty" to createFloatTensor(env, floatArrayOf(1.0f), tensorShape(1)),
- )
-}
-
-data class Result(val text: String, val inferenceTimeInMs: Long)
-
-fun run(audioTensor: OnnxTensor): Result {
- val inputs = mutableMapOf<String, OnnxTensor>()
- baseInputs.toMap(inputs)
- inputs["audio_pcm"] = audioTensor
- val startTimeInMs = SystemClock.elapsedRealtime()
- val outputs = session.run(inputs)
- val elapsedTimeInMs = SystemClock.elapsedRealtime() - startTimeInMs
- val recognizedText = outputs.use {
- @Suppress("UNCHECKED_CAST")
- (outputs[0].value as Array<Array<String>>)[0][0]
- }
- return Result(recognizedText, elapsedTimeInMs)
-}
-
-
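Before wiring the model into a mobile app, it can help to sanity-check the exported Whisper model on a desktop machine. A rough Python equivalent of the Android snippet above might look like the following sketch; the model path is a placeholder, the input names mirror the snippet above, and the exact names and shapes depend on how the model was exported with Olive.

```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom operators (audio decoding, tokenization) used by the model.
options = ort.SessionOptions()
options.register_custom_ops_library(get_library_path())
session = ort.InferenceSession("whisper_tiny_all_in_one.onnx", options)

# Placeholder audio: 30 seconds of silence at 16 kHz; load a real clip in practice.
audio_pcm = np.zeros((1, 480000), dtype=np.float32)

outputs = session.run(None, {
    "audio_pcm": audio_pcm,
    "min_length": np.array([1], dtype=np.int32),
    "max_length": np.array([200], dtype=np.int32),
    "num_beams": np.array([1], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
})
print(outputs[0][0][0])  # the recognized text
```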
-
-
You can record a short audio clip to transcribe.
-
-
-
-
-
-
Train a model to recognize your voice on mobile
-
-
ONNX Runtime can also take a pre-trained model and adapt it to new data. It can do this on the edge - on mobile specifically where it is easy to record your voice, access your photos and other personalized data. Importantly, your data does not leave the device during training.
-
-
For example, you can train a PyTorch model to recognize just your own voice on your mobile phone, for authentication scenarios.
-
-
The PyTorch model is obtained from HuggingFace in your development environment, and extra layers are added to perform the speaker classification:
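A sketch of what that preparation might look like, assuming a wav2vec2-style base model from the Hugging Face Hub and placeholder layer sizes:

```python
import torch
from transformers import Wav2Vec2Model

# Example base model; a real application would choose an audio model suited to the task.
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

class SpeakerClassifier(torch.nn.Module):
    """Pretrained audio encoder plus a small classification head for speaker ID."""
    def __init__(self, encoder, num_classes=2):
        super().__init__()
        self.encoder = encoder
        self.head = torch.nn.Linear(encoder.config.hidden_size, num_classes)

    def forward(self, input_values):
        hidden = self.encoder(input_values).last_hidden_state  # (batch, frames, hidden)
        return self.head(hidden.mean(dim=1))                   # pool over time, classify

model = SpeakerClassifier(encoder)
```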
The model and other components necessary for training (a loss function to measure the quality of the model and an optimizer to instruct how the weights are adjusted during training) are exported with ONNX Runtime Training:
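Continuing the sketch above, the export step could use the onnxruntime-training artifacts API (the onnxruntime-training package is assumed, and paths and shapes are illustrative):

```python
import onnx
import torch
from onnxruntime.training import artifacts

# Export the classifier defined above to ONNX (dummy input shape is a placeholder).
torch.onnx.export(model, torch.randn(1, 16000), "speaker_classifier.onnx",
                  input_names=["input_values"], output_names=["logits"])
onnx_model = onnx.load("speaker_classifier.onnx")

# Train only the new head on device; keep the pretrained encoder frozen.
all_params = [init.name for init in onnx_model.graph.initializer]
requires_grad = [name for name in all_params if name.startswith("head.")]
frozen_params = [name for name in all_params if name not in requires_grad]

# Produces the training, eval, and optimizer graphs plus the checkpoint that
# the mobile app loads for on-device training.
artifacts.generate_artifacts(
    onnx_model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_artifacts",
)
```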
This set of artifacts is now ready to be loaded by the mobile app, shown here as iOS Swift code. The app asks the user for samples of their voice and the model is trained with the samples.
In this article we've shown why you would run PyTorch models on the edge and what aspects to consider. We shared several examples, with code, that you can use to run state-of-the-art PyTorch models on the edge with ONNX Runtime, and we showed how ONNX Runtime is built for performance and cross-platform execution, making it the ideal way to run PyTorch models on the edge. Have fun running PyTorch models on the edge with ONNX Runtime!
-
You may have noticed that we didn't include a Llama2 example even though ONNX Runtime is optimized to run it. That's because the amazing Llama2 model deserves its own article, so stay tuned for that!
“With ONNX Runtime, Adobe Target got flexibility and standardization in one package: flexibility for our customers to train ML models in the frameworks of their choice, and standardization to robustly deploy those models at scale for fast inference, to deliver true, real-time personalized experiences.”
“The ONNX Runtime integration with AMD’s ROCm open software ecosystem helps our customers leverage the power of AMD Instinct GPUs to accelerate and scale their large machine learning models with flexibility across multiple frameworks.”
-
–Andrew Dieckmann, Corporate Vice President and General Manager, AMD Data Center GPU & Accelerated Processing
-
-
-
-
-
-
-
“Using ONNX Runtime, we have improved the inference performance of many computer vision (CV) and natural language processing (NLP) models trained by multiple deep learning frameworks. These are part of the Alipay production system. We plan to use ONNX Runtime as the high-performance inference backend for more deep learning models in broad applications, such as click-through rate prediction and cross-modal prediction.”
-
–Xiaoming Zhang, Head of Inference Team, Ant Group
-
-
-
-
-
-
-
“At CERN in the ATLAS experiment, we have integrated the C++ API of ONNX Runtime into our software framework: Athena. We are currently performing inferences using ONNX models especially in the reconstruction of electrons and muons. We are benefiting from its C++ compatibility, platform*-to-ONNX converters (* Keras, TensorFlow, PyTorch, etc) and its thread safety.”
-
–ATLAS Experiment team, CERN (European Organization for Nuclear Research)
-
-
-
-
-
-
-
“Building and deploying AI solutions to the cloud at scale is complex. With massive datasets and performance considerations, finding a harmonious balance is crucial. ONNX Runtime provided us with the flexibility to package a scikit-learn model built with Python, deploy it serverlessly to a Node.js environment, and run it in the cloud with impressive performance.”
-
–Matthew Leyburn, Software Engineer, Bazaarvoice
-
-
-
-
-
-
-
“ClearBlade’s integration of ONNX Runtime with our Enterprise IoT and Edge Platforms enables customers and partners to build AI models using any industry AI tool they want to use. Using this solution, our customers can use the ONNX Runtime Go language APIs to seamlessly deploy any model to run on equipment in remote locations or on the factory floor!”
-
–Aaron Allsbrook, CTO & Founder, ClearBlade
-
-
-
-
-
-
-
“At Deezer, we use ONNX Runtime for machine learning powered features for music recommendations in our streaming service. ONNX Runtime's C API is easy to integrate with our software stack and enables us to run and deploy transformer models with great performance for real-time use cases.”
-
–Mathieu Morlon, Software Engineer, Deezer
-
-
-
-
-
-
-
“We integrate AI models in various markets and regulated industries using many stacks and frameworks, merging R&D and Ethics. With ONNX Runtime, we provide maximum performance and flexibility to use the customers' preferred technology, from cloud to embedded systems.”
-
–Mauro Bennici, AI Architect and AI Ethicist, Intelligenza Etica
-
-
-
-
-
-
-
“We use ONNX Runtime to easily deploy thousands of open-source state-of-the-art models in the Hugging Face model hub and accelerate private models for customers of the Accelerated Inference API on CPU and GPU.”
-
–Morgan Funtowicz, Machine Learning Engineer, Hugging Face
-
-
-
-
-
-
-
“ONNX Runtime powers many of our Natural Language Processing (NLP) and Computer Vision (CV) models that crunch the global media landscape in real-time. It is our go-to framework for scaling our production workload, providing important features ranging from built-in quantization tools to easy GPU and VNNI acceleration.”
-
–Viet Yen Nguyen, CTO, Hypefactors
-
-
-
-
-
-
-
“InFarm delivers machine-learning powered solutions for intelligent farming, running computer vision models on a variety of hardware, including on-premise GPU clusters, edge computing devices like NVIDIA Jetsons, and cloud-based CPU and GPU clusters. ONNX Runtime enables InFarm to standardise the model formats and outputs of models generated across multiple teams to simplify deployment while also providing the best performance on all hardware targets.”
-
–Ashley Walker, Chief Information and Technology Officer, InFarm
-
-
-
-
-
-
-
“We are excited to support ONNX Runtime on the Intel® Distribution of OpenVINO™. This accelerates machine learning inference across Intel hardware and gives developers the flexibility to choose the combination of Intel hardware that best meets their needs from CPU to VPU or FPGA.”
-
–Jonathan Ballon, Vice President and General Manager, Intel Internet of Things Group
-
-
-
-
-
-
-
“With customers around the globe, we’re seeing increased interest in deploying more effective models to power pricing solutions via ONNX Runtime. ONNX Runtime’s performance has given us the confidence to use this solution with our customers with more extreme transaction volume requirements.”
-
–Jason Coverston, Product Director, Navitaire
-
-
-
-
-
-
-
“ONNX Runtime enables our customers to easily apply NVIDIA TensorRT’s powerful optimizations to machine learning models, irrespective of the training framework, and deploy across NVIDIA GPUs and edge devices.”
-
– Kari Ann Briski, Sr. Director, Accelerated Computing Software and AI Product, NVIDIA
-
-
-
-
-
-
-
“The integration of ONNX Runtime into Apache OpenNLP 2.0 enables easy use of state-of-the-art Natural Language Processing (NLP) models in the Java ecosystem. For libraries and applications already using OpenNLP, such as Apache Lucene and Apache Solr, using ONNX Runtime via OpenNLP provides exciting new possibilities.”
-
–Jeff Zemerick, Search Relevance Engineer at OpenSource Connections and Chair of the Apache OpenNLP project
-
-
-
-
-
-
-
“The ONNX Runtime API for Java enables Java developers and Oracle customers to seamlessly consume and execute ONNX machine-learning models, while taking advantage of the expressive power, high performance, and scalability of Java.”
-
–Stephen Green, Director of Machine Learning Research Group, Oracle
-
-
-
-
-
-
-
“Using a common model and code base, the ONNX Runtime allows Peakspeed to easily flip between platforms to help our customers choose the most cost-effective solution based on their infrastructure and requirements.”
“ONNX Runtime provides us with a lightweight runtime that focuses on performance, yet allows our ML engineers to choose the best frameworks and models for the task at hand.”
“The mission of PTW is to guarantee radiation therapy safely. Bringing an AI model from research into the clinic can be a challenge, however. These are very different software and hardware environments. ONNX Runtime bridges the gap and allows us to choose the best possible tools for research and be sure deployment into any environment will just work.”
-
–Jan Weidner, Research Software Engineer, PTW Dosimetry
-
-
-
-
-
-
-
“ONNX Runtime underpins RedisAI's distinctive capability to run machine-learning and deep-learning model inference seamlessly inside of Redis. This integration allows data scientists to train models in their preferred ML framework (PyTorch, TensorFlow, etc), and serve those models from Redis for low-latency inference.”
-
–Sam Partee, Principal Engineer, Applied AI, Redis
-
-
-
-
-
-
-
“With support for ONNX Runtime, our customers and developers can cross the boundaries of the model training framework, easily deploy ML models in Rockchip NPU powered devices.”
-
–Feng Chen, Senior Vice President, Rockchip
-
-
-
-
-
-
-
“We needed a runtime engine to handle the transition from data science land to a high-performance production runtime system. ONNX Runtime (ORT) simply ‘just worked’. Having no previous experience with ORT, I was able to easily convert my models, and had prototypes running inference in multiple languages within just a few hours. ORT will be my go-to runtime engine for the foreseeable future.”
-
–Bill McCrary, Application Architect, Samtec
-
-
-
-
-
-
-
“The unique combination of ONNX Runtime and SAS Event Stream Processing changes the game for developers and systems integrators by supporting flexible pipelines and enabling them to target multiple hardware platforms for the same AI models without bundling and packaging changes. This is crucial considering the additional build and test effort saved on an ongoing basis.”
-
–Saurabh Mishra, Senior Manager, Product Management, Internet of Things, SAS
-
-
-
-
-
-
-
“Teradata provides a highly extensible framework that enables importation and inference of previously trained Machine Learning (ML) and Deep Learning (DL) models. ONNX Runtime enables us to expand the capabilities of Vantage Bring Your Own Model (BYOM) and gives data scientists more options for ML and DL models integration, inference and production deployment within Teradata Vantage ecosystem.”
-
–Michael Riordan, Director, Vantage Data Science and Analytics Products, Teradata
-
-
-
-
-
-
-
“ONNX Runtime’s simple C API with DirectML provider enabled Topaz Labs to add support for AMD GPUs and NVIDIA Tensor Cores in just a couple of days. Furthermore, our models load many times faster on GPU than any other frameworks. Even our larger models with about 100 million parameters load within seconds.”
-
–Suraj Raghuraman, Head of AI Engine, Topaz Labs
-
-
-
-
-
-
-
“We selected ONNX Runtime as the backend of Unreal Engine’s Neural Network Interface (NNI) plugin inference system because of its extensibility to support the platforms that Unreal Engine runs on, while enabling ML practitioners to develop ML models in the frameworks of their choice. NNI evaluates neural networks in real time in Unreal Engine and acts as the foundation for game developers to use and deploy ML models to solve many development challenges, including animation, ML-based AI, camera tracking, and more.”
-
–Francisco Vicente Carrasco, Research Engineering Lead, Epic Games
-
-
-
-
-
-
-
“At the USDA we use ONNX Runtime in GuideMaker, a program we developed to design pools of guide RNAs needed for large-scale gene editing experiments with CRISPR-Cas. ONNX allowed us to make an existing model more interoperable and ONNX Runtime speeds up predictions of guide RNA binding.”
-
–Adam Rivers, Computational Biologist, United States Department of Agriculture, Agricultural Research Service
-
-
-
-
-
-
-
“ONNX Runtime has vastly increased Vespa.ai’s capacity for evaluating large models, both in performance and model types we support.”
-
–Lester Solbakken, Principal Engineer, Vespa.ai, Verizon Media
-
-
-
-
-
-
-
“ONNX Runtime has been very helpful to us at Writer in optimizing models for production. It lets us deploy more powerful models and still deliver results to our customers with the latency they expect.”
-
–Dave Buchanan, Director of AI and NLP, Writer
-
-
-
-
-
-
-
“Xilinx is excited that Microsoft has announced Vitis™ AI interoperability and runtime support for ONNX Runtime, enabling developers to deploy machine learning models for inference to FPGA IaaS such as Azure NP series VMs and Xilinx edge devices.”
-
–Sudip Nag, Corporate Vice President, Software & AI Products, Xilinx
-import onnxruntime as ort
-
-# Load the model and create InferenceSession
-model_path = "path/to/your/onnx/model"
-session = ort.InferenceSession(model_path)
-
-# Load and preprocess the input image inputTensor
-...
-
-# Run inference
-outputs = session.run(None, {"input": inputTensor})
-print(outputs)
- Learn more
-
-
-
-
-import ai.onnxruntime.*;
-import java.util.Map;
-
-// Load the model and create InferenceSession
-String modelPath = "path/to/your/onnx/model";
-OrtEnvironment env = OrtEnvironment.getEnvironment();
-OrtSession session = env.createSession(modelPath);
-
-// Load and preprocess the input image inputTensor
-...
-
-// Run inference
-OrtSession.Result outputs = session.run(Map.of("input", inputTensor));
-System.out.println(((OnnxTensor) outputs.get(0)).getFloatBuffer().get(0));
- Learn more
-
-
-
-
-import * as ort from "onnxruntime-web";
-
-// Load the model and create InferenceSession
-const modelPath = "path/to/your/onnx/model";
-const session = await ort.InferenceSession.create(modelPath);
-
-// Load and preprocess the input image to inputTensor
-...
-
-// Run inference
-const outputs = await session.run({ input: inputTensor });
-console.log(outputs);
- Learn more
-
-
-
-
-#include "onnxruntime_cxx_api.h"
-
-// Load the model and create InferenceSession
-Ort::Env env;
-std::string model_path = "path/to/your/onnx/model";
-Ort::Session session(env, model_path.c_str(), Ort::SessionOptions{ nullptr });
-
-// Load and preprocess the input image to
-// inputTensor, inputNames, and outputNames
-...
-
-// Run inference
-std::vector<Ort::Value> outputTensors =
- session.Run(Ort::RunOptions{nullptr},
- inputNames.data(),
- &inputTensor,
- inputNames.size(),
- outputNames.data(),
- outputNames.size());
-
-const float* outputDataPtr = outputTensors[0].GetTensorMutableData<float>();
-std::cout << outputDataPtr[0] << std::endl;
- Learn more
-
-
-
-
-using System;
-using System.Collections.Generic;
-using System.Linq;
-using Microsoft.ML.OnnxRuntime;
-
-// Load the model and create InferenceSession
-string model_path = "path/to/your/onnx/model";
-var session = new InferenceSession(model_path);
-
-// Load and preprocess the input image to inputTensor
-...
-
-// Run inference
-var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor("input", inputTensor) };
-var outputs = session.Run(inputs).ToList();
-Console.WriteLine(outputs[0].AsTensor<float>()[0]);
- Learn more
-
Select the configuration you want to use and run the corresponding
- installation script.
- ONNX Runtime supports a variety of hardware and architectures to fit any need.
-
-
-
-
-
Optimize
- Inferencing
-
Optimize
- Training
-
-
-
-
-
-
-
Platform
-
- Platform list contains six items
-
-
-
-
-
- Windows
-
-
- Linux
-
-
- Mac
-
-
- Android
-
-
- iOS
-
-
- Web Browser
-
-
-
-
-
-
-
-
API
-
- API list contains eight items
-
-
-
-
-
- Python
-
-
- C++
-
-
- C#
-
-
- C
-
-
- Java
-
-
- JS
-
-
- Obj-C
-
-
- WinRT
-
-
-
-
-
-
-
-
- Architecture
-
-
- Architecture list contains five items
-
-
-
-
-
- X64
-
-
- X86
-
-
- ARM64
-
-
- ARM32
-
-
- IBM Power
-
-
-
-
-
-
-
-
- Hardware Acceleration
-
-
- Hardware Acceleration list contains seventeen
- items
-
-
-
-
-
- Default CPU
-
-
- CoreML
-
-
- CUDA
-
-
- DirectML
-
-
- MIGraphX
-
-
- NNAPI
-
-
- oneDNN
-
-
- OpenVINO
-
-
- ROCm
-
-
- QNN
-
-
- TensorRT
-
-
- ACL (Preview)
-
-
- ArmNN (Preview)
-
-
- Azure (Preview)
-
-
- CANN (Preview)
-
-
- Rockchip NPU (Preview)
-
-
- TVM (Preview)
-
-
- Vitis AI (Preview)
-
-
- XNNPACK (Preview)
-
-
-
-
-
-
-
-
- Installation Instructions
-
-
-
-
-
-
- Please select a combination of resources
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Scenario
-
- Scenario list contains two items
-
-
-
-
-
- Large Model Training
-
-
- On-Device Training
-
-
-
-
-
-
-
-
Platform
-
- Platform list contains five items
-
-
-
-
-
- Linux
-
-
- Windows
-
-
- Mac
-
-
- Android
-
-
- iOS
-
-
-
-
-
-
-
-
API
-
- API list contains six items
-
-
-
-
-
- Python
-
-
- C
-
-
- C++
-
-
- C#
-
-
- Java
-
-
- Obj-C
-
-
-
-
-
-
-
-
- Hardware Acceleration
-
-
- Hardware Acceleration list contains three items
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/js/blogs.json b/js/blogs.json
deleted file mode 100644
index 9b9d00ac308f0..0000000000000
--- a/js/blogs.json
+++ /dev/null
@@ -1,166 +0,0 @@
-{
- "blogs": [
- {
- "title": "Run PyTorch models on the edge",
- "date": "October 12th, 2023",
- "blurb": "Everything you need to know about running PyTorch models on the edge with ONNX Runtime.",
- "link": "./pytorch-on-the-edge"
- },
- {
- "title": "Accelerating over 130,000 Hugging Face models with ONNX Runtime",
- "date": "October 4th, 2023",
- "blurb": "Learn how ONNX Runtime helps users accelerate open source machine learning models from Hugging Face.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/10/04/accelerating-over-130000-hugging-face-models-with-onnx-runtime/"
- },
- {
- "title": "On-Device Training with ONNX Runtime: A deep dive",
- "date": "July 5th, 2023",
- "blurb": "This blog presents technical details of On-Device training with ONNX Runtime. It explains how On-Device Training works and what are the different steps and artifacts involved in the training process. This information will help you train your models on edge devices.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/07/05/on-device-training-with-onnx-runtime-a-deep-dive/"
- },
- {
- "title": "Build and deploy fast and portable speech recognition applications with ONNX Runtime and Whisper",
- "date": "June 7th, 2023",
- "blurb": "Learn how ONNX Runtime accelerates Whisper and makes it easy to deploy on desktop, mobile, in the cloud, and even in the browser.",
- "link": "https://medium.com/microsoftazure/build-and-deploy-fast-and-portable-speech-recognition-applications-with-onnx-runtime-and-whisper-5bf0969dd56b"
- },
- {
- "title": "On-Device Training: Efficient training on the edge with ONNX Runtime",
- "date": "May 31st, 2023",
- "blurb": "This blog introduces On-Device Training to enable training models on edge devices with the data available on-edge. It extends ORT Inference on edge to include federated learning and personalization scenarios.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/05/31/on-device-training-efficient-training-on-the-edge-with-onnx-runtime/"
- },
- {
- "title": "Unlocking the end-to-end Windows AI developer experience using ONNX runtime and Olive",
- "date": "May 23th, 2023",
- "blurb": "This blog reviews the new capabilities of ONNX Runtime and the Olive toolchain to support hybrid inferencing, NPU EPs, and hardware aware model optimizations on Windows and other platforms",
- "link": "https://blogs.windows.com/windowsdeveloper/2023/05/23/unlocking-the-end-to-end-windows-ai-developer-experience-using-onnx-runtime-and-olive"
- },
- {
- "title": "Bringing the power of AI to Windows 11 - unlocking a new era of productivity for customers and developers with Windows Copilot and Dev Home",
- "date": "May 23th, 2023",
- "blurb": "This blog reviews AI in Windows 11, including ONNX Runtime as the gateway to Windows AI and new ONNX Runtime capabilities on Windows",
- "link": "https://blogs.windows.com/windowsdeveloper/2023/05/23/bringing-the-power-of-ai-to-windows-11-unlocking-a-new-era-of-productivity-for-customers-and-developers-with-windows-copilot-and-dev-home"
- },
- {
- "title": "Optimize DirectML performance with Olive",
- "date": "May 23th, 2023",
- "blurb": "This blog shows how to use Olive to optimize models for DML EP in ONNX Runtime",
- "link": "https://devblogs.microsoft.com/windowsai/optimize-directml-performance-with-olive"
- },
- {
- "title": "DirectML ❤ Stable Diffusion",
- "date": "May 23th, 2023",
- "blurb": "This blog shows how to use the Stable Diffusion model on DML EP using Olive to optimize the Stable Diffusion model",
- "link": "https://devblogs.microsoft.com/windowsai/dml-stable-diffusion/"
- },
- {
- "title": "Accelerating Stable Diffusion Inference with ONNX Runtime",
- "date": "May 10th, 2023",
- "blurb": "This blog shows how to accelerate the Stable Diffusion models from Hugging Face on NVIDIA and AMD GPUs with ONNX Runtime. It includes benchmark results obtained on A100 and RTX3060 and MI250X.",
- "link": "https://medium.com/microsoftazure/accelerating-stable-diffusion-inference-with-onnx-runtime-203bd7728540"
- },
- {
- "title": "Azure Container for PyTorch is now Generally Available in Azure Machine Learning!",
- "date": "March 22nd, 2023",
- "blurb": "ACPT provides a ready-to-use distributed training environment for users to run on the latest multi-node GPU infrastructure offered in Azure. With Nebula, a new fast checkpointing capability in ACPT, you can save your checkpoints 1000 times faster with a simple API that works asynchronously with your training process.",
- "link": "https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/azure-container-for-pytorch-is-now-generally-available-in-azure/ba-p/3774616"
- },
- {
- "title": "High-performance deep learning in Oracle Cloud with ONNX Runtime",
- "date": "March 15th, 2023",
- "blurb": "Enabling scenarios through the usage of Deep Neural Network (DNN) models is critical to our AI strategy at Oracle, and our Cloud AI Services team has built a solution to serve DNN models for customers in the healthcare sector. In this blog post, we’ll share challenges our team faced, and how ONNX Runtime solves these as the backbone of success for high-performance inferencing.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/03/15/high-performance-deep-learning-in-oracle-cloud-with-onnx-runtime/"
- },
- {
- "title": "Inference Stable Diffusion with C# and ONNX Runtime",
- "date": "March 9th, 2023",
- "blurb": "In this tutorial we will learn how to do inferencing for the popular Stable Diffusion deep learning model in C#. Stable Diffusion models take a text prompt and create an image that represents the text. ",
- "link": "https://onnxruntime.ai/docs/tutorials/csharp/stable-diffusion-csharp.html"
- },
- {
- "title": "Video super resolution in Microsoft Edge",
- "date": "March 8th, 2023",
- "blurb": "VSR in Microsoft Edge builds on top of ONNX Runtime and DirectML making our solution portable across GPU vendors and allowing VSR to be available to more users. Additional graphics cards which support these technologies and have sufficient computing power will receive support in the future. The ONNX Runtime and DirectML teams have fine-tuned their technology over many years, resulting in VSR making the most of the performance and capabilities of your graphics card’s processing power.",
- "link": "https://blogs.windows.com/msedgedev/2023/03/08/video-super-resolution-in-microsoft-edge/"
- },
- {
- "title": "OctoML drives down production AI inference costs at Microsoft through new integration with ONNX Runtime ecosystem",
- "date": "March 2nd, 2023",
- "blurb": "Over the past year, OctoML engineers worked closely with Watch For to design and implement the TVM Execution Provider (EP) for ONNX Runtime - bringing the model optimization potential of Apache TVM to all ONNX Runtime users. This builds upon the collaboration we began in 2021, to bring the benefits of TVM’s code generation and flexible quantization support to production scale at Microsoft.",
- "link": "https://octoml.ai/blog/octoml-drives-down-costs-at-microsoft-through-new-integration-with-onnx-runtime/"
- },
- {
- "title": "Performant on-device inferencing with ONNX Runtime",
- "date": "February 8th, 2023",
- "blurb": "On-device machine learning model serving is a difficult task, especially given the limited bandwidth of early-stage startups. This guest post from the team at Pieces shares the problems and solutions evaluated for their on-device model serving stack and how ONNX Runtime serves as their backbone of success.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/02/08/performant-on-device-inferencing-with-onnx-runtime/"
- },
- {
- "title": "Improve BERT inference speed by combining the power of Optimum, OpenVINO™, ONNX Runtime, and Azure",
- "date": "January 25th, 2023",
- "blurb": "In this blog, we will discuss one of the ways to make huge models like BERT smaller and faster with OpenVINO™ Neural Networks Compression Framework (NNCF) and ONNX Runtime with OpenVINO™ Execution Provider through Azure Machine Learning.",
- "link": "https://cloudblogs.microsoft.com/opensource/2023/01/25/improve-bert-inference-speed-by-combining-the-power-of-optimum-openvino-onnx-runtime-and-azure/"
- },
- {
- "title": "Optimum + ONNX Runtime: Easier, Faster training for your Hugging Face models",
- "date": "January 24th, 2023",
- "blurb": "Hugging Face’s Optimum library, through its integration with ONNX Runtime for training, provides an open solution to improve training times by 35% or more for many popular Hugging Face models. We present details of both Hugging Face Optimum and the ONNX Runtime Training ecosystem, with performance numbers highlighting the benefits of using the Optimum library.",
- "link": "https://huggingface.co/blog/optimum-onnxruntime-training/"
- },
- {
- "title": "Live demos of machine learning models with ONNX and Hugging Face Spaces",
- "date": "June 6, 2022",
- "blurb": "Choosing which machine learning model to use, sharing a model with a colleague, and quickly trying out a model are all reasons why you may find yourself wanting to quickly run inference on a model. You can configure your environment and download Jupyter notebooks, but it would be nicer if there was a way to run a model with even less effort...",
- "link": "https://cloudblogs.microsoft.com/opensource/2022/06/06/live-demos-of-machine-learning-models-with-onnx-and-hugging-face-spaces/"
- },
- {
- "title": "Optimizing and deploying transformer INT8 inference with ONNX Runtime-TensorRT on NVIDIA GPUs",
- "date": "May 2, 2022",
- "blurb": "Transformer-based models have revolutionized the natural language processing (NLP) domain. Ever since its inception, transformer architecture has been integrated into models like Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) for performing tasks such as text generation or summarization and question and answering to name a few...",
- "link": "https://cloudblogs.microsoft.com/opensource/2022/05/02/optimizing-and-deploying-transformer-int8-inference-with-onnx-runtime-tensorrt-on-nvidia-gpus/"
- },
- {
- "title": "Scaling-up PyTorch inference: Serving billions of daily NLP inferences with ONNX Runtime",
- "date": "April 19, 2022",
- "blurb": "Scale, performance, and efficient deployment of state-of-the-art Deep Learning models are ubiquitous challenges as applied machine learning grows across the industry. We’re happy to see that the ONNX Runtime Machine Learning model inferencing solution we’ve built and use in high-volume Microsoft products and services also resonates with our open source community, enabling new capabilities that drive content relevance and productivity...",
- "link": "https://cloudblogs.microsoft.com/opensource/2022/04/19/scaling-up-pytorch-inference-serving-billions-of-daily-nlp-inferences-with-onnx-runtime/"
- },
- {
- "title": "Add AI to mobile applications with Xamarin and ONNX Runtime",
- "date": "December 14, 2021",
- "blurb": "ONNX Runtime now supports building mobile applications in C# with Xamarin. Support for Android and iOS is included in the ONNX Runtime release 1.10 NuGet package. This enables C# developers to build AI applications for Android and iOS to execute ONNX models on mobile devices with ONNX Runtime...",
- "link": "https://cloudblogs.microsoft.com/opensource/2021/12/14/add-ai-to-mobile-applications-with-xamarin-and-onnx-runtime/"
- },
- {
- "title": "ONNX Runtime Web—running your machine learning model in browser",
- "date": "September 2, 2021",
- "blurb": "We are introducing ONNX Runtime Web (ORT Web), a new feature in ONNX Runtime to enable JavaScript developers to run and deploy machine learning models in browsers. It also helps enable new classes of on-device computation. ORT Web will be replacing the soon to be deprecated onnx.js...",
- "link": "https://cloudblogs.microsoft.com/opensource/2021/09/02/onnx-runtime-web-running-your-machine-learning-model-in-browser/"
- },
- {
- "title": "Accelerate PyTorch transformer model training with ONNX Runtime – a deep dive",
- "date": "July 13, 2021",
- "blurb": "ONNX Runtime (ORT) for PyTorch accelerates training large scale models across multiple GPUs with up to 37% increase in training throughput over PyTorch and up to 86% speed up when combined with DeepSpeed...",
- "link": "https://techcommunity.microsoft.com/t5/azure-ai/accelerate-pytorch-transformer-model-training-with-onnx-runtime/ba-p/2540471"
- },
- {
- "title": "Accelerate PyTorch training with torch-ort",
- "date": "July 13, 2021",
- "blurb": "With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice. Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.ORTModule, to accelerate distributed training of PyTorch models, reducing the time and resources needed for training...",
- "link": "https://cloudblogs.microsoft.com/opensource/2021/07/13/accelerate-pytorch-training-with-torch-ort/"
- },
- {
- "title": "ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm™ Open Software Platform",
- "date": "July 13, 2021",
- "blurb": "ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce a preview version of ONNX Runtime in release 1.8.1 featuring support for AMD Instinct™ GPUs facilitated by the AMD ROCm™ open software platform...",
- "link": "https://cloudblogs.microsoft.com/opensource/2021/07/13/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform/"
- },
- {
- "title": "Journey to optimize large scale transformer model inference with ONNX Runtime",
- "date": "June 30, 2021",
- "blurb": "Large-scale transformer models, such as GPT-2 and GPT-3, are among the most useful self-supervised transformer language models for natural language processing tasks such as language translation, question answering, passage summarization, text generation, and so on...",
- "link": "https://cloudblogs.microsoft.com/opensource/2021/06/30/journey-to-optimize-large-scale-transformer-model-inference-with-onnx-runtime/"
- }
-]
-}
diff --git a/js/bootstrap.min.js b/js/bootstrap.min.js
deleted file mode 100644
index c4c0d1f95cd3c..0000000000000
--- a/js/bootstrap.min.js
+++ /dev/null
@@ -1,7 +0,0 @@
-/*!
- * Bootstrap v4.3.1 (https://getbootstrap.com/)
- * Copyright 2011-2019 The Bootstrap Authors (https://github.com/twbs/bootstrap/graphs/contributors)
- * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
- */
-!function(t,e){"object"==typeof exports&&"undefined"!=typeof module?e(exports,require("jquery"),require("popper.js")):"function"==typeof define&&define.amd?define(["exports","jquery","popper.js"],e):e((t=t||self).bootstrap={},t.jQuery,t.Popper)}(this,function(t,g,u){"use strict";function i(t,e){for(var n=0;nthis._items.length-1||t<0))if(this._isSliding)g(this._element).one(Q.SLID,function(){return e.to(t)});else{if(n===t)return this.pause(),void this.cycle();var i=ndocument.documentElement.clientHeight;!this._isBodyOverflowing&&t&&(this._element.style.paddingLeft=this._scrollbarWidth+"px"),this._isBodyOverflowing&&!t&&(this._element.style.paddingRight=this._scrollbarWidth+"px")},t._resetAdjustments=function(){this._element.style.paddingLeft="",this._element.style.paddingRight=""},t._checkScrollbar=function(){var t=document.body.getBoundingClientRect();this._isBodyOverflowing=t.left+t.right
',trigger:"hover focus",title:"",delay:0,html:!1,selector:!1,placement:"top",offset:0,container:!1,fallbackPlacement:"flip",boundary:"scrollParent",sanitize:!0,sanitizeFn:null,whiteList:Ee},je="show",He="out",Re={HIDE:"hide"+De,HIDDEN:"hidden"+De,SHOW:"show"+De,SHOWN:"shown"+De,INSERTED:"inserted"+De,CLICK:"click"+De,FOCUSIN:"focusin"+De,FOCUSOUT:"focusout"+De,MOUSEENTER:"mouseenter"+De,MOUSELEAVE:"mouseleave"+De},xe="fade",Fe="show",Ue=".tooltip-inner",We=".arrow",qe="hover",Me="focus",Ke="click",Qe="manual",Be=function(){function i(t,e){if("undefined"==typeof u)throw new TypeError("Bootstrap's tooltips require Popper.js (https://popper.js.org/)");this._isEnabled=!0,this._timeout=0,this._hoverState="",this._activeTrigger={},this._popper=null,this.element=t,this.config=this._getConfig(e),this.tip=null,this._setListeners()}var t=i.prototype;return t.enable=function(){this._isEnabled=!0},t.disable=function(){this._isEnabled=!1},t.toggleEnabled=function(){this._isEnabled=!this._isEnabled},t.toggle=function(t){if(this._isEnabled)if(t){var e=this.constructor.DATA_KEY,n=g(t.currentTarget).data(e);n||(n=new this.constructor(t.currentTarget,this._getDelegateConfig()),g(t.currentTarget).data(e,n)),n._activeTrigger.click=!n._activeTrigger.click,n._isWithActiveTrigger()?n._enter(null,n):n._leave(null,n)}else{if(g(this.getTipElement()).hasClass(Fe))return void this._leave(null,this);this._enter(null,this)}},t.dispose=function(){clearTimeout(this._timeout),g.removeData(this.element,this.constructor.DATA_KEY),g(this.element).off(this.constructor.EVENT_KEY),g(this.element).closest(".modal").off("hide.bs.modal"),this.tip&&g(this.tip).remove(),this._isEnabled=null,this._timeout=null,this._hoverState=null,(this._activeTrigger=null)!==this._popper&&this._popper.destroy(),this._popper=null,this.element=null,this.config=null,this.tip=null},t.show=function(){var e=this;if("none"===g(this.element).css("display"))throw new Error("Please use show on visible elements");var t=g.Event(this.constructor.Event.SHOW);if(this.isWithContent()&&this._isEnabled){g(this.element).trigger(t);var n=_.findShadowRoot(this.element),i=g.contains(null!==n?n:this.element.ownerDocument.documentElement,this.element);if(t.isDefaultPrevented()||!i)return;var o=this.getTipElement(),r=_.getUID(this.constructor.NAME);o.setAttribute("id",r),this.element.setAttribute("aria-describedby",r),this.setContent(),this.config.animation&&g(o).addClass(xe);var s="function"==typeof this.config.placement?this.config.placement.call(this,o,this.element):this.config.placement,a=this._getAttachment(s);this.addAttachmentClass(a);var l=this._getContainer();g(o).data(this.constructor.DATA_KEY,this),g.contains(this.element.ownerDocument.documentElement,this.tip)||g(o).appendTo(l),g(this.element).trigger(this.constructor.Event.INSERTED),this._popper=new u(this.element,o,{placement:a,modifiers:{offset:this._getOffset(),flip:{behavior:this.config.fallbackPlacement},arrow:{element:We},preventOverflow:{boundariesElement:this.config.boundary}},onCreate:function(t){t.originalPlacement!==t.placement&&e._handlePopperPlacementChange(t)},onUpdate:function(t){return e._handlePopperPlacementChange(t)}}),g(o).addClass(Fe),"ontouchstart"in document.documentElement&&g(document.body).children().on("mouseover",null,g.noop);var c=function(){e.config.animation&&e._fixTransition();var t=e._hoverState;e._hoverState=null,g(e.element).trigger(e.constructor.Event.SHOWN),t===He&&e._leave(null,e)};if(g(this.tip).hasClass(xe)){var 
h=_.getTransitionDurationFromElement(this.tip);g(this.tip).one(_.TRANSITION_END,c).emulateTransitionEnd(h)}else c()}},t.hide=function(t){var e=this,n=this.getTipElement(),i=g.Event(this.constructor.Event.HIDE),o=function(){e._hoverState!==je&&n.parentNode&&n.parentNode.removeChild(n),e._cleanTipClass(),e.element.removeAttribute("aria-describedby"),g(e.element).trigger(e.constructor.Event.HIDDEN),null!==e._popper&&e._popper.destroy(),t&&t()};if(g(this.element).trigger(i),!i.isDefaultPrevented()){if(g(n).removeClass(Fe),"ontouchstart"in document.documentElement&&g(document.body).children().off("mouseover",null,g.noop),this._activeTrigger[Ke]=!1,this._activeTrigger[Me]=!1,this._activeTrigger[qe]=!1,g(this.tip).hasClass(xe)){var r=_.getTransitionDurationFromElement(n);g(n).one(_.TRANSITION_END,o).emulateTransitionEnd(r)}else o();this._hoverState=""}},t.update=function(){null!==this._popper&&this._popper.scheduleUpdate()},t.isWithContent=function(){return Boolean(this.getTitle())},t.addAttachmentClass=function(t){g(this.getTipElement()).addClass(Ae+"-"+t)},t.getTipElement=function(){return this.tip=this.tip||g(this.config.template)[0],this.tip},t.setContent=function(){var t=this.getTipElement();this.setElementContent(g(t.querySelectorAll(Ue)),this.getTitle()),g(t).removeClass(xe+" "+Fe)},t.setElementContent=function(t,e){"object"!=typeof e||!e.nodeType&&!e.jquery?this.config.html?(this.config.sanitize&&(e=Se(e,this.config.whiteList,this.config.sanitizeFn)),t.html(e)):t.text(e):this.config.html?g(e).parent().is(t)||t.empty().append(e):t.text(g(e).text())},t.getTitle=function(){var t=this.element.getAttribute("data-original-title");return t||(t="function"==typeof this.config.title?this.config.title.call(this.element):this.config.title),t},t._getOffset=function(){var e=this,t={};return"function"==typeof this.config.offset?t.fn=function(t){return t.offsets=l({},t.offsets,e.config.offset(t.offsets,e.element)||{}),t}:t.offset=this.config.offset,t},t._getContainer=function(){return!1===this.config.container?document.body:_.isElement(this.config.container)?g(this.config.container):g(document).find(this.config.container)},t._getAttachment=function(t){return Pe[t.toUpperCase()]},t._setListeners=function(){var i=this;this.config.trigger.split(" ").forEach(function(t){if("click"===t)g(i.element).on(i.constructor.Event.CLICK,i.config.selector,function(t){return i.toggle(t)});else if(t!==Qe){var e=t===qe?i.constructor.Event.MOUSEENTER:i.constructor.Event.FOCUSIN,n=t===qe?i.constructor.Event.MOUSELEAVE:i.constructor.Event.FOCUSOUT;g(i.element).on(e,i.config.selector,function(t){return i._enter(t)}).on(n,i.config.selector,function(t){return i._leave(t)})}}),g(this.element).closest(".modal").on("hide.bs.modal",function(){i.element&&i.hide()}),this.config.selector?this.config=l({},this.config,{trigger:"manual",selector:""}):this._fixTitle()},t._fixTitle=function(){var t=typeof this.element.getAttribute("data-original-title");(this.element.getAttribute("title")||"string"!==t)&&(this.element.setAttribute("data-original-title",this.element.getAttribute("title")||""),this.element.setAttribute("title",""))},t._enter=function(t,e){var n=this.constructor.DATA_KEY;(e=e||g(t.currentTarget).data(n))||(e=new 
this.constructor(t.currentTarget,this._getDelegateConfig()),g(t.currentTarget).data(n,e)),t&&(e._activeTrigger["focusin"===t.type?Me:qe]=!0),g(e.getTipElement()).hasClass(Fe)||e._hoverState===je?e._hoverState=je:(clearTimeout(e._timeout),e._hoverState=je,e.config.delay&&e.config.delay.show?e._timeout=setTimeout(function(){e._hoverState===je&&e.show()},e.config.delay.show):e.show())},t._leave=function(t,e){var n=this.constructor.DATA_KEY;(e=e||g(t.currentTarget).data(n))||(e=new this.constructor(t.currentTarget,this._getDelegateConfig()),g(t.currentTarget).data(n,e)),t&&(e._activeTrigger["focusout"===t.type?Me:qe]=!1),e._isWithActiveTrigger()||(clearTimeout(e._timeout),e._hoverState=He,e.config.delay&&e.config.delay.hide?e._timeout=setTimeout(function(){e._hoverState===He&&e.hide()},e.config.delay.hide):e.hide())},t._isWithActiveTrigger=function(){for(var t in this._activeTrigger)if(this._activeTrigger[t])return!0;return!1},t._getConfig=function(t){var e=g(this.element).data();return Object.keys(e).forEach(function(t){-1!==Oe.indexOf(t)&&delete e[t]}),"number"==typeof(t=l({},this.constructor.Default,e,"object"==typeof t&&t?t:{})).delay&&(t.delay={show:t.delay,hide:t.delay}),"number"==typeof t.title&&(t.title=t.title.toString()),"number"==typeof t.content&&(t.content=t.content.toString()),_.typeCheckConfig(be,t,this.constructor.DefaultType),t.sanitize&&(t.template=Se(t.template,t.whiteList,t.sanitizeFn)),t},t._getDelegateConfig=function(){var t={};if(this.config)for(var e in this.config)this.constructor.Default[e]!==this.config[e]&&(t[e]=this.config[e]);return t},t._cleanTipClass=function(){var t=g(this.getTipElement()),e=t.attr("class").match(Ne);null!==e&&e.length&&t.removeClass(e.join(""))},t._handlePopperPlacementChange=function(t){var e=t.instance;this.tip=e.popper,this._cleanTipClass(),this.addAttachmentClass(this._getAttachment(t.placement))},t._fixTransition=function(){var t=this.getTipElement(),e=this.config.animation;null===t.getAttribute("x-placement")&&(g(t).removeClass(xe),this.config.animation=!1,this.hide(),this.show(),this.config.animation=e)},i._jQueryInterface=function(n){return this.each(function(){var t=g(this).data(Ie),e="object"==typeof n&&n;if((t||!/dispose|hide/.test(n))&&(t||(t=new i(this,e),g(this).data(Ie,t)),"string"==typeof n)){if("undefined"==typeof t[n])throw new TypeError('No method named "'+n+'"');t[n]()}})},s(i,null,[{key:"VERSION",get:function(){return"4.3.1"}},{key:"Default",get:function(){return Le}},{key:"NAME",get:function(){return be}},{key:"DATA_KEY",get:function(){return Ie}},{key:"Event",get:function(){return Re}},{key:"EVENT_KEY",get:function(){return De}},{key:"DefaultType",get:function(){return ke}}]),i}();g.fn[be]=Be._jQueryInterface,g.fn[be].Constructor=Be,g.fn[be].noConflict=function(){return g.fn[be]=we,Be._jQueryInterface};var Ve="popover",Ye="bs.popover",ze="."+Ye,Xe=g.fn[Ve],$e="bs-popover",Ge=new RegExp("(^|\\s)"+$e+"\\S+","g"),Je=l({},Be.Default,{placement:"right",trigger:"click",content:"",template:'
If you're external to Microsoft and the issue contains information that cannot be disclosed publicly, use the following link to template an email, and send it to the below DRI:
-
-
-
-
-
-
\ No newline at end of file
diff --git a/ort-at-microsoft.html b/ort-at-microsoft.html
deleted file mode 100644
index b359591836c43..0000000000000
--- a/ort-at-microsoft.html
+++ /dev/null
@@ -1,126 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ONNX Runtime | ORT at Microsoft
-
-
-
-
-
-
-
-
-
- Skip to main content
-
-
-
-
-
-
-
-
-
-
-
-
ONNX Runtime usage at Microsoft
- ONNX Runtime powers Machine Learning inferencing for most of Microsoft's products and services,
- providing high performance and deployment versatility to support a large range of device types
- across cloud, mobile, and edge.
-
Run PyTorch models on cloud, desktop, mobile,
- IoT, and even in the browser
-
-
-
-
-
-
-
-
-
-
-
-
-
Boost performance
-
Accelerate PyTorch models to improve user
- experience and reduce costs
-
-
-
-
-
-
-
-
-
-
-
-
-
Improve time to market
-
Used by Microsoft and many others for their
- production PyTorch workloads
-
-
-
-
-
-
-
-
- Please help us improve ONNX Runtime by participating in our customer survey.
-
-
-
-
-
-
-
-
-
-
-
-
Native support in PyTorch
-
PyTorch includes support for ONNX through the torch.onnx APIs to simplify exporting your PyTorch model to
- the portable ONNX format.
- The ONNX Runtime team maintains these exporter APIs to ensure a high level of
- compatibility with PyTorch models.
-
Training PyTorch models requires Python but that can be a significant obstacle to
- deploying PyTorch models to many production environments, especially Android and
- iOS mobile devices.
- ONNX Runtime is designed for production and provides APIs in C/C++, C#, Java,
- and Objective-C, helping create a bridge from your PyTorch training environment
- to a successful PyTorch production deployment.
-
Better performance can help improve your user experience and lower your operating
- costs.
- A wide range of models from computer vision (ResNet, MobileNet, Inception, YOLO,
- super resolution, etc) to speech and NLP (BERT, RoBERTa, GPT-2, T5, etc) can
- benefit from ONNX Runtime's optimized performance.
- The ONNX Runtime team regularly benchmarks and optimizes top models for
- performance.
- ONNX Runtime also integrates with top hardware accelerator libraries like
- TensorRT and OpenVINO so you can get the best performance on the hardware
- available while using the same common APIs across all your target platforms.
-
Development agility is a key factor in overall costs.
- ONNX Runtime was built on the experience of taking PyTorch models to production
- in high scale services like Microsoft Office, Bing, and Azure. It used to take
- weeks and months to take a model from R&D to production.
- With ONNX Runtime, models can be ready to be deployed at scale in hours or days.
-
+                    Most modern ML models are developed with PyTorch. The agility and flexibility that PyTorch
+                    provides for creating and training models have made it the most popular deep learning
+                    framework today. The typical workflow is to train these models in the cloud and run them
+                    from the cloud as well. However, many scenarios are arising that make it more attractive -
+                    or, in some cases, necessary - to run models locally on the device. These include:
+
+
+
+ Avoiding network round-trips to the cloud (for example in audio and video processing)
+
+
Keeping user data on device (for privacy protection or regulatory requirements)
+
+ High cost of cloud resources (especially when device capabilities are underutilized)
+
+
Application requirements to operate without internet connectivity
+
+
+
+
+
+ In this article, we'll demystify running PyTorch models on the edge. We define 'edge' as
+ anywhere that is outside of the cloud, ranging from large, well-resourced personal computers
+ to small footprint devices such as mobile phones. This has been a challenging task to
+ accomplish in the past, but new advances in model optimization and software like
+ ONNX Runtime
+ make it more feasible - even for new generative AI and large language models like Stable Diffusion,
+ Whisper, and Llama2.
+
+
+
Considerations for PyTorch models on the edge
+
+
+ There are several factors to keep in mind when thinking about running a PyTorch model on the
+ edge:
+
+
+
+ Size: modern models can be several gigabytes (hence the name Large
+ Language Models!). On the cloud, size is usually not a consideration until it becomes too
+ large to fit on a single GPU. At that point there are various well-known solutions for
+ running across multiple GPUs. For edge devices, we need to find models that can fit within
+ the constraints of the device. This sometimes requires a tradeoff with quality. Most
+ modern models come in several sizes (1 billion parameters, 13 billion parameters, 70
+                            billion parameters, etc.) so you can select a variant that fits on your device. Techniques
+                            such as quantization are usually applied to reduce the number of bits representing
+                            parameters, further reducing the model size (a minimal quantization sketch follows this
+                            list). The size of the application is also constrained by the app stores, so bringing in
+                            gigabytes of libraries won't work on the edge.
+
+
+ API for application integration: on the cloud, models are usually
+ packaged as Docker containers that expose an endpoint that is called by an application or
+ service. On edge devices, Docker containers may take up too many resources or may not even
+ be supported. By using an optimized engine, like ONNX Runtime, the dependency on Python
+ and Docker containers can be eliminated. ONNX Runtime also has APIs in many languages
+                            including C, C++, C#, Rust, Java, JavaScript, Objective-C, and Swift, making it easier
+                            to integrate natively with the hosting application.
+
+
+ Performance: with large amounts of memory, no power restrictions, and
+ hefty compute capabilities, running non-optimized models on the cloud is possible. On edge
+ devices, these luxuries do not exist and optimization is crucial. For example, ONNX
+ Runtime optimizes memory allocations, fuses model operators, reduces kernel launch times,
+ minimizes tensor transfers between processing units, and applies tuned matrix math
+ algorithms. It's also able to make use of compilers and engines that are device-specific,
+ providing a common interface for your application while harnessing the best approach on
+ each device.
+
+
+ Maintainability: on the cloud, updating a model is as simple as deploying
+ a new container image and ramping up traffic. On the edge, you need to consider how you
+ will distribute model updates. Sometimes this involves publishing updates to an app store,
+ sometimes it might be possible to implement a data update mechanism within your app and
+ download new model files or maybe even deltas. There are many possible paths, so we won't
+ go into much depth on this topic in this article but it's an aspect to keep in mind as you
+ plan for production.
+
+
+ Hybrid: instead of cloud versus device, you can choose to utilize both.
+ There are several hybrid patterns that are used in production today by applications such
+ as Office. One pattern is to dynamically decide whether to run on the device or in the
+ cloud based on network conditions or input characteristics. Another pattern is to run part
+ of the model pipeline on the device and part on the cloud. This is especially useful with
+ modern model pipelines that have separate encoder and decoder stages. Using an engine like
+ ONNX Runtime that works on both cloud and device simplifies development. We'll discuss
+ hybrid scenarios in more detail in a forthcoming article.
+
+
+ Personalization: in many cases, the PyTorch model is simply being run on
+ the device. However, you may also have scenarios where you need to personalize the model
+ on the device without sending data to the cloud. Recommendation and content targeting are
+                            examples of scenarios whose quality can be improved by updating models based on activity on
+                            the device. Fine-tuning and training with PyTorch on the device may not be feasible (due to
+                            performance and size concerns), but using an engine like ONNX Runtime allows PyTorch models
+                            to be updated and personalized locally. The same mechanism also enables federated
+                            learning, which can help mitigate user data exposure.
+
+
+
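+                    As a concrete illustration of the quantization technique mentioned in the size consideration
+                    above, here is a minimal sketch using ONNX Runtime's dynamic quantization API. The file names
+                    are placeholders, and dynamic weight-only quantization is just one of several quantization
+                    approaches you might choose.
+
+                    ```python
+                    # Minimal sketch: dynamic (weight-only) quantization of an exported ONNX model.
+                    # File names are placeholders. Weights are stored as 8-bit integers, which
+                    # roughly quarters the size of float32 weights.
+                    from onnxruntime.quantization import quantize_dynamic, QuantType
+
+                    quantize_dynamic(
+                        model_input="model.onnx",         # the exported PyTorch model
+                        model_output="model.quant.onnx",  # the smaller, quantized model
+                        weight_type=QuantType.QUInt8,     # 8-bit unsigned integer weights
+                    )
+                    ```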
+
Tools for PyTorch models on the edge
+
+
+ We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based
+ engine that has deep integration with PyTorch. By using PyTorch's ONNX APIs, your PyTorch
+ models can run on a spectrum of edge devices with ONNX Runtime.
+
+
+
+ The first step for running PyTorch models on the edge is to get them into a lightweight
+ format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch
+ has thought about this and includes an API that enables exactly this - torch.onnx. ONNX is an open standard that defines the operators that make
+ up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional
+ graph that captures the operators that are needed to run the model without Python. As with everything
+ in machine learning, there are some limitations to be aware of. Some PyTorch models cannot be
+ represented as a single graph - in this case you may need to output several graphs and stitch
+ them together in your own pipeline.
+
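+                    As a hedged illustration of this torch.onnx workflow, the sketch below exports a small
+                    torchvision model; the model choice, input shape, and file name are only examples.
+
+                    ```python
+                    # Minimal sketch: exporting a PyTorch model to ONNX with torch.onnx.
+                    # The torchvision model, input shape, and file name are illustrative only.
+                    import torch
+                    import torchvision
+
+                    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
+                    example_input = torch.randn(1, 3, 224, 224)  # dummy input used to trace the graph
+
+                    torch.onnx.export(
+                        model,
+                        example_input,
+                        "mobilenet_v2.onnx",
+                        input_names=["pixel_values"],
+                        output_names=["logits"],
+                        dynamic_axes={"pixel_values": {0: "batch"}},  # allow a variable batch size
+                    )
+                    ```
+
+                    The resulting .onnx file contains the functional graph described above and can be loaded by
+                    ONNX Runtime on any supported platform.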
+
+
+ The popular Hugging Face library also has APIs that build on top of this torch.onnx
+                    functionality to export models to the ONNX format. Over 130,000 models are supported, making it very likely that the model you care about is one of them.
+
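+                    As a rough sketch of that Optimum path (the checkpoint name and output directory here are
+                    just examples), passing export=True asks Optimum to run the ONNX export under the hood:
+
+                    ```python
+                    # Minimal sketch: exporting a Hugging Face model to ONNX via Optimum.
+                    # The checkpoint name and output directory are illustrative.
+                    from optimum.onnxruntime import ORTModelForCausalLM
+                    from transformers import AutoTokenizer
+
+                    model_id = "distilgpt2"
+                    tokenizer = AutoTokenizer.from_pretrained(model_id)
+                    ort_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)  # convert to ONNX
+                    ort_model.save_pretrained("distilgpt2-onnx")  # writes model.onnx plus config files
+
+                    # The exported model runs on ONNX Runtime but keeps the familiar generate() API.
+                    inputs = tokenizer("Running PyTorch models on the edge", return_tensors="pt")
+                    print(tokenizer.decode(ort_model.generate(**inputs, max_new_tokens=20)[0]))
+                    ```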
+
+
+ In this article, we'll show you several examples involving state-of-the-art PyTorch models
+ (like Whisper and Stable Diffusion) on popular devices (like Windows laptops, mobile phones,
+ and web browsers) via various languages (from C# to JavaScript to Swift).
+
+
+
Examples of PyTorch models on the edge
+
+
Stable Diffusion on Windows
+
+
+ The Stable Diffusion pipeline consists of five PyTorch models that build an image from a
+ text description. The diffusion process iterates on random pixels until the output image
+ matches the description.
+
+
+
+ To run on the edge, four of the models can be exported to ONNX format from HuggingFace.
+
+
+
+                    You don't have to export the fifth model, ClipTokenizer, as it is available in ONNX Runtime Extensions, a library for pre- and post-processing PyTorch models.
+
+
+
+ To run this pipeline of models as a .NET application, we build the pipeline code in C#. This
+ code can be run on CPU, GPU, or NPU, if they are available on your machine, using ONNX
+ Runtime's device-specific hardware accelerators. This is configured with the ExecutionProviderTarget below.
+
+
+
+ This is the output of the model pipeline, running with 50 inference iterations:
+
+
+
+
+
+ You can build the application and run it on Windows with the detailed steps shown in this tutorial.
+
+
+
Text generation in the browser
+
+
+ Running a PyTorch model locally in the browser is not only possible but super simple with
+ the transformers.js library. Transformers.js uses ONNX Runtime Web as its backend. Many models are already converted
+                    to ONNX and served by the transformers.js CDN, making inference in the browser a matter of writing
+ a few lines of HTML:
+
+
+
+ You can also embed the call to the transformers pipeline using vanilla JavaScript, or in a
+ web application, with React or Next.js, or write a browser extension.
+
+
+
+                    ONNX Runtime Web currently uses WebAssembly to execute the model on the CPU. This is fine
+                    for many models, but leveraging the GPU, if one exists on the device, can improve the user
+                    experience. ONNX Runtime Web support for WebGPU is coming *very* soon and will let you tap
+                    into the GPU while using the same inference APIs.
+
+
+
+
+
Speech recognition with Whisper on mobile
+
+
+ Whisper from OpenAI is a PyTorch speech recognition model. Whisper comes in a number of
+ different size variants - the smallest, Whisper Tiny, is suitable to run on mobile devices.
+ All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence
+ generation) can be composed and exported to a single ONNX model using the Olive framework. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which
+ supports Android, iOS, react-native, and MAUI/Xamarin.
+
+
+
+ ONNX Runtime Mobile supports hardware acceleration via NNAPI (on Android), CoreML (on iOS),
+ and XNNPACK (both iOS and Android).
+
+
+
+                    The relevant snippet of an example Android mobile app that performs speech transcription on short samples of audio is shown below:
+
+
+
You can record a short audio clip to transcribe.
+
+
+
+
Train a model to recognize your voice on mobile
+
+
+                    ONNX Runtime can also take a pre-trained model and adapt it to new data. It can do this on
+                    the edge - on mobile in particular, where it is easy to record your voice and access your
+                    photos and other personal data. Importantly, your data does not leave the device during
+                    training.
+
+
+
+ For example, you can train a PyTorch model to recognize just your own voice on your mobile
+ phone, for authentication scenarios.
+
+
+
+ The PyTorch model is obtained from HuggingFace in your development environment, and extra
+ layers are added to perform the speaker classification:
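+                    The exact architecture depends on your scenario; the following is a rough sketch only,
+                    assuming a wav2vec2-style base model from Hugging Face, with placeholder layer sizes and
+                    class count:
+
+                    ```python
+                    # Rough sketch: a Hugging Face speech model with extra layers added on top for
+                    # speaker classification. The base checkpoint, layer sizes, and number of
+                    # speakers are placeholders.
+                    import torch
+                    from transformers import Wav2Vec2Model
+
+                    class SpeakerClassifier(torch.nn.Module):
+                        def __init__(self, num_speakers: int = 2):
+                            super().__init__()
+                            self.backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
+                            hidden = self.backbone.config.hidden_size
+                            self.classifier = torch.nn.Sequential(
+                                torch.nn.Linear(hidden, 256),
+                                torch.nn.ReLU(),
+                                torch.nn.Linear(256, num_speakers),  # e.g. "my voice" vs. "not my voice"
+                            )
+
+                        def forward(self, input_values):
+                            features = self.backbone(input_values).last_hidden_state
+                            pooled = features.mean(dim=1)  # average over the time dimension
+                            return self.classifier(pooled)
+
+                    model = SpeakerClassifier(num_speakers=2)
+                    ```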
+
+
+
+ The model and other components necessary for training (a loss function to measure the
+ quality of the model and an optimizer to instruct how the weights are adjusted during
+ training) are exported with ONNX Runtime Training:
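+                    One way to produce these artifacts is the generate_artifacts helper in the
+                    onnxruntime-training Python package. The sketch below assumes the classifier above has
+                    already been exported to speaker_classifier.onnx, and the trainable parameter names are
+                    placeholders that match that sketch:
+
+                    ```python
+                    # Sketch: generating on-device training artifacts with ONNX Runtime Training.
+                    # Assumes speaker_classifier.onnx was exported from the model above; which
+                    # parameters stay trainable is a modelling choice shown here only as an example.
+                    import onnx
+                    from onnxruntime.training import artifacts
+
+                    onnx_model = onnx.load("speaker_classifier.onnx")
+
+                    # Train only the classification head; freeze the wav2vec2 backbone.
+                    requires_grad = ["classifier.0.weight", "classifier.0.bias",
+                                     "classifier.2.weight", "classifier.2.bias"]
+                    frozen_params = [init.name for init in onnx_model.graph.initializer
+                                     if init.name not in requires_grad]
+
+                    artifacts.generate_artifacts(
+                        onnx_model,
+                        requires_grad=requires_grad,
+                        frozen_params=frozen_params,
+                        loss=artifacts.LossType.CrossEntropyLoss,  # measures the quality of the model
+                        optimizer=artifacts.OptimType.AdamW,       # instructs how weights are adjusted
+                        artifact_directory="training_artifacts",
+                    )
+                    ```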
+
+
+
+ This set of artifacts is now ready to be loaded by the mobile app, shown here as iOS Swift
+ code. The app asks the user for samples of their voice and the model is trained with the
+ samples.
+
+
+
+ Once the model is trained, you can run it to verify that a voice sample is you!
+
+                    In this article we've shown why you would run PyTorch models on the edge and what aspects to
+                    consider. We shared several code examples for running state-of-the-art PyTorch models on the
+                    edge with ONNX Runtime, and showed how ONNX Runtime is built for performance and
+                    cross-platform execution, making it the ideal way to run PyTorch models on the edge. Have fun
+                    running PyTorch models on the edge with ONNX Runtime!
+
+
+
+ You may have noticed that we didn't include a Llama2 example even though ONNX Runtime is
+ optimized to run it. That's because the amazing Llama2 model deserves its own article, so
+ stay tuned for that!
+
+ Do you program in Python? C#? C++? Java? JavaScript? Rust? No problem. ONNX Runtime has you covered with support for many languages. And it runs on Linux, Windows, Mac, iOS, Android, and even in web browsers.
+
+            Integrate the power of Generative AI and Large Language Models (LLMs) into your apps and services with ONNX Runtime. No matter what language you develop in or what platform you need to run on, you can make use of state-of-the-art models
+            for image synthesis, text generation, and more.
+
+            CPU, GPU, NPU - no matter what hardware you run on, ONNX Runtime optimizes for latency, throughput, memory utilization, and binary size. Beyond excellent out-of-the-box performance for common usage patterns, additional
+            model optimization techniques and runtime configurations are available to further improve performance for specific use cases and models.
+
+ ONNX Runtime is the same tech that powers AI in Microsoft products like Office, Azure, and Bing,
+ as well as in thousands of other projects across the world.
+
+
+
+
+
+
+
Web Browsers
+
+ Run PyTorch and other ML models locally in the web browser with the cross-platform ONNX
+ Runtime Web.
+
+
+
+
+
+
+
+
+
+
+
Mobile Devices
+
+ Infuse your Android and iOS mobile apps with AI and take advantage of ML accelerator
+ hardware with ONNX Runtime Mobile.
+
+
+
+
+
+
+
+
+
+
+
ONNX Runtime Training
+
+ ONNX Runtime reduces costs for large model training and enables on-device training.
+
+ ORT Training can be used to accelerate training for a large number of popular models,
+ including Hugging Face models like Llama-2-7b and curated models from the Azure AI |
+ Machine Learning Studio model catalog.
+
+
+
+
+
+
+
+
+
+
+
On-Device Training
+
+ On-device training with ONNX Runtime lets developers take an inference model and train
+ it locally to deliver a more personalized and privacy-respecting experience for
+ customers.
+
+
+
+
+
+
+
+
+
diff --git a/src/routes/components/videos.svelte b/src/routes/components/videos.svelte
new file mode 100644
index 0000000000000..3034b7b6d5a04
--- /dev/null
+++ b/src/routes/components/videos.svelte
@@ -0,0 +1,55 @@
+
+
Videos
+
Check out some of our videos to help you get started!
\ No newline at end of file
diff --git a/src/routes/components/winarm.svelte b/src/routes/components/winarm.svelte
new file mode 100644
index 0000000000000..fa121e6c3ea94
--- /dev/null
+++ b/src/routes/components/winarm.svelte
@@ -0,0 +1,49 @@
+
+
+
+
ONNX Runtime + Windows Dev Kit 2023 = NPU powered AI
+
Delivering NPU powered AI capabilities in your apps
+
+        Windows Dev Kit 2023, aka Project Volterra, enables developers to build apps that unlock the
+        power of the NPU hardware to accelerate AI/ML workloads, delivering AI-enhanced features &
+        experiences without compromising app performance. You can get started now and access the power
+        of the NPU through the open source, cross-platform ONNX Runtime inference engine, which makes
+        it easy to run AI/ML models from popular machine learning frameworks like PyTorch and TensorFlow.
+
+
+
+
+
Get started on your Windows Dev Kit 2023 today
+            Follow these steps to set up your device to use ONNX Runtime (ORT) with the built-in NPU:
+
+
+ Download the Qualcomm AI Engine Direct SDK (QNN SDK)
+
+
Download and install the ONNX Runtime with QNN package
+
Start using the ONNX Runtime API in your application.
+
+
+
Optimizing models for the NPU
+ ONNX is a standard format for representing ML models authored in frameworks like PyTorch,
+            TensorFlow, and others. ONNX Runtime can run any ONNX model; however, to make use of the NPU,
+            you currently need to follow these steps:
+
+
Run the tools provided in the SNPE SDK on your model to generate a binary file.
+
Include the contents of the binary file as a node in the ONNX graph.
+
+ See our C# tutorial for an example of how this is done.
+
+
+ Many models can be optimized for the NPU using this process. Even if a model cannot be optimized
+ for NPU by the SNPE SDK, it can still be run by ONNX Runtime on the CPU.
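+        As a hedged Python sketch of what "using the ONNX Runtime API" looks like here, the snippet
+        below creates a session that prefers the QNN execution provider and falls back to the CPU; the
+        model path, input shape, and backend library name are assumptions about your setup:
+
+        ```python
+        # Sketch: running an ONNX model with the QNN execution provider and falling back
+        # to CPU when the NPU cannot be used. The model path, input shape, and backend
+        # library name depend on your setup.
+        import numpy as np
+        import onnxruntime as ort
+
+        session = ort.InferenceSession(
+            "model.onnx",
+            providers=["QNNExecutionProvider", "CPUExecutionProvider"],
+            provider_options=[{"backend_path": "QnnHtp.dll"}, {}],  # HTP backend targets the NPU
+        )
+
+        input_name = session.get_inputs()[0].name
+        dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
+        outputs = session.run(None, {input_name: dummy_input})
+        print(outputs[0].shape)
+        ```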
+
The average latency in seconds on Stable Diffusion v1.5 and v2.1 models:
+
+
+
+
+
+
+
+
+
Large Language Models + ONNX Runtime
+
+ ONNX Runtime supports many popular large language model (LLM) families in the Hugging Face Model
+ Hub. These, along with thousands of other models, are easily convertible to ONNX using the
+ Optimum API.
+
+ If you are interested in joining the ONNX Runtime open source community, you might want to join us on GitHub where you can
+ interact with other users and developers, participate in discussions, and get help with any
+ issues you encounter. You can also contribute to the project by reporting bugs, suggesting features,
+ or submitting pull requests.
+
+ The top 30 most popular model families on Hugging Face are all supported by ONNX Runtime,
+ and over 80 Hugging Face model families in total boast ORT support. The table outlines the models:
+
+
+
+ ONNX models can be found directly from the Hugging Face Model Hub in its ONNX model library.
+
+ Hugging Face also provides ONNX support for a variety of other models not listed in the ONNX
+ model library. With Hugging Face Optimum, you can easily convert pretrained models to ONNX,
+ and Transformers.js lets you run Hugging Face Transformers directly from your browser!
+
+ ONNX Runtime also supports many increasingly popular large language model (LLM) families,
+ including most of those available in the HF Model Hub. These model families are showcased in the table.
+
+
+
+ Hugging Face also provides a leaderboard with more detailed tracking and evaluation of
+        recently released LLMs from the community.
+
+ Models accelerated by ONNX Runtime can be easily deployed to the cloud through Azure ML, which
+ improves time to value, streamlines MLOps, and provides built-in security.
+
+
+
+ Azure ML also publishes a curated model list that is updated regularly and includes some of the
+ most popular models at the moment. Of the models on this list that are available on Hugging
+ Face, there is currently Optimum ONNX support for over 85%.
+
Improve inference performance for a wide variety of ML models
+
+
+
+
+
Run on different hardware and operating systems
+
+
+
+
+
Train in Python but deploy into a C#/C++/Java app
+
+
+
+
+
+ Train and perform inference with models created in different frameworks
+
+
+
+
+
+
+
+        Interested in inferencing on the edge? Additional benefits include:
+
+
+
+
Cost savings vs. running models in the cloud
+
+
+
+
+
+            Better latency and availability than requests to the cloud
+
+
+
+
+
More privacy since data stays on device
+
+
+
+
+
+
+
+ Easily enable cross-platform portability with the same implementation through the browser
+
+
+
+
+
+
+            Simplify the distribution experience without needing any additional libraries or driver
+            installations
+
+
+
+
+
+
+
+
+
Examples
+
+
+
+
Image Classification
+            The example app uses image classification, continuously classifying the objects seen by the
+            device's camera in real time and displaying the most probable inference results
+            on the screen.
+
+            The example app uses object detection, continuously detecting the objects in the
+            frames seen by your iOS device's back camera and displaying the detected bounding boxes,
+            detected classes, and corresponding inference confidence on the screen.
+
+            The example app demonstrates how to bring question answering models, with pre/post processing,
+            into a mobile scenario. It currently supports Android and iOS.
+
+ ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX
+ Runtime Web in VueJS. It currently supports five examples for you to quickly experience
+ the power of ONNX Runtime Web.
+
+ The example demonstrates how to create custom Excel functions (ORT.Sentiment() and ORT.Question())
+ to implement BERT NLP models with ONNX Runtime Web to enable deep learning in spreadsheet tasks.
+
+ If you're external to Microsoft and the issue contains information that cannot be
+ disclosed publicly, use the following link to template an email, and send it to the
+ below DRI:
+
Run PyTorch models on cloud, desktop, mobile, IoT, and even in the browser
+
+
+
+
+
+
Boost performance
+
Accelerate PyTorch models to improve user experience and reduce costs
+
+
+
+
+
+
Improve time to market
+
Used by Microsoft and many others for their production PyTorch workloads
+
+
+
+
+
+
Why PyTorch + ONNX Runtime?
+
+
+
+
+
+
Native support in PyTorch
+
+ PyTorch includes support for ONNX through the torch.onnx APIs to simplify exporting your
+ PyTorch model to the portable ONNX format. The ONNX Runtime team maintains these
+ exporter APIs to ensure a high level of compatibility with PyTorch models.
+
+                Train and deploy models reliably and at scale using a built-in PyTorch environment
+                within Azure Machine Learning. The latest PyTorch version is fully supported through a
+                lightweight, standalone environment that includes the needed components, like ONNX Runtime
+                for Training, to effectively run optimized training for large models.
+
+ Better performance can help improve your user experience and lower your operating costs.
+ A wide range of models from computer vision (ResNet, MobileNet, Inception, YOLO, super
+ resolution, etc) to speech and NLP (BERT, RoBERTa, GPT-2, T5, etc) can benefit from ONNX
+ Runtime's optimized performance. The ONNX Runtime team regularly benchmarks and
+ optimizes top models for performance. ONNX Runtime also integrates with top hardware
+ accelerator libraries like TensorRT and OpenVINO so you can get the best performance on
+ the hardware available while using the same common APIs across all your target
+ platforms.
+
+ Development agility is a key factor in overall costs. ONNX Runtime was built on the
+ experience of taking PyTorch models to production in high scale services like Microsoft
+ Office, Bing, and Azure. It used to take weeks and months to take a model from R&D to
+ production. With ONNX Runtime, models can be ready to be deployed at scale in hours or
+ days.
+
+            ORTModule accelerates training of large transformer-based PyTorch models. Training time and
+            training cost are reduced with a few lines of code changed. It is built on the highly successful and
+            proven technologies of ONNX Runtime and the ONNX format. It is composable with technologies like DeepSpeed and
+            accelerates pre-training and fine-tuning for state-of-the-art LLMs. It is integrated in the Hugging Face Optimum
+            library, which provides an ORTTrainer API to use ONNX Runtime as the backend for training acceleration.
+
+
+
+
+ - model = build_model() # User's PyTorch model
+ + model = ORTModule(build_model())
+
+
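+            For context, here is a minimal sketch of what that one-line change looks like inside an
+            ordinary training loop, assuming the torch-ort package is installed; the model and data
+            below are placeholders:
+
+            ```python
+            # Sketch: accelerating an existing PyTorch training loop with ORTModule.
+            # Only the wrapping line changes; the rest is ordinary PyTorch.
+            # The model and data are placeholders.
+            import torch
+            from torch_ort import ORTModule
+
+            model = ORTModule(torch.nn.Sequential(            # wrap the user's PyTorch model
+                torch.nn.Linear(784, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
+            ))
+            optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
+            loss_fn = torch.nn.CrossEntropyLoss()
+
+            for _ in range(10):                               # stand-in for a real data loader
+                x = torch.randn(32, 784)
+                y = torch.randint(0, 10, (32,))
+                optimizer.zero_grad()
+                loss = loss_fn(model(x), y)
+                loss.backward()
+                optimizer.step()
+            ```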
+            Optimized kernels and memory optimizations provide a >1.5X speedup in training time.
+
+
+
+
+
Flexible & extensible hardware support
+
+                The same model and API work with NVIDIA and AMD GPUs, and the extensible "execution
+                provider" architecture allows you to plug in custom operators, optimizers, and hardware
+                accelerators.
+
+
+
+
+
+
Part of the PyTorch ecosystem
+
+ ONNX Runtime Training is available via the torch-ort
+ package as part of the
+ Azure Container for PyTorch (ACPT) and seamlessly integrates with existing training pipelines for PyTorch models.
+
+
+
+
+
+
Composable with popular acceleration systems
+
+ Compose with DeepSpeed,
+ FairScale, Megatron, and
+ more for even faster and more efficient training.
+
Can be used to accelerate popular models like Llama-2-7b
+
+ ORT Training can be used to accelerate Hugging Face models like Llama-2-7b through these scripts.
+
+
+
+
+
+
+
Improved Foundation Model Performance with ORT Training
+
+
+
+
+
+
Average throughput improvement:
+
2.7x
+
+
+
+
+
Median throughput improvement:
+
1.7x
+
+
+
+
+
+
+
+
+
On-Device Training
+
+
+        On-Device Training refers to the process of training a model on an edge device, such as
+        mobile phones, embedded devices, gaming consoles, or web browsers. This is in contrast to
+        training a model on a server or in the cloud. On-Device Training extends the inference ecosystem
+        to leverage data on the device for providing customized user experiences on the edge. Once the model
+        is trained on the device, it can be used to get an inference model for deployment, update
+        global weights for federated learning, or create a checkpoint for future use. It
+        also preserves user privacy, because training happens on the device.
+
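+        As a rough Python sketch of the on-device training loop, consuming the artifacts generated
+        offline (on mobile, the same flow goes through the platform-language bindings); file names,
+        shapes, and data below are placeholders:
+
+        ```python
+        # Rough sketch: on-device training with ONNX Runtime's training API, consuming
+        # the artifacts generated offline. File names, shapes, and data are placeholders.
+        import numpy as np
+        from onnxruntime.training.api import CheckpointState, Module, Optimizer
+
+        state = CheckpointState.load_checkpoint("training_artifacts/checkpoint")
+        module = Module("training_artifacts/training_model.onnx", state,
+                        "training_artifacts/eval_model.onnx")
+        optimizer = Optimizer("training_artifacts/optimizer_model.onnx", module)
+
+        module.train()
+        for _ in range(5):                                    # stand-in for user-provided samples
+            features = np.random.rand(4, 16000).astype(np.float32)
+            labels = np.random.randint(0, 2, size=(4,)).astype(np.int64)
+            loss = module(features, labels)                   # forward pass + loss from the artifacts
+            optimizer.step()
+            module.lazy_reset_grad()
+
+        # Export an inference-only model once training is done.
+        module.export_model_for_inferencing("personalized_model.onnx", ["output"])
+        ```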
make it easy to scale across multiple platform targets
+
+
+
+
+
Improves data privacy & security
+
+            especially when working with sensitive data that cannot be shared with a server or the cloud
+
+
+
+
+
+
Same solution runs cross-platform
+
on cloud, desktop, edge, and mobile
+
+
+
+
+
+
Use Cases
+
+
+
+
+ Personalization tasks where the model needs to be trained on
+ the user's data
+
+
+ Examples:
+
Image / Audio classification
+
Text Prediction
+
+
+
+
+
+
+
+ Federated learning tasks where the model is locally trained
+ on data distributed across multiple devices to build a more robust aggregated global model
+
+
+ Examples:
+
Medical research
+
Autonomous vehicles
+
Robotics
+
+
+
+
+
+
diff --git a/src/routes/volterra/+page.svelte b/src/routes/volterra/+page.svelte
new file mode 100644
index 0000000000000..153d22af55239
--- /dev/null
+++ b/src/routes/volterra/+page.svelte
@@ -0,0 +1,6 @@
+
diff --git a/src/routes/winarm/+page.svelte b/src/routes/winarm/+page.svelte
new file mode 100644
index 0000000000000..ad2806bb019b5
--- /dev/null
+++ b/src/routes/winarm/+page.svelte
@@ -0,0 +1,4 @@
+
+
\ No newline at end of file
diff --git a/src/routes/windows/+page.svelte b/src/routes/windows/+page.svelte
new file mode 100644
index 0000000000000..d31496acaa1fc
--- /dev/null
+++ b/src/routes/windows/+page.svelte
@@ -0,0 +1,126 @@
+
+
+
+
+
+
+
+ This gallery demonstrates different machine learning scenarios and features using Windows ML
+ in an interactive format. The app is an interactive companion that shows the integration of
+ Windows Machine Learning Library APIs
+ into a desktop
+ WinUI 3 application.
+
+ This is a desktop application that uses SqueezeNet, a pre-trained machine learning model,
+ to detect the predominant object in an image selected by the user from a file.
+
diff --git a/static/favicon.ico b/static/favicon.ico
new file mode 100644
index 0000000000000..b47880019f3c2
Binary files /dev/null and b/static/favicon.ico differ
diff --git a/svelte.config.js b/svelte.config.js
new file mode 100644
index 0000000000000..118b2a5e29379
--- /dev/null
+++ b/svelte.config.js
@@ -0,0 +1,21 @@
+import adapter from '@sveltejs/adapter-static';
+import { vitePreprocess } from '@sveltejs/kit/vite';
+
+/** @type {import('@sveltejs/kit').Config} */
+const config = {
+ // Consult https://kit.svelte.dev/docs/integrations#preprocessors
+ // for more information about preprocessors
+ preprocess: [vitePreprocess()],
+
+ kit: {
+ // adapter-auto only supports some environments, see https://kit.svelte.dev/docs/adapter-auto for a list.
+ // If your environment is not supported or you settled on a specific environment, switch out the adapter.
+ // See https://kit.svelte.dev/docs/adapters for more information about adapters.
+ adapter: adapter(),
+ paths: {
+ base: process.env.NODE_ENV === 'production' ? '' : ''
+ }
+ }
+};
+
+export default config;
diff --git a/tailwind.config.js b/tailwind.config.js
new file mode 100644
index 0000000000000..a01ed66e2116f
--- /dev/null
+++ b/tailwind.config.js
@@ -0,0 +1,29 @@
+/** @type {import('tailwindcss').Config} */
+export default {
+ content: ['./src/**/*.{html,svelte,js,ts}'],
+ theme: {
+ extend: {}
+ },
+ plugins: [require('daisyui')],
+ daisyui: {
+ themes: [
+ {
+ darkmode: {
+ ...require('daisyui/src/theming/themes')['[data-theme=business]'],
+ primary: '#0099cc',
+ 'base-100': '#212933',
+ info: '#d1d1d1',
+ },
+ lightmode: {
+ ...require('daisyui/src/theming/themes')['[data-theme=corporate]'],
+ primary: '#80dfff',
+ 'base-100': '#f3f4f6',
+ info: '#d1d1d1',
+ }
+ }
+ ],
+ base: true,
+ styled: true,
+ utils: true
+ }
+};
diff --git a/tsconfig.json b/tsconfig.json
new file mode 100644
index 0000000000000..6ae0c8c44d08a
--- /dev/null
+++ b/tsconfig.json
@@ -0,0 +1,17 @@
+{
+ "extends": "./.svelte-kit/tsconfig.json",
+ "compilerOptions": {
+ "allowJs": true,
+ "checkJs": true,
+ "esModuleInterop": true,
+ "forceConsistentCasingInFileNames": true,
+ "resolveJsonModule": true,
+ "skipLibCheck": true,
+ "sourceMap": true,
+ "strict": true
+ }
+ // Path aliases are handled by https://kit.svelte.dev/docs/configuration#alias
+ //
+ // If you want to overwrite includes/excludes, make sure to copy over the relevant includes/excludes
+ // from the referenced tsconfig.json - TypeScript does not merge them in
+}
diff --git a/vite.config.ts b/vite.config.ts
new file mode 100644
index 0000000000000..bbf8c7da43f00
--- /dev/null
+++ b/vite.config.ts
@@ -0,0 +1,6 @@
+import { sveltekit } from '@sveltejs/kit/vite';
+import { defineConfig } from 'vite';
+
+export default defineConfig({
+ plugins: [sveltekit()]
+});
diff --git a/volterra.html b/volterra.html
deleted file mode 100644
index 81d5ebccaac67..0000000000000
--- a/volterra.html
+++ /dev/null
@@ -1,61 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Project Volterra
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/winarm.html b/winarm.html
deleted file mode 100644
index 3eec46ab9885c..0000000000000
--- a/winarm.html
+++ /dev/null
@@ -1,160 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ONNX Runtime | Windows Dev Kit 2023
-
-
-
-
-
-
-
-
-
- Skip to main content
-
-
-
-
-
-
-
-
-
-
-
ONNX Runtime + Windows Dev Kit 2023
- = NPU powered AI
-
-
-
-
-
-
-
-
Delivering NPU powered AI capabilities in your apps
-
Windows Dev Kit 2023, aka Project Volterra, enables developers to build apps that
- unlock the power of the NPU hardware to accelerate AI/ML workloads delivering
- AI-enhanced features & experiences without compromising app performance.
-
You can get started now and access the power of the NPU through the open source
- and cross-platform ONNX Runtime inference engine making it easy to run AI/ML
- models from popular machine learning frameworks like PyTorch and TensorFlow.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Get started on your Windows Dev Kit 2023 today
-
Follow these steps to setup your device to use ONNX Runtime (ORT) with the built
- in NPU:
-
-
Download the Qualcomm AI Engine Direct SDK (QNN SDK)
-
Download and install the ONNX Runtime with QNN
- package
-
Start using the ONNX Runtime API in your application.
-
-
-
Optimizing models for the NPU
-
ONNX is a standard format for
- representing ML models authored in frameworks like PyTorch, TensorFlow, and
- others. ONNX Runtime can run any ONNX model, however to make use of the NPU, you
- currently need to quantize the ONNX model to QDQ model.
-
Many models can be optimized for the NPU using this process. Even if a model
- cannot be optimized for NPU, it can still be run by ONNX Runtime
- on the CPU.