From 2dd18abc419ddb059671a536acb39b9e1323cdda Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:14:51 -0700
Subject: [PATCH 01/10] Add Phi-3 medium

---
 docs/genai/tutorials/phi2-python.md |   6 +-
 docs/genai/tutorials/phi3-python.md | 114 ++++++++++++++++++++++++++--
 2 files changed, 110 insertions(+), 10 deletions(-)

diff --git a/docs/genai/tutorials/phi2-python.md b/docs/genai/tutorials/phi2-python.md
index a2205667ba249..6aefcef720763 100644
--- a/docs/genai/tutorials/phi2-python.md
+++ b/docs/genai/tutorials/phi2-python.md
@@ -1,13 +1,13 @@
 ---
-title: Python phi-2 tutorial
+title: Phi-2 tutorial
 description: Learn how to write a language generation application with ONNX Runtime generate() API in Python using the phi-2 model
 has_children: false
 parent: Tutorials
 grand_parent: Generate API (Preview)
-nav_order: 2
+nav_order: 3
 ---
 
-# Language generation in Python with phi-2
+# Language generation in Python with Phi-2
 
 ## Setup and installation
 
diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index d93eed15cdbee..930250342fc8e 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -1,13 +1,13 @@
 ---
-title: Python phi-3 tutorial
-description: Small but mighty. Run Phi-3 with ONNX Runtime.
+title: Phi-3 tutorial
+description: Small but mighty. Run Phi-3 with ONNX Runtime in 3 easy steps.
 has_children: false
 parent: Tutorials
 grand_parent: Generate API (Preview)
 nav_order: 1
 ---
 
-# Run the Phi-3 Mini models with the ONNX Runtime generate() API
+# Run Phi-3 language models with the ONNX Runtime generate() API
 
 ## Steps
 1. [Setup](#setup)
@@ -18,11 +18,24 @@ nav_order: 1
 
 ## Introduction
 
-There are two Phi-3 mini models to choose from: the short (4k) context version or the long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory.
+Phi-3 ONNX models are hosted on HuggingFace and you can run them with the ONNX Runtime generate() API.
 
-The Phi-3 ONNX models are hosted on HuggingFace: [short](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) and [long](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx).
+The mini (3.3B) and medium (14B) versions available now, with support for small coming soon. Both mini and medium have a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory.
+
+Available models are:
+
+https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
+https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
+https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu
+https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda
+https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml
+https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu
+https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda
+https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
+
+
+This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi3-onnx-model-reference) for download commands for the other variants.
-This tutorial downloads and runs the short context model. If you would like to use the long context model, change the `4k` to `128k` in the instructions below.
 
 ## Setup
 
@@ -128,7 +141,7 @@ Are you on a Windows machine with GPU?
 
    ```bash
    Input: Tell me a joke about creative writing
-   Output: Why don\'t writers ever get lost? Because they always follow the plot!
+   Output: Why don't writers ever get lost? Because they always follow the plot!
    ```
 
 ## Run on CPU
 
@@ -165,3 +178,90 @@ Are you on a Windows machine with GPU?
    Output: Why did the generative AI go to school? To improve its "creativity" algorithm!
+   ```
+
+## Phi-3 ONNX model reference
+
+### Phi-3 mini 4k context CPU
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
+python phi3-qa.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
+```
+
+### Phi-3 mini 4k context CUDA
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cuda/cuda-int4-rtn-block-32/* --local-dir .
+python phi3-qa.py -m cuda/cuda-int4-rtn-block-32
+```
+
+### Phi-3 mini 4k context DirectML
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include directml/* --local-dir .
+python phi3-qa.py -m directml\directml-int4-awq-block-128
+```
+
+### Phi-3 mini 128k context CPU
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
+python phi3-qa.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
+```
+
+### Phi-3 mini 128k context CUDA
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx --include cuda/cuda-int4-rtn-block-32/* --local-dir .
+python phi3-qa.py -m cuda/cuda-int4-rtn-block-32
+```
+
+### Phi-3 mini 128k context DirectML
+
+```bash
+huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx --include directml/* --local-dir .
+python phi3-qa.py -m directml\directml-int4-awq-block-128
+```
+
+### Phi-3 medium 4k context CPU
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu
+python phi3-qa.py -m Phi-3-medium-4k-instruct-onnx-cpu/cpu-int4-rtn-block-32-acc-level-4
+```
+
+### Phi-3 medium 4k context CUDA
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda
+python phi3-qa.py -m Phi-3-medium-4k-instruct-onnx-cuda/cuda-int4-rtn-block-32
+```
+
+### Phi-3 medium 4k context DirectML
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml
+python phi3-qa.py -m Phi-3-medium-4k-instruct-onnx-directml/directml-int4-awq-block-128
+```
+
+### Phi-3 medium 128k context CPU
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu
+python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-cpu/cpu-int4-rtn-block-32-acc-level-4
+```
+
+### Phi-3 medium 128k context CUDA
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda
+python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-cuda/cuda-int4-rtn-block-32
+```
+
+### Phi-3 medium 128k context DirectML
+
+```bash
+git clone https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
+python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-directml/directml-int4-awq-block-128
+```

From 6d6db2c5ce3a27069a9d12ff73db6aa06f207fa9 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:19:20 -0700
Subject: [PATCH 02/10] Fix anchor

---
 docs/genai/tutorials/phi3-python.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 930250342fc8e..e939260c32552 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -34,7 +34,7 @@ https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda
 https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
 
 
-This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi3-onnx-model-reference) for download commands for the other variants.
+This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
 
 
 ## Setup

From d5719a8fbf347e873278b8b8e0dab5109e37be34 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:20:31 -0700
Subject: [PATCH 03/10] Add TOC

---
 docs/genai/tutorials/phi3-python.md | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index e939260c32552..ff63018027e5a 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -8,13 +8,7 @@ nav_order: 1
 ---
 
 # Run Phi-3 language models with the ONNX Runtime generate() API
-
-## Steps
-1. [Setup](#setup)
-2. [Choose your platform](#choose-your-platform)
-3. [Run with DirectML](#run-with-directml)
-4. [Run with NVDIA CUDA](#run-with-nvidia-cuda)
-5. [Run on CPU](#run-on-cpu)
+{: .no_toc }
 
 ## Introduction
 
@@ -36,6 +30,8 @@ https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
+* TOC placeholder
+{:toc}
 
 ## Setup

From 4e0b0111d44753d7ae9998cd699f463017d1 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:21:39 -0700
Subject: [PATCH 04/10] Remove small

---
 docs/genai/tutorials/phi3-python.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index ff63018027e5a..8cbc9117f6dd4 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -14,7 +14,7 @@ nav_order: 1
 # Run Phi-3 language models with the ONNX Runtime generate() API
 
 Phi-3 ONNX models are hosted on HuggingFace and you can run them with the ONNX Runtime generate() API.
 
-The mini (3.3B) and medium (14B) versions available now, with support for small coming soon. Both mini and medium have a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory.
+The mini (3.3B) and medium (14B) versions available now, with support. Both mini and medium have a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory.
 
 Available models are:

From 608de642d76416c6fe0abc294d96aa0cead5cb69 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:25:22 -0700
Subject: [PATCH 05/10] HF links

---
 docs/genai/tutorials/phi3-python.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 8cbc9117f6dd4..bc3fbfb14c314 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -18,14 +18,14 @@ The mini (3.3B) and medium (14B) versions available now, with support. Both mini
 
 Available models are:
 
-https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
-https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
-https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu
-https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda
-https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml
-https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu
-https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda
-https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
+[https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
+[https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
+[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
+[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
+[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
+[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
+[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
+[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.

From 516e760692a0d738c0c278eb749941df7d400c01 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:26:18 -0700
Subject: [PATCH 06/10] HF links list

---
 docs/genai/tutorials/phi3-python.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index bc3fbfb14c314..45756912f60f4 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -18,14 +18,14 @@ The mini (3.3B) and medium (14B) versions available now, with support. Both mini
 
 Available models are:
 
-[https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
-[https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
-[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
-[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
-[https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
-[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
-[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
-[https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
+* [https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
+* [https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
From cd9de88a86798c8db154f88ffa030cd2a343eaa7 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:26:50 -0700
Subject: [PATCH 07/10] Intro TOC

---
 docs/genai/tutorials/phi3-python.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 45756912f60f4..02ac5957e4448 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -11,6 +11,7 @@ nav_order: 1
 {: .no_toc }
 
 ## Introduction
+{: .no_toc }
 
 Phi-3 ONNX models are hosted on HuggingFace and you can run them with the ONNX Runtime generate() API.

From efe09d596dced530eb80a56e46bf96c10720ccb6 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:30:24 -0700
Subject: [PATCH 08/10] Links

---
 docs/genai/tutorials/phi3-python.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 02ac5957e4448..846fb220c237e 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -19,14 +19,15 @@ The mini (3.3B) and medium (14B) versions available now, with support. Both mini
 
 Available models are:
 
-* [https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
-* [https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
+
+* (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
+* (https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx)
+* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu)
+* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda)
+* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml)
+* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu)
+* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda)
+* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml)
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
From 77578f6c4b28c90d7e4d3411ad6d2e3c981f7be0 Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:35:18 -0700
Subject: [PATCH 09/10] Links again

---
 docs/genai/tutorials/phi3-python.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 846fb220c237e..3f534a4c9c571 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -20,14 +20,14 @@ The mini (3.3B) and medium (14B) versions available now, with support. Both mini
 
 Available models are:
 
-* (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
-* (https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx)
-* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu)
-* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda)
-* (https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml)
-* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu)
-* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda)
-* (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml)
+* [https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx]
+* [https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
From a7c246fde419723221cf4adb63980d4204144f7c Mon Sep 17 00:00:00 2001
From: natke
Date: Tue, 21 May 2024 08:40:46 -0700
Subject: [PATCH 10/10] Links again

---
 docs/genai/tutorials/phi3-python.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 3f534a4c9c571..20ab83bf78756 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -20,14 +20,14 @@ The mini (3.3B) and medium (14B) versions available now, with support. Both mini
 
 Available models are:
 
-* [https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx]
-* [https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda]
-* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda]
-* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml]
+* [https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
+* [https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx)
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cpu)
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-cuda)
+* [https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml)
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu)
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda)
+* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml/)
 
 This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.
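The run examples recorded in patch 01 pass plain questions to `phi3-qa.py`, whose source is not part of this series. Before calling the generate() API, a script like it has to wrap each question in the Phi-3 instruct chat template. A minimal sketch of that wrapping step (the helper name is hypothetical and the template string is an assumption based on the Phi-3 instruct model cards, not taken from these patches):

```python
# Hypothetical helper: wraps raw user text in the Phi-3 instruct chat
# template before it is tokenized and handed to the generate() API.
# The template string is an assumption from the Phi-3 model cards.
def build_phi3_prompt(user_text: str) -> str:
    """Return the chat-formatted prompt for a single user turn."""
    return f"<|user|>\n{user_text} <|end|>\n<|assistant|>"

prompt = build_phi3_prompt("Tell me a joke about creative writing")
print(prompt)
```

The resulting string would then be encoded with the model's tokenizer and fed to the generation loop, for example via the `onnxruntime-genai` Python package that the tutorial installs.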