Change phi-3-tutorial.md back (#354)
yufenglee authored Apr 29, 2024
1 parent bf7d2f1 commit 2fa3964
Showing 1 changed file with 22 additions and 99 deletions: examples/python/phi-3-tutorial.md

## Steps
1. [Download Phi-3 Mini](#download-the-model)
2. [Install the generate() API](#install-the-generate-api-package)
3. [Run Phi-3 Mini](#run-the-model)

To build `onnxruntime` and `onnxruntime-genai` from source instead of installing the released packages, see [Build ONNX Runtime from source](#build-onnx-runtime-from-source) and [Build the generate() API from source](#build-the-generate-api-from-source) below.

## Download the model

Download either or both of the [short](https://aka.ms/phi3-mini-4k-instruct-onnx) and [long](https://aka.ms/phi3-mini-128k-instruct-onnx) context Phi-3 mini models from Hugging Face.

There are ONNX models for CPU (used for mobile too), as well as DirectML and CUDA.
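
For the short context model:

```bash
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
```

For the long context model:

```bash
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
```

(The repositories store the ONNX weights with Git LFS, so `git-lfs` needs to be installed for the clone to pull the actual model files.)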


## Install the generate() API package

Pre-release packages of the generate() API are published for DirectML, CPU, and CUDA. Install the package that matches the hardware you want to run on.
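
### DirectML

```bash
pip install numpy
pip install --pre onnxruntime-genai-directml
```

### CPU

```bash
pip install numpy
pip install --pre onnxruntime-genai
```

### CUDA

```bash
pip install numpy
pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/
```

Alternatively, both `onnxruntime` and `onnxruntime-genai` can be built from source. The instructions are documented in the [build from source](https://onnxruntime.ai/docs/genai/howto/build-from-source.html) guide and are repeated in the sections below for your convenience.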

### Pre-requisites

#### CMake

This is included on Windows if you have Visual Studio installed. If you are running on Linux or Mac, you can install it using `conda`.

```bash
conda install cmake
```

### Build ONNX Runtime from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
```

#### Build ONNX Runtime for DirectML on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
```

#### Build ONNX Runtime for CPU on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --config Release
```

#### Build ONNX Runtime for CUDA on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
```

#### Build ONNX Runtime on Linux

```bash
./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
```

You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
```

Replace the values given above for different versions and locations of CUDA.
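
For example, a sketch of the same command for a CUDA 12.2 toolkit installed in the default location (the version, paths, and `CMAKE_CUDA_ARCHITECTURES` value here are assumptions; adjust them to your system and GPU):

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 12.2 --cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="86" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-12.2/bin/nvcc
```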

#### Build ONNX Runtime on Mac

```bash
./build.sh --build_shared_lib --skip_tests --parallel --config Release
```

### Build the generate() API from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
mkdir -p ort/include
mkdir -p ort/lib
```

#### Build the generate() API on Windows


If building for DirectML

```bash
copy ..\onnxruntime\include\onnxruntime\core\providers\dml\dml_provider_factory.h ort\include
```

```bash
copy ..\onnxruntime\include\onnxruntime\core\session\onnxruntime_c_api.h ort\include
copy ..\onnxruntime\build\Windows\Release\Release\*.dll ort\lib
copy ..\onnxruntime\build\Windows\Release\Release\onnxruntime.lib ort\lib
python build.py [--use_dml | --use_cuda]
cd build\wheel
pip install *.whl
```


#### Build the generate() API on Linux

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/Linux/Release/libonnxruntime*.so* ort/lib
python build.py [--use_cuda]
cd build/wheel
pip install *.whl
```

#### Build the generate() API on Mac

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/MacOS/Release/libonnxruntime*.dylib* ort/lib
python build.py
cd build/wheel
pip install *.whl
```

## Run the model

Run the model with [model-qa.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py).

The script accepts a model folder and takes the generation parameters from the config in that model folder. You can also override the parameters on the command line.
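
For example, the `-l` flag overrides the maximum generation length; other search options can be overridden in the same way (flag names vary by script version, so check `python model-qa.py --help`):

```bash
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 4096
```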

This example uses the long context model running with DirectML on Windows.

```bash
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-qa.py -o model-qa.py
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048
```

To run another variant, pass its folder to `-m` instead, for example `python model-qa.py -m models/phi3-mini-4k-instruct-cpu-int4-rtn-block-32` to run a short context CPU model from a local `models` folder.

Once the script has loaded the model, it asks for input in a loop and streams the output back as it is produced by the model.
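
If you want to see what the script is doing, here is a minimal sketch of its streaming loop using the onnxruntime-genai Python API. The exact method names and the Phi-3 chat template below are based on the 0.2.x pre-release packages and the published model-qa.py; check the script itself for the authoritative version.

```python
import onnxruntime_genai as og

# Load the model folder downloaded earlier (path is an example).
model = og.Model("Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 expects its chat template around the user prompt.
prompt = "<|user|>\nWhat is the golden ratio?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream each token back as soon as it is produced.
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```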
