Change phi-3-tutorial.md back (#354)
yufenglee authored Apr 29, 2024
1 parent bf7d2f1 commit 2fa3964
Showing 1 changed file with 22 additions and 99 deletions: examples/python/phi-3-tutorial.md

## Steps
1. [Download Phi-3 Mini](#download-the-model)
2. [Install the generate() API](#install-the-generate-api-package)
3. [Run Phi-3 Mini](#run-the-model)

To build `onnxruntime` and `onnxruntime-genai` from source instead of installing the released packages, see [Build ONNX Runtime from source](#build-onnx-runtime-from-source) and [Build the generate() API from source](#build-the-generate-api-from-source) below.

## Download the model

Download either or both of the [short](https://aka.ms/phi3-mini-4k-instruct-onnx) and [long](https://aka.ms/phi3-mini-128k-instruct-onnx) context Phi-3 mini models from Hugging Face.

There are ONNX models for CPU (used for mobile too), as well as DirectML and CUDA.
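
For the short context model:

```bash
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
```

For the long context model:

```bash
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
```

(The repositories store the ONNX weights with Git LFS, so `git-lfs` needs to be installed for the clone to pull the actual model files.)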


## Install the generate() API package

Pre-release packages of the generate() API are published for DirectML, CPU, and CUDA. Install the package that matches the hardware you want to run on.
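
### DirectML

```bash
pip install numpy
pip install --pre onnxruntime-genai-directml
```

### CPU

```bash
pip install numpy
pip install --pre onnxruntime-genai
```

### CUDA

```bash
pip install numpy
pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/
```

Alternatively, both `onnxruntime` and `onnxruntime-genai` can be built from source. The instructions are documented in the [build from source](https://onnxruntime.ai/docs/genai/howto/build-from-source.html) guide and are repeated in the sections below for your convenience.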

### Pre-requisites

#### CMake

This is included on Windows if you have Visual Studio installed. If you are running on Linux or Mac, you can install it using `conda`.

```bash
conda install cmake
```

### Build ONNX Runtime from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
```

#### Build ONNX Runtime for DirectML on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
```

#### Build ONNX Runtime for CPU on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --config Release
```

#### Build ONNX Runtime for CUDA on Windows

```bash
build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
```

#### Build ONNX Runtime on Linux

```bash
./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
```

You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
```

Replace the values given above for different versions and locations of CUDA.
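
For example, a sketch of the same command for a CUDA 12.2 toolkit installed in the default location (the version, paths, and `CMAKE_CUDA_ARCHITECTURES` value here are assumptions; adjust them to your system and GPU):

```bash
./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 12.2 --cuda_home /usr/local/cuda-12.2 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="86" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-12.2/bin/nvcc
```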

#### Build ONNX Runtime on Mac

```bash
./build.sh --build_shared_lib --skip_tests --parallel --config Release
```

### Build the generate() API from source

#### Clone the repo

```bash
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
mkdir -p ort/include
mkdir -p ort/lib
```

#### Build the generate() API on Windows


If building for DirectML

```bash
copy ..\onnxruntime\include\onnxruntime\core\providers\dml\dml_provider_factory.h ort\include
```

```bash
copy ..\onnxruntime\include\onnxruntime\core\session\onnxruntime_c_api.h ort\include
copy ..\onnxruntime\build\Windows\Release\Release\*.dll ort\lib
copy ..\onnxruntime\build\Windows\Release\Release\onnxruntime.lib ort\lib
python build.py [--use_dml | --use_cuda]
cd build\wheel
pip install *.whl
```


#### Build the generate() API on Linux

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/Linux/Release/libonnxruntime*.so* ort/lib
python build.py [--use_cuda]
cd build/wheel
pip install *.whl
```

#### Build the generate() API on Mac

```bash
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build/MacOS/Release/libonnxruntime*.dylib* ort/lib
python build.py
cd build/wheel
pip install *.whl
```

## Run the model

Run the model with [model-qa.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py).

The script accepts a model folder and takes the generation parameters from the config in that model folder. You can also override the parameters on the command line.
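
For example, the `-l` flag overrides the maximum generation length; other search options can be overridden in the same way (flag names vary by script version, so check `python model-qa.py --help`):

```bash
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 4096
```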

This example uses the long context model running with DirectML on Windows.

```bash
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-qa.py -o model-qa.py
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048
```

To run another variant, pass its folder to `-m` instead, for example `python model-qa.py -m models/phi3-mini-4k-instruct-cpu-int4-rtn-block-32` to run a short context CPU model from a local `models` folder.

Once the script has loaded the model, it asks for input in a loop and streams the output back as it is produced by the model.
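
If you want to see what the script is doing, here is a minimal sketch of its streaming loop using the onnxruntime-genai Python API. The exact method names and the Phi-3 chat template below are based on the 0.2.x pre-release packages and the published model-qa.py; check the script itself for the authoritative version.

```python
import onnxruntime_genai as og

# Load the model folder downloaded earlier (path is an example).
model = og.Model("Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 expects its chat template around the user prompt.
prompt = "<|user|>\nWhat is the golden ratio?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream each token back as soon as it is produced.
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```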
