Commit

update
wangyems committed Jan 30, 2024
1 parent f7a80dd commit c588b44
Showing 7 changed files with 87 additions and 34 deletions.
1 change: 1 addition & 0 deletions cmake/onnxruntime_python.cmake
@@ -547,6 +547,7 @@ add_custom_command(
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/gpt2
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/llama
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/longformer
+COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/phi2
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/stable_diffusion
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/t5
 COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/whisper
93 changes: 74 additions & 19 deletions onnxruntime/python/tools/transformers/models/phi2/README.md
@@ -1,50 +1,105 @@
 # Phi2 Optimizations
 ## Prerequisites
-```
-git clone [email protected]:microsoft/onnxruntime.git
-cd onnxruntime/onnxruntime/python/tools/transformers/models/phi2
-pip install -r requirements.txt
-```
-
-## Export optimized onnx model for different senarios
-
-- Export FP32 ONNX model for CPU
-```
-python convert_to_onnx.py --fp32_cpu
-```
-- Export INT4 ONNX model for CPU
-```
-python convert_to_onnx.py --int4_cpu
-```
-- Export FP32 ONNX model for Nvidia GPUs
-```
-python convert_to_onnx.py --fp32_gpu
-```
-- Export FP16 ONNX model for Nvidia GPUs
-```
-python convert_to_onnx.py --fp16_gpu
-```
-- Export INT4 ONNX model for Nvidia GPUs
-```
-python convert_to_onnx.py --int4_gpu
-```
-- Export FP16 ONNX model for Nvidia A100
-```
-python convert_to_onnx.py --fp16_a100
-```
-- Export INT4 ONNX model for Nvidia A100
-```
-python convert_to_onnx.py --int4_a100
-```
-- Export all of them
-```
-python convert_to_onnx.py --fp32_cpu --int4_cpu --fp32_gpu --fp16_gpu --int4_gpu --fp16_a100 --int4_a100
-```
+- From source: \
+pip install onnxruntime-gpu==1.17.0
+```
+git clone [email protected]:microsoft/onnxruntime.git
+cd onnxruntime/onnxruntime/python/tools/transformers
+python -m models.phi2.convert_to_onnx -h
+```
+- From wheel: \
+pip install [ort-nightly-gpu](https://onnxruntime.ai/docs/install/)
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx -h
+```
+
+## Export optimized phi2 onnx model for different scenarios
+- Export FP32 ONNX model for Nvidia GPUs \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp32_gpu
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_gpu
+```
+- Export FP16 ONNX model for Nvidia GPUs \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp16_gpu
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_gpu
+```
+- Export INT4 ONNX model for Nvidia GPUs \
+From source:
+```
+python -m models.phi2.convert_to_onnx --int4_gpu
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_gpu
+```
+- Export FP16 ONNX model for Nvidia A100 \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp16_a100
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_a100
+```
+- Export INT4 ONNX model for Nvidia A100 \
+From source:
+```
+python -m models.phi2.convert_to_onnx --int4_a100
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_a100
+```
+- Export FP32 ONNX model for CPU \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp32_cpu
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_cpu
+```
+- Export INT4 ONNX model for CPU \
+From source:
+```
+python -m models.phi2.convert_to_onnx --int4_cpu
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_cpu
+```
+- Export all of them at once \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp32_cpu --int4_cpu --fp32_gpu --fp16_gpu --int4_gpu --fp16_a100 --int4_a100
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_cpu --int4_cpu --fp32_gpu --fp16_gpu --int4_gpu --fp16_a100 --int4_a100
+```
 ## Run example with ORT and benchmark
-- Export FP16 ONNX model for Nvidia A100 and run example
-```
-python convert_to_onnx.py --fp16_a100 --run_example
-```
+- (e.g.) Export FP16 and INT4 ONNX models for Nvidia A100 and run examples. \
+From source:
+```
+python -m models.phi2.convert_to_onnx --fp16_a100 --int4_a100 --run_example
+```
+From wheel:
+```
+python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_a100 --int4_a100 --run_example
+```
+The inference example currently supports all models running on CUDA.
 
 ## Limitations
 There's a known issue that symbolic shape inference will fail. It can be ignored at the moment as it won't affect the optimized model's inference.
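For a quick sanity check of an exported model outside of the --run_example path, a minimal ORT smoke test works. This is only a sketch: the model filename below is hypothetical (convert_to_onnx decides the actual output paths), and an installed onnxruntime-gpu is assumed.
```
import onnxruntime as ort

# Hypothetical output path; use whatever file convert_to_onnx actually wrote.
model_path = "phi2_decoder_fp16_a100.onnx"

# Fall back to CPU so the check still runs on machines without CUDA.
session = ort.InferenceSession(
    model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Listing graph inputs/outputs confirms the export produced a loadable model.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```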
onnxruntime/python/tools/transformers/models/phi2/convert_to_onnx.py
@@ -14,17 +14,6 @@
 from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer
 from transformers import AutoConfig, AutoModelForCausalLM
 
-## uncomment the following lines to use the local files instead of the pip installed version
-## --------------------------------------------------------------------------
-# import sys
-
-# sys.path.append(os.path.dirname(__file__))
-
-# transformers_dir = os.path.normpath(os.path.join(os.path.dirname(__file__), "..", ".."))
-# if transformers_dir not in sys.path:
-#     sys.path.append(transformers_dir)
-## --------------------------------------------------------------------------
-
 from benchmark_helper import Precision


@@ -509,18 +498,21 @@ def run_optimize_phi2_onnx(
     from inference_example import run_phi2
 
     if args.fp16_a100:
+        logging.info("Running fp16_a100 example...")
         run_phi2(
             onnx_model_path=model_type_to_args["fp16_a100"][2],
             use_buffer_share=True,
             device_id=args.device_id,
         )
     if args.int4_a100:
+        logging.info("Running int4_a100 example...")
         run_phi2(
             onnx_model_path=model_type_to_args["int4_a100"][2],
             use_buffer_share=True,
             device_id=args.device_id,
         )
     if args.fp32_gpu:
+        logging.info("Running fp32_gpu example...")
         run_phi2(
             onnx_model_path=model_type_to_args["fp32_gpu"][2],
             use_buffer_share=False,
@@ -529,13 +521,15 @@
             use_fp16=False,
         )
     if args.fp16_gpu:
+        logging.info("Running fp16_gpu example...")
         run_phi2(
             onnx_model_path=model_type_to_args["fp16_gpu"][2],
             use_buffer_share=False,
             device_id=args.device_id,
             packed_kv=True,
         )
     if args.int4_gpu:
+        logging.info("Running int4_gpu example...")
         run_phi2(
             onnx_model_path=model_type_to_args["int4_gpu"][2],
             use_buffer_share=False,
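One practical note on the logging.info calls added above: they only produce output if the root logger is configured. A minimal setup, sketched below under the assumption that the script does not already install its own handler, makes them visible:
```
import logging

# Without a configured handler, logging.info output is silently dropped.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
logging.info("Running fp16_a100 example...")  # now printed to stderr
```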
onnxruntime/python/tools/transformers/models/phi2/inference_example.py
@@ -2,9 +2,10 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------

import logging
import onnx
import os

from onnx import ModelProto


onnxruntime/python/tools/transformers/models/phi2/requirements.txt
@@ -1,6 +1,6 @@
 onnx>=1.15.0
 transformers>=4.36.2
 onnxscript>=0.1.0.dev20240126
-
---extra-index-url https://download.pytorch.org/whl/nightly/cu121
-torch>=2.3.0.dev20240126+cu121
+torch==2.2.0
+# --extra-index-url https://download.pytorch.org/whl/nightly/cu121
+# torch>=2.3.0.dev20240126+cu121
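Since this change moves from a nightly torch build to the stable 2.2.0 pin, a quick environment check against the pins can save a failed export. A small sketch, with package names taken from the requirements above (whether onnxscript exposes __version__ in that dev build is an assumption):
```
import onnx
import onnxscript
import torch
import transformers

# Versions expected after `pip install -r requirements.txt` with this change.
print("onnx", onnx.__version__)                  # >= 1.15.0
print("transformers", transformers.__version__)  # >= 4.36.2
print("onnxscript", onnxscript.__version__)      # >= 0.1.0.dev20240126
print("torch", torch.__version__)                # 2.2.0 (stable, not the nightly)
```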
1 change: 1 addition & 0 deletions onnxruntime/python/tools/transformers/onnx_model_phi.py
@@ -16,6 +16,7 @@

 logger = getLogger(__name__)
 
+
 class ProcessGemmWFunc:
     def __call__(self, x):
         return np.transpose(x, (1, 0))
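For readers skimming the diff, ProcessGemmWFunc is a small weight-processing callable: per the hunk above, it transposes a 2-D weight (e.g. so a Gemm initializer matches the expected layout). A self-contained illustration, with the class body copied from the diff:
```
import numpy as np

class ProcessGemmWFunc:
    def __call__(self, x):
        return np.transpose(x, (1, 0))

w = np.arange(6, dtype=np.float32).reshape(2, 3)
w_t = ProcessGemmWFunc()(w)
assert w_t.shape == (3, 2)        # (rows, cols) swapped
assert np.array_equal(w_t, w.T)   # identical to a plain transpose
```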
1 change: 1 addition & 0 deletions setup.py
@@ -416,6 +416,7 @@ def finalize_options(self):
"onnxruntime.transformers.models.gpt2",
"onnxruntime.transformers.models.llama",
"onnxruntime.transformers.models.longformer",
"onnxruntime.transformers.models.phi2",
"onnxruntime.transformers.models.t5",
"onnxruntime.transformers.models.stable_diffusion",
"onnxruntime.transformers.models.whisper",
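The setup.py entry means wheels built from this commit ship the phi2 scripts as an importable package, which is what the `python -m onnxruntime.transformers.models.phi2.convert_to_onnx` invocations in the README rely on. A quick post-install check, sketched under the assumption that such a wheel is installed:
```
import importlib

# Resolves only if the phi2 package was included in the installed wheel.
pkg = importlib.import_module("onnxruntime.transformers.models.phi2")
print(pkg.__path__)
```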
