Showing 7 changed files with 87 additions and 34 deletions.
93 changes: 74 additions & 19 deletions in onnxruntime/python/tools/transformers/models/phi2/README.md
# Phi2 Optimizations

## Prerequisites
```
pip install -r requirements.txt
```

There are two ways to run the conversion script:

- From source: \
pip install onnxruntime-gpu==1.17.0
```
git clone git@github.com:microsoft/onnxruntime.git
cd onnxruntime/onnxruntime/python/tools/transformers
python -m models.phi2.convert_to_onnx -h
```
- From wheel: \
pip install [ort-nightly-gpu](https://onnxruntime.ai/docs/install/)
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx -h
```
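Before exporting on GPU, it is worth confirming that the installed package can actually see CUDA. A minimal sketch (the provider list depends on which onnxruntime build is installed):

```python
import onnxruntime as ort

# A CUDA-enabled build lists "CUDAExecutionProvider" here; the GPU export
# and benchmark scenarios below assume it is available.
print(ort.__version__)
print(ort.get_available_providers())
```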
## Export optimized phi2 ONNX model for different scenarios
- Export FP32 ONNX model for Nvidia GPUs \
From source:
```
python -m models.phi2.convert_to_onnx --fp32_gpu
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_gpu
```
- Export FP16 ONNX model for Nvidia GPUs \
From source:
```
python -m models.phi2.convert_to_onnx --fp16_gpu
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_gpu
```
- Export INT4 ONNX model for Nvidia GPUs \
From source:
```
python -m models.phi2.convert_to_onnx --int4_gpu
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_gpu
```
- Export FP16 ONNX model for Nvidia A100 \
From source:
```
python -m models.phi2.convert_to_onnx --fp16_a100
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_a100
```
- Export INT4 ONNX model for Nvidia A100 \
From source:
```
python -m models.phi2.convert_to_onnx --int4_a100
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_a100
```
- Export FP32 ONNX model for CPU \
From source:
```
python -m models.phi2.convert_to_onnx --fp32_cpu
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_cpu
```
- Export INT4 ONNX model for CPU \
From source:
```
python -m models.phi2.convert_to_onnx --int4_cpu
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --int4_cpu
```
- Export all of them at once \
From source:
```
python -m models.phi2.convert_to_onnx --fp32_cpu --int4_cpu --fp32_gpu --fp16_gpu --int4_gpu --fp16_a100 --int4_a100
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp32_cpu --int4_cpu --fp32_gpu --fp16_gpu --int4_gpu --fp16_a100 --int4_a100
```
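Whichever scenario is exported, a quick load check catches missing custom ops or provider problems before any benchmarking. A minimal sketch, assuming a hypothetical output filename (point it at whatever file convert_to_onnx produced for your scenario):

```python
import onnxruntime as ort

# Hypothetical path; substitute the file convert_to_onnx actually wrote.
model_path = "phi2_decoder_fp16_gpu.onnx"

# CPU is listed as a fallback so the check also runs on machines without CUDA.
sess = ort.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# If the graph loaded, its input/output names are a useful sanity check.
print([i.name for i in sess.get_inputs()])
print([o.name for o in sess.get_outputs()])
```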
## Run example with ORT and benchmark
- E.g., export FP16 and INT4 ONNX models for Nvidia A100 and run the examples. \
From source:
```
python -m models.phi2.convert_to_onnx --fp16_a100 --int4_a100 --run_example
```
From wheel:
```
python -m onnxruntime.transformers.models.phi2.convert_to_onnx --fp16_a100 --int4_a100 --run_example
```
The inference example currently supports all models running on CUDA.
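Beyond --run_example, a generic timing loop over session.run gives rough latency numbers. A sketch under the assumption that you have already built an inputs dict matching the model's input names and shapes (these vary by scenario and are not shown here):

```python
import time

def benchmark(session, inputs, warmup=3, iters=10):
    # Warm-up iterations let CUDA kernels and allocators settle before timing.
    for _ in range(warmup):
        session.run(None, inputs)
    start = time.perf_counter()
    for _ in range(iters):
        session.run(None, inputs)
    return (time.perf_counter() - start) / iters

# avg_seconds = benchmark(sess, inputs)  # average seconds per session.run
```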
## Limitations
There is a known issue where symbolic shape inference fails on the exported model. It can be ignored for now, as it does not affect inference with the optimized model.
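To reproduce (and safely ignore) the failure, the symbolic shape inference helper bundled with the onnxruntime Python package can be invoked directly. A minimal sketch with an assumed model path:

```python
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Hypothetical path; point this at an exported phi2 model.
model_path = "phi2_decoder_fp16_gpu.onnx"

try:
    # auto_merge=True merges conflicting symbolic dimensions where possible.
    SymbolicShapeInference.infer_shapes(onnx.load(model_path), auto_merge=True)
except Exception as exc:
    # Known issue: this step can fail on the optimized graph. It is safe to
    # ignore, since it does not affect inference with the optimized model.
    print(f"Symbolic shape inference failed (ignorable): {exc}")
```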
6 changes: 3 additions & 3 deletions in onnxruntime/python/tools/transformers/models/phi2/requirements.txt
onnx>=1.15.0
transformers>=4.36.2
onnxscript>=0.1.0.dev20240126
torch==2.2.0
# --extra-index-url https://download.pytorch.org/whl/nightly/cu121
# torch>=2.3.0.dev20240126+cu121