Add unit test to check small phi, llama model with the exporter, dort, and benchmark them (#1579)

The PR adds two scripts:

* ``onnxscript.tools.benchmark.export_model``
* ``onnxscript.tools.benchmark.export_model_batch``

The first one measures the processing time of a model coming from
transformers. It benchmarks either eager mode or the exported model,
on CUDA or CPU, with different settings to optimize the model after
its export.

```bash
python -m onnxscript.tools.benchmark.export_model --ort_optimize 1 --optimization optimize,rewrite,inline,llama0 --exporter dynamo --repeat 10 --warmup 5 --model phi --device cuda --target_opset 18 --config medium --verbose 0 --dtype float32 --dynamic 0 --num_hidden_layers 1 --with_mask 1 --implementation eager --verbose=1
```
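The timing methodology visible in the output below — a few warmup iterations followed by measured repeat iterations — follows the usual benchmarking pattern. A minimal sketch of that loop (``benchmark`` is a hypothetical helper for illustration, not part of the PR):

```python
import time


def benchmark(fn, warmup: int = 5, repeat: int = 10) -> dict:
    # Warmup iterations: run the model a few times so that lazy
    # initialization (kernel compilation, memory allocation, caching)
    # does not pollute the measured times.
    begin = time.perf_counter()
    for _ in range(warmup):
        fn()
    warmup_total = time.perf_counter() - begin

    # Measured iterations: record every run individually so both the
    # mean and the per-iteration latencies can be reported.
    times = []
    for _ in range(repeat):
        begin = time.perf_counter()
        fn()
        times.append(time.perf_counter() - begin)

    return {
        "warmup": warmup,
        "repeat": repeat,
        "warmup_time": warmup_total / warmup,   # mean warmup latency
        "repeat_iter": times,                   # individual latencies
        "repeat_time": sum(times) / repeat,     # mean measured latency
    }
```

The per-iteration list is what shows up as ``repeat_iter`` in the output below, which makes outliers (such as the first measured run) easy to spot.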

<details>

<summary>output</summary>

```
-------------------
[export_model]
{'config': 'medium',
 'device': 'cuda',
 'dtype': 'float32',
 'dump_folder': '',
 'dump_ort': 1,
 'dynamic': 0,
 'exporter': 'dynamo',
 'implementation': 'eager',
 'model': 'phi',
 'num_hidden_layers': 1,
 'optimization': 'optimize,rewrite,inline,llama0',
 'ort_optimize': 1,
 'repeat': 10,
 'target_opset': 18,
 'verbose': 1,
 'warmup': 5,
 'with_mask': 1}
-------------------
[export_model] create the model and inputs for 'phi' and config 'medium'
[2024-05-31 18:31:10,210] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[export_model] model created in 8.923117439000634
[export_model] input_shapes=[(torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024]))]
[export_model] export to onnx with exporter='dynamo' and optimization='optimize,rewrite,inline,llama0'
[common_export] start exporting with 'dynamo' in 'em_phi_dynamo_static_fp32_cuda_medium_h1_0fc57.onnx'
2024-05-31 18:31:17,390 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,392 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:17,455 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:17,540 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,540 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:17,543 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,544 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:17,556 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:17,578 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:17,595 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 8388608.
2024-05-31 18:31:17,595 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:17,615 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:17,620 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:17,631 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:17,937 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:17,962 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:18,003 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 8 of general pattern rewrite rules.
2024-05-31 18:31:18,897 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:18,898 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:18,907 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:18,921 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:18,924 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:18,927 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:18,935 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:18,945 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:18,950 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:18,951 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:18,952 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:18,993 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:18,995 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:19,000 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 0 of general pattern rewrite rules.
[common_export] exporter done in 4.657906204996834s
[common_export] size of the export: 31.105032920837402 Mb
[common_export] start optimization with 'optimize,rewrite,inline,llama0'
[optimize_model_proto] start optimize
2024-05-31 18:31:19,800 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:19,801 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:19,809 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:19,820 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:19,821 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:19,824 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:19,827 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:19,835 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:19,840 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:19,842 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:19,844 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:19,882 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:19,886 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:19,891 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 0 of general pattern rewrite rules.
[optimize_model_proto] optimize done in 0.6248986939972383
[optimize_model_proto] start rewrite
[optimize_model_proto] rewrite done in 0.5638751030019193
[optimize_model_proto] start inline
[optimize_model_proto] inline done in 0.08630118599830894
[optimize_model_proto] start llama0
[apply_rule_sets] deserialize model
[apply_rule_sets] deserialize done in 0.013571298000897514
[apply_rule_sets] applies 'llama0'
[apply_rule_sets] llama0 done in 0.010614197999530006
[apply_rule_sets] serialize model
[apply_rule_sets] serialize done in 0.046376898000744404
[apply_rule_sets] remove unused
[apply_rule_sets] remove unused done in 0.011937999002839206
[optimize_model_proto] llama0 done in 0.08469799299928127
[common_export] optimization done in 1.3604527749994304
[common_export] saves the model in 'em_phi_dynamo_static_fp32_cuda_medium_h1_0fc57.onnx'
[common_export] done saving in 0.07749029800106655
[export_model] export to onnx done in 6.120739973997843
[run_inference] create session with providers ['CUDAExecutionProvider', 'CPUExecutionProvider']
[run_inference] created session in 1.4842597490023763
[run_inference] start 5 warmup iterations
[run_inference] warmup done in 0.12163159599731443
[run_inference] start 10 iterations
[run_inference] measure done in 0.18200129300021217
[export_model] end
------------------------------
:config,medium;
:device,cuda;
:dtype,float32;
:dump_folder,;
:dump_ort,1;
:dynamic,0;
:exporter,dynamo;
:implementation,eager;
:model,phi;
:num_hidden_layers,1;
:optimization,optimize,rewrite,inline,llama0;
:ort_optimize,1;
:repeat,10;
:target_opset,18;
:verbose,1;
:warmup,5;
:with_mask,1;
:deserialize_time,0.046376898000744404;
:export_time,4.65790070499861;
:opt_inline_time,0.08630118599830894;
:opt_llama0_time,0.08469799299928127;
:opt_optimize_time,0.6248986939972383;
:opt_remove_unused_time,0.011937999002839206;
:opt_rewrite_time,0.5638751030019193;
:opt_rule_llama0_time,0.010614197999530006;
:optimization_time,1.3604527749994304;
:ort_session_create_time,1.4842597490023763;
:providers,CUDAExecutionProvider,CPUExecutionProvider;
:repeat,10;
:repeat_iter,[0.017213798997545382, 0.01684389899673988, 0.026196798997261794, 0.01845099999991362, 0.017145399000582984, 0.017206399999849964, 0.017150798998045502, 0.017264098998566624, 0.0171972000025562, 0.01728169900161447];
:repeat_time,0.018199809299767368;
:warmup,5;
:warmup_iter,[0.03227269899798557, 0.02073639900117996, 0.017575799000042025, 0.017586000001756474, 0.03341759899922181];
:warmup_time,0.024323979199834866;
```

</details>


The second one runs the previous script for the same
configuration with different optimization settings. It is used to
compare optimized models against eager mode. It extracts every
expression ``:<metric>,<value>;`` from the standard output and merges
them into a CSV file.
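The merge step relies on that simple textual convention. A minimal parsing sketch (``parse_metrics`` is a hypothetical name, assuming one metric per line as in the output above):

```python
import re

# Pattern for the ':<metric>,<value>;' expressions emitted by
# export_model: a colon, the metric name, a comma, then everything up
# to the closing semicolon (values themselves may contain commas, as
# in ':providers,CUDAExecutionProvider,CPUExecutionProvider;').
METRIC_PATTERN = re.compile(r"^:([A-Za-z0-9_]+),(.*);$", re.MULTILINE)


def parse_metrics(stdout: str) -> dict:
    """Extract all ':<metric>,<value>;' pairs from a captured stdout."""
    return {name: value for name, value in METRIC_PATTERN.findall(stdout)}


output = """
:device,cuda;
:repeat,10;
:repeat_time,0.018199809299767368;
"""
metrics = parse_metrics(output)
```

One dictionary per run, keyed on metric name, is then straightforward to stack into rows of a CSV file.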

```bash
python -m onnxscript.tools.benchmark.export_model_batch --model phi --device cuda --config medium --num_hidden_layers=1 --dtype=float32 --dynamic=0 --verbose=1
```

---------

Signed-off-by: Xavier Dupre <[email protected]>
Signed-off-by: xadupre <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
xadupre and justinchuby authored Jun 6, 2024
1 parent 8b1a63b commit 87b3006
Showing 15 changed files with 1,643 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -41,6 +41,7 @@ coverage.xml
.pytest_cache/
cover/
test-output.xml
*.sarif

# Sphinx documentation
docs/_build/
@@ -93,6 +94,8 @@ dmypy.json

# Generated files
*.onnx
*.csv
*.xlsx
!testdata/**/*.onnx
*.onnxlib
**/onnx_backend_test_code/**
9 changes: 9 additions & 0 deletions docs/api/index.md
@@ -1,8 +1,17 @@
# API

## Author Models

```{toctree}
decorator
opsets
converter
values
```

## Tests and Tools

```{toctree}
testing
tools
```
6 changes: 6 additions & 0 deletions docs/api/testing.md
@@ -0,0 +1,6 @@
# Testing

```{eval-rst}
.. automodule:: onnxscript.testing
:members:
```
11 changes: 11 additions & 0 deletions docs/api/tools.md
@@ -0,0 +1,11 @@
# Tools

## Transformers Models

```{eval-rst}
.. autofunction:: onnxscript.tools.transformers_models.get_model_and_inputs
```

```{eval-rst}
.. autofunction:: onnxscript.tools.transformers_models.phi.get_phi_model_config
```
4 changes: 4 additions & 0 deletions onnxscript/tools/__init__.py
@@ -0,0 +1,4 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
17 changes: 17 additions & 0 deletions onnxscript/tools/benchmark/__init__.py
@@ -0,0 +1,17 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from onnxscript.tools.benchmark.benchmark_helpers import (
common_export,
get_parsed_args,
run_inference,
run_onnx_inference,
)

__all__ = [
"get_parsed_args",
"common_export",
"run_inference",
"run_onnx_inference",
]