Add unit test to check small phi, llama model with the exporter, dort, and benchmark them (#1579)
The PR adds two scripts:

* ``onnxscript.tools.benchmark.export_model``
* ``onnxscript.tools.benchmark.export_model_batch``

The first one measures the processing time of a model coming from transformers. It benchmarks either eager mode or the exported model, on CUDA or CPU, with different settings to optimize the model after its export.

```bash
python -m onnxscript.tools.benchmark.export_model --ort_optimize 1 --optimization optimize,rewrite,inline,llama0 --exporter dynamo --repeat 10 --warmup 5 --model phi --device cuda --target_opset 18 --config medium --verbose 0 --dtype float32 --dynamic 0 --num_hidden_layers 1 --with_mask 1 --implementation eager --verbose=1
```

<details>
<summary>output</summary>

```
-------------------
[export_model] {'config': 'medium', 'device': 'cuda', 'dtype': 'float32', 'dump_folder': '', 'dump_ort': 1, 'dynamic': 0, 'exporter': 'dynamo', 'implementation': 'eager', 'model': 'phi', 'num_hidden_layers': 1, 'optimization': 'optimize,rewrite,inline,llama0', 'ort_optimize': 1, 'repeat': 10, 'target_opset': 18, 'verbose': 1, 'warmup': 5, 'with_mask': 1}
-------------------
[export_model] create the model and inputs for 'phi' and config 'medium'
[2024-05-31 18:31:10,210] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[export_model] model created in 8.923117439000634
[export_model] input_shapes=[(torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024])), (torch.Size([2, 1024]), torch.Size([2, 1024]))]
[export_model] export to onnx with exporter='dynamo' and optimization='optimize,rewrite,inline,llama0'
[common_export] start exporting with 'dynamo' in 'em_phi_dynamo_static_fp32_cuda_medium_h1_0fc57.onnx'
2024-05-31 18:31:17,390 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,392 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:17,455 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:17,540 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,540 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:17,543 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:17,544 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:17,556 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:17,578 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:17,595 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 8388608.
2024-05-31 18:31:17,595 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:17,615 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:17,620 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:17,631 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:17,937 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:17,962 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:18,003 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 8 of general pattern rewrite rules.
2024-05-31 18:31:18,897 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:18,898 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:18,907 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:18,921 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:18,924 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:18,927 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:18,935 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:18,945 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:18,950 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:18,951 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:18,952 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:18,993 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:18,995 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:19,000 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 0 of general pattern rewrite rules.
[common_export] exporter done in 4.657906204996834s
[common_export] size of the export: 31.105032920837402 Mb
[common_export] start optimization with 'optimize,rewrite,inline,llama0'
[optimize_model_proto] start optimize
2024-05-31 18:31:19,800 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue return_val due to large size 4194304.
2024-05-31 18:31:19,801 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue full due to large size 4194304.
2024-05-31 18:31:19,809 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue masked_fill due to large size 4194304.
2024-05-31 18:31:19,820 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_5 due to large size 4194304.
2024-05-31 18:31:19,821 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue unsqueeze_6 due to large size 4194304.
2024-05-31 18:31:19,824 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_5 due to large size 4194304.
2024-05-31 18:31:19,827 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue slice_6 due to large size 4194304.
2024-05-31 18:31:19,835 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue expand_2 due to large size 8388608.
2024-05-31 18:31:19,840 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t due to large size 4194304.
2024-05-31 18:31:19,842 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_1 due to large size 4194304.
2024-05-31 18:31:19,844 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_2 due to large size 4194304.
2024-05-31 18:31:19,882 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_3 due to large size 4194304.
2024-05-31 18:31:19,886 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_4 due to large size 4194304.
2024-05-31 18:31:19,891 onnxscript.optimizer.constant_folding [WARNING] - Skip storing constant folded nvalue t_5 due to large size 4194304.
Applied 0 of general pattern rewrite rules.
[optimize_model_proto] optimize done in 0.6248986939972383
[optimize_model_proto] start rewrite
[optimize_model_proto] rewrite done in 0.5638751030019193
[optimize_model_proto] start inline
[optimize_model_proto] inline done in 0.08630118599830894
[optimize_model_proto] start llama0
[apply_rule_sets] deserialize model
[apply_rule_sets] deserialize done in 0.013571298000897514
[apply_rule_sets] applies 'llama0'
[apply_rule_sets] llama0 done in 0.010614197999530006
[apply_rule_sets] serialize model
[apply_rule_sets] serialize done in 0.046376898000744404
[apply_rule_sets] remove unused
[apply_rule_sets] remove unused done in 0.011937999002839206
[optimize_model_proto] llama0 done in 0.08469799299928127
[common_export] optimization done in 1.3604527749994304
[common_export] saves the model in 'em_phi_dynamo_static_fp32_cuda_medium_h1_0fc57.onnx'
[common_export] done saving in 0.07749029800106655
[export_model] export to onnx done in 6.120739973997843
[run_inference] create session with providers ['CUDAExecutionProvider', 'CPUExecutionProvider']
[run_inference] created session in 1.4842597490023763
[run_inference] start 5 warmup iterations
[run_inference] warmup done in 0.12163159599731443
[run_inference] start 10 iterations
[run_inference] measure done in 0.18200129300021217
[export_model] end
------------------------------
:config,medium; :device,cuda; :dtype,float32; :dump_folder,; :dump_ort,1; :dynamic,0; :exporter,dynamo; :implementation,eager; :model,phi; :num_hidden_layers,1; :optimization,optimize,rewrite,inline,llama0; :ort_optimize,1; :repeat,10; :target_opset,18; :verbose,1; :warmup,5; :with_mask,1; :deserialize_time,0.046376898000744404; :export_time,4.65790070499861; :opt_inline_time,0.08630118599830894; :opt_llama0_time,0.08469799299928127; :opt_optimize_time,0.6248986939972383; :opt_remove_unused_time,0.011937999002839206; :opt_rewrite_time,0.5638751030019193; :opt_rule_llama0_time,0.010614197999530006; :optimization_time,1.3604527749994304; :ort_session_create_time,1.4842597490023763; :providers,CUDAExecutionProvider,CPUExecutionProvider; :repeat,10; :repeat_iter,[0.017213798997545382, 0.01684389899673988, 0.026196798997261794, 0.01845099999991362, 0.017145399000582984, 0.017206399999849964, 0.017150798998045502, 0.017264098998566624, 0.0171972000025562, 0.01728169900161447]; :repeat_time,0.018199809299767368; :warmup,5; :warmup_iter,[0.03227269899798557, 0.02073639900117996, 0.017575799000042025, 0.017586000001756474, 0.03341759899922181]; :warmup_time,0.024323979199834866;
```

</details>
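The ``warmup_iter``/``repeat_iter`` lists and the ``warmup_time``/``repeat_time`` metrics above follow the usual warmup-then-measure protocol: the warmup iterations absorb one-time costs (CUDA context creation, kernel selection, caches), and the reported ``*_time`` values match the mean of the per-iteration lists in the run above. A minimal sketch of that pattern in plain Python (``run_once`` is a hypothetical placeholder for one inference call, not the PR's API):

```python
import time
from typing import Callable


def benchmark(run_once: Callable[[], None], warmup: int = 5, repeat: int = 10) -> dict:
    """Times a callable the way the benchmark reports it: warmup first, then measure."""
    warmup_iter = []
    for _ in range(warmup):
        begin = time.perf_counter()
        run_once()  # e.g. one forward pass of the eager or exported model
        warmup_iter.append(time.perf_counter() - begin)
    repeat_iter = []
    for _ in range(repeat):
        begin = time.perf_counter()
        run_once()
        repeat_iter.append(time.perf_counter() - begin)
    return {
        "warmup_iter": warmup_iter,
        "warmup_time": sum(warmup_iter) / warmup,  # mean latency over warmup runs
        "repeat_iter": repeat_iter,
        "repeat_time": sum(repeat_iter) / repeat,  # mean latency, the headline number
    }
```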
The second one runs the previous script on the same configuration with different optimization settings. It is used to compare the optimized models against eager mode. It extracts all expressions ``:<metric>,<value>;`` from the standard output and merges them into a CSV file.

```bash
python -m onnxscript.tools.benchmark.export_model_batch --model phi --device cuda --config medium --num_hidden_layers=1 --dtype=float32 --dynamic=0 --verbose=1
```
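Concretely, that ``:<metric>,<value>;`` contract can be exercised with a short regular expression over the captured standard output of each run. The sketch below only illustrates the idea, not the PR's implementation; ``parse_metrics`` and ``merge_to_csv`` are hypothetical names:

```python
import csv
import re

# Matches every ":<metric>,<value>;" expression; the value may itself
# contain commas (e.g. ":optimization,optimize,rewrite,inline,llama0;").
_METRIC = re.compile(r":([A-Za-z_]\w*),(.*?);")


def parse_metrics(stdout: str) -> dict[str, str]:
    # findall returns (metric, value) pairs; a later occurrence of a
    # metric overwrites an earlier one.
    return dict(_METRIC.findall(stdout))


def merge_to_csv(runs: list[dict[str, str]], filename: str) -> None:
    # One row per run, one column per metric seen in any run.
    columns = sorted({name for run in runs for name in run})
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(runs)
```

For the run above, ``parse_metrics(stdout)["repeat_time"]`` would return ``'0.018199809299767368'``.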
---------

Signed-off-by: Xavier Dupre <[email protected]>
Signed-off-by: xadupre <[email protected]>
Co-authored-by: Justin Chu <[email protected]>