add arm64 bfloat16 fastmath mode option for transformers benchmarking script #19294
Conversation
I think the models exported for benchmark are either fp16 or fp32.
Hi @tianleiwu, thanks for the review. I'm not running bf16 inference here; the model is still fp32 (src, weights, dest, bias). I recently added a feature PR that supports a fastmath mode for the sgemm kernels, where MLAS converts the weights to bf16 internally to speed up the GEMM. This PR calls that feature from the transformers benchmark.
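For context, here is a minimal sketch of how a fastmath option like this can be switched on through an ONNX Runtime session config entry. The config key shown is an assumption based on the earlier MLAS feature PR (it is not introduced by this PR), and "model.onnx" is a placeholder path.

```python
import onnxruntime as ort

# Sketch only: request the MLAS arm64 bfloat16 fastmath GEMM path for an
# otherwise fp32 model. The config key below is assumed from the earlier
# MLAS feature PR and may differ in the actual codebase.
sess_options = ort.SessionOptions()
sess_options.add_session_config_entry("mlas.enable_gemm_fastmath_arm64_bfloat16", "1")

# The model inputs, weights, and outputs stay fp32; MLAS converts the packed
# weights to bf16 internally when the option is on and the CPU supports it.
session = ort.InferenceSession("model.onnx", sess_options, providers=["CPUExecutionProvider"])
```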
I see. @snadampal, the Python format pipeline failed, and it is required to pass. Please set up lintrunner locally: https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md#linting. Then run it to fix the format errors.
@tianleiwu, I have fixed the Python lint error and updated the PR. Please check if it looks good now. Thank you!
… script. Launch the benchmarking script with the following argument to enable and test aarch64 bfloat16 fastmath gemm kernels: "python benchmark.py --enable_arm64_bfloat16_fastmath_mlas_gemm"
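For illustration, here is a hedged sketch of how a flag like this could be declared and consumed in benchmark.py. Apart from the CLI flag name quoted above, the identifiers are illustrative rather than taken from the PR.

```python
import argparse

# Sketch: declare the new benchmark option and act on it. Only the flag name
# comes from the PR; everything else here is illustrative.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--enable_arm64_bfloat16_fastmath_mlas_gemm",
    required=False,
    action="store_true",
    help="Use bfloat16 fastmath MLAS GEMM kernels on capable arm64 CPUs (model stays fp32)",
)
args = parser.parse_args()

if args.enable_arm64_bfloat16_fastmath_mlas_gemm:
    # In the real script this choice would be forwarded to wherever the
    # InferenceSession is created, e.g. via the session config entry
    # sketched earlier in the thread.
    print("arm64 bfloat16 fastmath MLAS GEMM requested")
```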
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline, Windows x64 QNN CI Pipeline
/azp run Linux MIGraphX CI Pipeline, orttraining-amd-gpu-ci-pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
The GPU CI failure doesn't seem to be related to this PR.
Hi @tianleiwu, please let me know if anything is required from my side to get this PR merged. I'm also wondering if it can be merged to the rel-1.17.1 branch as well. Thank you!
/azp run Big Models
Azure Pipelines successfully started running 1 pipeline(s).
@snadampal, there is one required pipeline still running. I think this can be merged once that pipeline finishes.
Thanks @tianleiwu for merging it.
Description
Add an arm64 bfloat16 fastmath mode option to the transformers benchmarking script.
Motivation and Context
ONNX Runtime now supports bfloat16 fastmath GEMM kernels on arm64 platforms with bfloat16 instruction support. This PR updates the benchmark script so that mode can be tested.
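As a quick way to confirm that the target arm64 machine actually exposes bfloat16 instructions before benchmarking with this option, one can check /proc/cpuinfo on Linux. This snippet is a convenience sketch and is not part of the PR.

```python
# Sketch (Linux/aarch64 only): report whether the CPU advertises the "bf16"
# feature (FEAT_BF16) in /proc/cpuinfo. Not part of the benchmark script.
with open("/proc/cpuinfo") as f:
    feature_lines = [line for line in f if line.startswith("Features")]
has_bf16 = any("bf16" in line.split() for line in feature_lines)
print(f"bf16 instruction support: {has_bf16}")
```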