
add arm64 bfloat16 fastmath mode option for transformers benchmarking script #19294

Merged

1 commit merged into microsoft:main on Feb 12, 2024

Conversation

@snadampal (Contributor) commented Jan 28, 2024

Description

add arm64 bfloat16 fastmath mode option for transformers benchmarking script

Motivation and Context

onnxruntime now supports bfloat16 fastmath gemm kernels for arm64 platforms with bfloat16 instruction support. This PR updates benchmark scripts to test that mode.
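
For reference, below is a minimal sketch of how the fastmath mode can be enabled directly through onnxruntime session options in Python. The session config key and the "model.onnx" path are assumptions based on the related MLAS feature work, not something stated in this PR, and should be verified against the onnxruntime documentation.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# "1" enables the bf16 fastmath GEMM kernels; the model itself stays fp32.
# NOTE: the config key below is assumed from the related MLAS feature work
# and should be verified against the onnxruntime session-option docs.
sess_options.add_session_config_entry("mlas.enable_gemm_fastmath_arm64_bfloat16", "1")

session = ort.InferenceSession(
    "model.onnx", sess_options, providers=["CPUExecutionProvider"]
)
```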

@tianleiwu (Contributor)

I think the models exported for the benchmark are either fp16 or fp32.
If you want to benchmark bfloat16, you will need to add an option to export a bfloat16 ONNX model first.

@snadampal (Contributor, Author)

Hi @tianleiwu, thanks for the review. I'm not running bf16 inference here; the model is still fp32 (src, weights, dest, bias). I recently added a feature PR that supports a fastmath mode for the sgemm kernels, where (inside MLAS) the weights are converted to bf16 to speed up the GEMM. This PR calls that feature from the transformers benchmark.
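
To illustrate the weight conversion described above, here is a small, self-contained numpy sketch of an fp32-to-bfloat16 round-trip. It only shows the precision trade-off of keeping bf16 weights; it is not the actual MLAS kernel code, and it uses truncation rather than the round-to-nearest-even a real converter would use.

```python
import numpy as np

def fp32_to_bf16_roundtrip(x: np.ndarray) -> np.ndarray:
    """Truncate fp32 values to bfloat16 precision and convert back to fp32.

    Real converters typically round to nearest-even; plain truncation is used
    here only to keep the illustration short.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    bf16_bits = bits & np.uint32(0xFFFF0000)  # keep sign, exponent, top 7 mantissa bits
    return bf16_bits.view(np.float32)

weights = np.random.rand(4, 4).astype(np.float32)
approx = fp32_to_bf16_roundtrip(weights)
print("max abs error:", np.abs(weights - approx).max())
```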

@tianleiwu (Contributor) commented Jan 29, 2024

> Hi @tianleiwu, thanks for the review. I'm not running bf16 inference here; the model is still fp32 (src, weights, dest, bias). I recently added a feature PR that supports a fastmath mode for the sgemm kernels, where (inside MLAS) the weights are converted to bf16 to speed up the GEMM. This PR calls that feature from the transformers benchmark.

I see.

@snadampal, the Python format pipeline failed, and it is required to pass. Please set up lintrunner locally: https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md#linting.

Then run lintrunner -a to format the code.

@snadampal (Contributor, Author)

@tianleiwu, I have fixed the Python lint error and updated the PR; please check if it looks good now. Thank you!

add arm64 bfloat16 fastmath mode option for transformers benchmarking script

Launch the benchmarking script with the following argument to enable and test the aarch64 bfloat16 fastmath gemm kernels:

"python benchmark.py --enable_arm64_bfloat16_fastmath_mlas_gemm"
@tianleiwu (Contributor)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@tianleiwu (Contributor)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline, Windows x64 QNN CI Pipeline

@tianleiwu (Contributor)

/azp run Linux MIGraphX CI Pipeline, orttraining-amd-gpu-ci-pipeline


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@snadampal (Contributor, Author)

The GPU CI failure doesn't seem to be related to this PR:

ProviderOptionsTest > testCUDAOptions() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [111], expected: <0.132055> but was: <0.13206385>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

@snadampal (Contributor, Author)

Hi @tianleiwu, please let me know if anything is required from my side to get this PR merged. I'm also wondering if it can be merged into the rel-1.17.1 branch as well. Thank you!

@tianleiwu (Contributor)

/azp run Big Models


Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu (Contributor)

@snadampal, there is one required pipeline still running. I think this can be merged once that pipeline finishes.
The 1.17.1 patch release only accepts bug fixes, and this PR does not fit that criterion. You can use the nightly package if needed.

@tianleiwu tianleiwu merged commit 7fa6f4f into microsoft:main Feb 12, 2024
77 of 78 checks passed
@snadampal (Contributor, Author)

Thanks @tianleiwu for merging it.
