
add arm64 bfloat16 fastmath mode option for transformers benchmarking script #19294

Merged

1 commit merged into microsoft:main on Feb 12, 2024

Conversation

@snadampal (Contributor) commented Jan 28, 2024

Description

add arm64 bfloat16 fastmath mode option for transformers benchmarking script

Motivation and Context

onnxruntime now supports bfloat16 fastmath gemm kernels for arm64 platforms with bfloat16 instruction support. This PR updates benchmark scripts to test that mode.
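
For reference, below is a minimal sketch of how the fastmath mode can be enabled directly through onnxruntime session options in Python. The session config key and the "model.onnx" path are assumptions based on the related MLAS feature work, not something stated in this PR, and should be verified against the onnxruntime documentation.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# "1" enables the bf16 fastmath GEMM kernels; the model itself stays fp32.
# NOTE: the config key below is assumed from the related MLAS feature work
# and should be verified against the onnxruntime session-option docs.
sess_options.add_session_config_entry("mlas.enable_gemm_fastmath_arm64_bfloat16", "1")

session = ort.InferenceSession(
    "model.onnx", sess_options, providers=["CPUExecutionProvider"]
)
```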

@tianleiwu (Contributor)

I think the models exported for the benchmark are either fp16 or fp32.
If you want to benchmark bfloat16, you will need to add an option to export a bfloat16 ONNX model first.

@snadampal (Contributor, Author)

Hi @tianleiwu, thanks for the review. I'm not running bf16 inference here; the model is still fp32 (src, weights, dest, bias). I recently added a feature PR that supports a fastmath mode for the sgemm kernels, where (inside MLAS) the weights are converted to bf16 to speed up the GEMM. This PR calls that feature from the transformers benchmark.
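
To illustrate the weight conversion described above, here is a small, self-contained numpy sketch of an fp32-to-bfloat16 round-trip. It only shows the precision trade-off of keeping bf16 weights; it is not the actual MLAS kernel code, and it uses truncation rather than the round-to-nearest-even a real converter would use.

```python
import numpy as np

def fp32_to_bf16_roundtrip(x: np.ndarray) -> np.ndarray:
    """Truncate fp32 values to bfloat16 precision and convert back to fp32.

    Real converters typically round to nearest-even; plain truncation is used
    here only to keep the illustration short.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    bf16_bits = bits & np.uint32(0xFFFF0000)  # keep sign, exponent, top 7 mantissa bits
    return bf16_bits.view(np.float32)

weights = np.random.rand(4, 4).astype(np.float32)
approx = fp32_to_bf16_roundtrip(weights)
print("max abs error:", np.abs(weights - approx).max())
```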

@tianleiwu (Contributor) commented Jan 29, 2024

> Hi @tianleiwu, thanks for the review. I'm not running bf16 inference here; the model is still fp32 (src, weights, dest, bias). I recently added a feature PR that supports a fastmath mode for the sgemm kernels, where (inside MLAS) the weights are converted to bf16 to speed up the GEMM. This PR calls that feature from the transformers benchmark.

I see.

@snadampal, the Python format pipeline failed, and it is required to pass. Please set up lintrunner locally: https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md#linting.

Then run lintrunner -a to format the code.

@snadampal (Contributor, Author)

@tianleiwu, I have fixed the Python lint error and updated the PR; please check if it looks good now. Thank you!

add arm64 bfloat16 fastmath mode option for transformers benchmarking script

Launch the benchmarking script with the following argument to enable and test the aarch64 bfloat16 fastmath gemm kernels:

"python benchmark.py --enable_arm64_bfloat16_fastmath_mlas_gemm"
@tianleiwu (Contributor)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@tianleiwu (Contributor)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline, Windows x64 QNN CI Pipeline

@tianleiwu (Contributor)

/azp run Linux MIGraphX CI Pipeline, orttraining-amd-gpu-ci-pipeline


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@snadampal (Contributor, Author)

The GPU CI failure doesn't seem to be related to this PR:

ProviderOptionsTest > testCUDAOptions() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [111], expected: <0.132055> but was: <0.13206385>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

@snadampal (Contributor, Author)

Hi @tianleiwu, please let me know if anything is required from my side to get this PR merged. I'm also wondering if it can be merged into the rel-1.17.1 branch as well. Thank you!

@tianleiwu (Contributor)

/azp run Big Models


Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu (Contributor)

@snadampal, there is one required pipeline still running. I think this can be merged once that pipeline finishes.
The 1.17.1 patch release only accepts bug fixes, and this PR does not fit that criterion. You can use the nightly package if needed.

@tianleiwu tianleiwu merged commit 7fa6f4f into microsoft:main Feb 12, 2024
77 of 78 checks passed
@snadampal (Contributor, Author)

Thanks @tianleiwu for merging it.
