cpu: aarch64: injectors: Improve performance of tanh for block size 16 #2094
Conversation
Thanks for this. Can you let me know what CMake command line options you use when building to get these perf results?
@theComputeKid The following steps were used to get the perf results (benchdnn was used):
The above test is for 4 cores.
Sorry, I should have clarified: I was interested in the CMake options used during the configure phase. In particular, I wanted to know whether you compile with -DONEDNN_WERROR=ON, as I have found that the JIT codebase for aarch64 produces a lot of warnings that prevent us from turning on the flag. Could you just confirm that no new warnings are added by your changes? Thanks.
@theComputeKid I did not compile with -DONEDNN_WERROR=ON.
Can you still please confirm that no new warnings are added due to your changes? I.e. the number of warnings emitted after your changes (if any) is the same as or lower than before.
When you say block size 16, do you mean this should only affect machines with an SVE vector length of 512? Block size is a somewhat overloaded term, especially with different data types.
@theComputeKid Yes, there are no new warnings due to the changes.
@jondea Yes, it is for SVE_512, for the fp32 data type.
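For readers following the "block size 16" discussion, the relationship between vector length and data type can be written out as a small sketch. The helper name lanes_per_vector is hypothetical and is not part of oneDNN; it only illustrates the arithmetic that a 512-bit SVE vector holds 16 fp32 elements.

```cpp
// Hypothetical helper (not a oneDNN API): number of elements of a given
// width that fit in one vector register of the given length.
constexpr int lanes_per_vector(int vector_bits, int element_bits) {
    return vector_bits / element_bits;
}

// SVE_512 with fp32: 512 / 32 = 16 lanes, i.e. "block size 16".
static_assert(lanes_per_vector(512, 32) == 16, "SVE_512 holds 16 fp32 lanes");
```

With a different data type or vector length the same term would mean something else, which is why the clarification above matters.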
@jondea Could you please review the changes and let us know if any further information or changes are required?
Can you please rebase and push to force the pipelines to run again? Thanks.
(Force-pushed from e1e11fa to 9e31edf.)
@theComputeKid I have rebased and force pushed the code again. Please check.
AArch64 approved. But might need to check with repo admins about the other CI failures.
The only remaining failures I see are:
(Force-pushed from 9e31edf to fda21a0.)
@vishwascm Might also need to shorten the PR title:
https://github.com/oneapi-src/oneDNN/actions/runs/11177487422/job/31073064297?pr=2094
Done
Description
Performance Improvement: Eltwise Tanh performance improvement for block size 16
Major Code changes:
• Added a new function tanh_polynomial_approx_compute_vector_fwd(const TRegS &vmm_src) for computing tanh.
• Added new tanh constants and polynomial constants table.
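As a rough illustration of the idea behind a polynomial tanh approximation, here is a minimal scalar sketch in plain C++. It is not the JIT code from this PR: the function names, the 0.5 and 9.0 thresholds, and the Taylor coefficients are all assumptions chosen for readability, whereas the actual change uses its own constants table and SVE instructions.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar sketch (not the PR's implementation).
// Small |x|: odd Taylor polynomial evaluated with Horner's scheme.
// Mid-range |x|: exact identity tanh(x) = (1 - e^(-2|x|)) / (1 + e^(-2|x|)).
// Large |x|: saturate to +/-1.
inline float tanh_poly_sketch(float x) {
    const float ax = std::fabs(x);
    if (ax >= 9.0f) return std::copysign(1.0f, x); // assumed saturation bound
    if (ax < 0.5f) { // assumed polynomial range
        const float x2 = x * x;
        // tanh(x) ~ x * (1 - x^2/3 + 2x^4/15 - 17x^6/315 + 62x^8/2835)
        float p = 62.0f / 2835.0f;
        p = p * x2 - 17.0f / 315.0f;
        p = p * x2 + 2.0f / 15.0f;
        p = p * x2 - 1.0f / 3.0f;
        p = p * x2 + 1.0f;
        return x * p;
    }
    const float e = std::exp(-2.0f * ax);
    return std::copysign((1.0f - e) / (1.0f + e), x);
}

// Applies the sketch to one block of 16 fp32 values, mirroring the
// 16 lanes of a 512-bit vector (the loop stands in for vector lanes).
inline void tanh_block16_sketch(const float *src, float *dst) {
    for (std::size_t i = 0; i < 16; ++i)
        dst[i] = tanh_poly_sketch(src[i]);
}
```

In a vectorized kernel the branches would typically become predicated or blended operations over all lanes at once; the scalar form above only shows the numerical structure.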
Checklist
General
All the tests were carried out on an A64FX machine, which has block size 16.
Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
• make test
• ./benchdnn --eltwise --batch=inputs/eltwise/test_eltwise_all
• make test_benchdnn_*
• gtests: ./test_eltwise
Note: All of the above results are the same with and without the code changes.
Performance improvements