-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ML-DSA: AVX2 target feature #642
Conversation
Changes cherry-picked from `07084ab4` and `6022f6e4`
Changes cherry-picked from `a87318d8`
Changes cherry-picked from `a87318d8`
3c5b96d
to
61f72fd
Compare
It seems |
I have another version of this branch at https://github.com/cryspen/libcrux/tree/jonas/ml-dsa-target-feature-backup which is pre-rebase on top of the laxing changes. There, reverting the above commit containing SHA-3 changes fully restores performance, but leads to a stack overflow. Reverting the commit on this branch does not seem to help performance. @franziskuskiefer @karthikbhargavan Shall we try and get this PR in with somewhat degraded performance (e.g. for me the difference between the overflowing performant code and the laxing, non-overflowing code is ~10x), and look into fully restoring performance in a follow-up? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this in and then investigate the performance regression.
This PR performs the same kind of restructuring to ML-DSA as #636 does to ML-KEM, i.e. once we know we're running on an AVX2 machine, manually enable AVX2 specific optimizations using
#[target_feature(enable = "avx2")]
and then inline most functions that are further down in the call-tree.Doing this should get us close to compiling the whole crate using
RUSTFLAGS="-C target-feature=+avx2"
in terms of performance.