Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cortex-M7 Kyber Part 2: NTT functions #148

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

dop-amin
Copy link
Collaborator

@dop-amin dop-amin commented Jan 6, 2025

This PR adds Kyber's NTT functions. The forward NTT switched to the Arch_Armv7M.BranchLoop type in order to get rid of the spilling feature, which allowed the vmov+cmp loop without boundaryReserverdRegs.

Possible improvments:

  • Make address fixup work with the current code.
    • Problem 1: Split heuristic cannot handle an instruction not being found by id when enforcing a before constraint. Could likely be resolved by allowing the constraint to be optional such that in case the instruction with a certain id can't be found AND we're using splitting heuristic, we continue instead of failiing.
    • Problem 2: Sometimes, instructions still get reordered after the branch which completely breaks the code. Needs investigation.
  • Run with more liberal parameters.
  • Also use Arch_Armv7M.BranchLoop in the inverse NTT.

@dop-amin dop-amin force-pushed the armv7m-mlkem-ntt branch 2 times, most recently from 6c34397 to 999e03b Compare January 6, 2025 18:23
@dop-amin
Copy link
Collaborator Author

dop-amin commented Jan 6, 2025

The first point in the list should now be addressed by #150 and #149.
I re-optimized the code using those changes and (partially) enabling the fixup while switching both examples to the Branch loop type.

This PR now depends on #150 and #149.

Tests in pqmx passed and the performance is similar (+/- 50 cycles roughly) to the armv7m branch.

@dop-amin dop-amin marked this pull request as ready for review January 6, 2025 18:32
@mkannwischer mkannwischer merged commit 0f6b99d into slothy-optimizer:main Jan 7, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants