Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow branch instructions inside loops #139

Merged
merged 9 commits into from
Jan 3, 2025

Conversation

dop-amin
Copy link
Collaborator

Still to do:

  1. Automatically adding the branch tag. I was not sure where the right place for this is so I skipped that for now. I feel like I could just "throw" it into a lot of places but it always felt oddly specific. Perhaps in the constructor of the Loop subclass, as this is most closely related.
  2. Fixing the selftest. Not 100% sure what we'd like to have here. We could in theory also run the loop in the test, I suppose but you have a better overview of how this all works at the moment. Filtering out the branches before calling the selftest is again possible, yet again, kind of a "specific" thing.
  3. More proper testing. I ran a few examples but there are many I haven't tried so far.

@mkannwischer
Copy link
Collaborator

Still to do:

  1. Automatically adding the branch tag. I was not sure where the right place for this is so I skipped that for now. I feel like I could just "throw" it into a lot of places but it always felt oddly specific. Perhaps in the constructor of the Loop subclass, as this is most closely related.
  2. Fixing the selftest. Not 100% sure what we'd like to have here. We could in theory also run the loop in the test, I suppose but you have a better overview of how this all works at the moment. Filtering out the branches before calling the selftest is again possible, yet again, kind of a "specific" thing.
  3. More proper testing. I ran a few examples but there are many I haven't tried so far.

I have implemented (1) and (2) - please have a look.
(3) is still a todo. In particular, we need to re-optimize all examples and make sure their performance does not degrade. I did make sure everything runs successfully, but I do see some performance degradation (and some performance improvment) and some breaking tests. This needs further investigation.

@dop-amin
Copy link
Collaborator Author

dop-amin commented Jan 2, 2025

Thanks, your changes look fine to me!

On the performance: I ran the optimizations on main as well as on this branch multiple times and observed "the usual" deviations, sometimes one result was better, sometimes the other. I'd deem this okay; at least I'm assuming no degradation due to this PR.

On the broken tests: I was able to reproduce this, either wrong result data being printed in the end or seemingly endless garbage that gets send over UART. There have been two issues: 1. In small_pointmul_asm_769, the register order in cmp was not adhering to what we require. This caused the loop counter fixup to be wrong which in return made the loop have one more iteration than it should causing in OOB write. 2. In basemul_257_dilithium the ldr with increment of the loop "counter" got moved after the cmp, meaning we also had an extra iteration here. Reason is the address offset fixup, removing the outputs from the ldr. Fixed this by disabling the fixup. This can potentially also happen in other examples.

I address the two things in 8674344. If you think this should be part of this PR, good, if not, feel free to drop it and we handle it separately.

Copy link
Collaborator

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-optimize the examples we already have in main.
Rest looks good to me! Thank you!

@mkannwischer mkannwischer marked this pull request as ready for review January 2, 2025 15:53
@dop-amin
Copy link
Collaborator Author

dop-amin commented Jan 2, 2025

Re-optimized code is added, tests on this code pass and no more weird output.

@mkannwischer mkannwischer merged commit cb50ad9 into slothy-optimizer:main Jan 3, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants