Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Armv7-M: Allow register overlap in ldm + ldrd #153

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

SH1E0r1r2y
Copy link

Fixed the splitting of ldrd and ldm when the address register and output register overlap in ldrd_imm_splitting_cb and ldm_interval_splitting_cb.

Copy link
Collaborator

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Can you please add a new example to test that this works.
A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do.

for ldr, reg in zip(ldrs, regs):
if reg != ptr:
ldrs_reordered.append(ldr)
#log(f"inst.args_out == ptr: {reg}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment.

for ldr, reg in zip(ldrs, regs):
if reg == ptr:
ldrs_reordered.append(ldr)
#log(f"inst.args_out == ptr: {reg}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@mkannwischer
Copy link
Collaborator

Thanks!

Can you please add a new example to test that this works. A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do.

Or simply extend armv7m_simple0.s

@mkannwischer mkannwischer changed the title fixed the register overlap Armv7-M: Allow register overlap in ldm + ldrd Jan 9, 2025
Copy link
Collaborator

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes. Almost there.

@@ -29,4 +29,7 @@ smlabt r3,r2, r2, r1
asrs r3, r3,#1
str r3, [r0,#4] // @slothy:writes=a

ldrd r2, r3, [r1, #8]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not actually overlap. Please change to

Suggested change
ldrd r2, r3, [r1, #8]
ldrd r1, r2, [r1, #8]

str r4, [r0] // ........................*.....
asrs r2, r6, #1 // .........................*....
str r2, [r0, #4] // .........................*.... // @slothy:writes=a
ldm r0, {r0,r1,r2,r3} // ..........................*...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your output shows that you have not turned on fusion. So the code you have commited isn't actually used.
You need to add a call to fusion_region into the corresponding example in example.py

@@ -670,6 +670,7 @@ def core(self,slothy):
slothy.config.variable_size=True
slothy.config.inputs_are_outputs = True
slothy.optimize(start="start", end="end")
slothy.fusion_region("start", "end", ssa=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to put the fusion_region before the optimize, otherwise this does not help SLOTHY find a better solution.

@@ -1486,7 +1486,7 @@ def make(cls, src):
obj.increment = None
obj.pre_index = 0
obj.addr = obj.args_in[0]
obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
#obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove those, not comment them out.
Also we need to test if this affects any other examples in SLOTHY.

For that please make sure you have a clean copy of SLOTHY, and then run

python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7

This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m and attach them to this PR.

mkannwischer and others added 8 commits January 14, 2025 11:46
Previously CI would only run if the PR is created
by us or if the PR is labeled with needs-ci.
This restriction is not needed as we are using
Github's runners anyway.
This commit enables it for all PRs.
It also runs CI on the main branch.
* This commit simplifies the kyber basemul naive implementations to revert modifications to the code originally taken from pqm4 that were only introduced to accomodate for shortcomings of slothy's abilities.
* This commit simplifies the Dilithium iNTT naive implementations to revert modifications to the code originally taken from pqm4 that were only introduced to accomodate for shortcomings of slothy's abilities.
* We can also enable the fixup in more cases due to switching of the loop-type + using `before` tag which is done here, too. This aids with performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants