-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Armv7-M: Allow register overlap in ldm + ldrd #153
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Can you please add a new example to test that this works.
A simple
ldrd r0, r1, [r0]
ldm r0, {r0-r3}
should do.
slothy/targets/arm_v7m/arch_v7m.py
Outdated
for ldr, reg in zip(ldrs, regs): | ||
if reg != ptr: | ||
ldrs_reordered.append(ldr) | ||
#log(f"inst.args_out == ptr: {reg}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this comment.
slothy/targets/arm_v7m/arch_v7m.py
Outdated
for ldr, reg in zip(ldrs, regs): | ||
if reg == ptr: | ||
ldrs_reordered.append(ldr) | ||
#log(f"inst.args_out == ptr: {reg}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
Or simply extend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your changes. Almost there.
@@ -29,4 +29,7 @@ smlabt r3,r2, r2, r1 | |||
asrs r3, r3,#1 | |||
str r3, [r0,#4] // @slothy:writes=a | |||
|
|||
ldrd r2, r3, [r1, #8] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not actually overlap. Please change to
ldrd r2, r3, [r1, #8] | |
ldrd r1, r2, [r1, #8] |
str r4, [r0] // ........................*..... | ||
asrs r2, r6, #1 // .........................*.... | ||
str r2, [r0, #4] // .........................*.... // @slothy:writes=a | ||
ldm r0, {r0,r1,r2,r3} // ..........................*... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your output shows that you have not turned on fusion. So the code you have commited isn't actually used.
You need to add a call to fusion_region
into the corresponding example in example.py
@@ -670,6 +670,7 @@ def core(self,slothy): | |||
slothy.config.variable_size=True | |||
slothy.config.inputs_are_outputs = True | |||
slothy.optimize(start="start", end="end") | |||
slothy.fusion_region("start", "end", ssa=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to put the fusion_region before the optimize, otherwise this does not help SLOTHY find a better solution.
slothy/targets/arm_v7m/arch_v7m.py
Outdated
@@ -1486,7 +1486,7 @@ def make(cls, src): | |||
obj.increment = None | |||
obj.pre_index = 0 | |||
obj.addr = obj.args_in[0] | |||
obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra | |||
#obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove those, not comment them out.
Also we need to test if this affects any other examples in SLOTHY.
For that please make sure you have a clean copy of SLOTHY, and then run
python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7
This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m
and attach them to this PR.
Previously CI would only run if the PR is created by us or if the PR is labeled with needs-ci. This restriction is not needed as we are using Github's runners anyway. This commit enables it for all PRs. It also runs CI on the main branch.
* This commit simplifies the kyber basemul naive implementations to revert modifications to the code originally taken from pqm4 that were only introduced to accomodate for shortcomings of slothy's abilities.
* This commit simplifies the Dilithium iNTT naive implementations to revert modifications to the code originally taken from pqm4 that were only introduced to accomodate for shortcomings of slothy's abilities. * We can also enable the fixup in more cases due to switching of the loop-type + using `before` tag which is done here, too. This aids with performance.
Fixed the splitting of
ldrd
andldm
when the address register and output register overlap inldrd_imm_splitting_cb
andldm_interval_splitting_cb
.