Armv7-M: Allow register overlap in ldm + ldrd #153

SH1E0r1r2y · 2025-01-09T06:35:52Z

Fixes the splitting of ldrd and ldm in ldrd_imm_splitting_cb and ldm_interval_splitting_cb when the address register is also one of the output registers.
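
A minimal sketch of the idea behind the fix, not the code in arch_v7m.py: when a multi-register load is split into single ldr instructions, the load that overwrites the address register has to come last, so that all other loads still see the original address. The helper name split_load_multiple and its string-based output are hypothetical and only serve as illustration.

```python
def split_load_multiple(base, dests, offset=0):
    """Split a multi-register load into single ldr instructions.

    Hypothetical helper for illustration; it returns assembly strings.
    If the base register is also a destination, the load that overwrites
    it is emitted last, so the other loads still use the original address.
    """
    loads = []
    deferred = []  # loads whose destination is the base register
    for i, reg in enumerate(dests):
        inst = f"ldr {reg}, [{base}, #{offset + 4 * i}]"
        if reg == base:
            deferred.append(inst)
        else:
            loads.append(inst)
    return loads + deferred


# ldm r0, {r0-r3}: the load into r0 is moved to the end
for line in split_load_multiple("r0", ["r0", "r1", "r2", "r3"]):
    print(line)
```

The same helper covers ldrd with an immediate offset, for example split_load_multiple("r1", ["r1", "r2"], offset=8) for ldrd r1, r2, [r1, #8].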

mkannwischer

Thanks!

Can you please add a new example to test that this works?
A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do.

mkannwischer · 2025-01-09T06:45:00Z

slothy/targets/arm_v7m/arch_v7m.py

+        for ldr, reg in zip(ldrs, regs):
+            if reg != ptr:
+                ldrs_reordered.append(ldr)
+                #log(f"inst.args_out == ptr: {reg}")


Please remove this comment.

mkannwischer · 2025-01-09T06:45:07Z

slothy/targets/arm_v7m/arch_v7m.py

+        for ldr, reg in zip(ldrs, regs):
+            if reg == ptr:
+                ldrs_reordered.append(ldr)
+                #log(f"inst.args_out == ptr: {reg}")


mkannwischer · 2025-01-09T06:48:26Z

Thanks!

Can you please add a new example to test that this works? A simple

ldrd r0, r1, [r0]
ldm r0, {r0-r3}

should do. Alternatively, simply extend armv7m_simple0.s.

slothy/targets/arm_v7m/arch_v7m.py

mkannwischer

Thanks for your changes. Almost there.

mkannwischer · 2025-01-10T09:21:36Z

examples/naive/armv7m/armv7m_simple0.s

@@ -29,4 +29,7 @@ smlabt r3,r2, r2, r1
 asrs r3,   r3,#1
 str r3, [r0,#4] // @slothy:writes=a

+ldrd r2, r3, [r1, #8]


This does not actually overlap. Please change to

Suggested change

ldrd r2, r3, [r1, #8]

ldrd r1, r2, [r1, #8]
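
To make the distinction concrete, here is a small sketch that checks whether an ldrd line actually exercises the overlap case, that is, whether the address register is also one of the two destinations. The function name ldrd_overlaps and the regular expression are assumptions for illustration and are not part of SLOTHY; only the plain immediate-offset form in lowercase is handled.

```python
import re

# Matches "ldrd rA, rB, [rC]" and "ldrd rA, rB, [rC, #imm]" (lowercase only).
_LDRD = re.compile(
    r"^\s*ldrd\s+(\w+)\s*,\s*(\w+)\s*,\s*\[\s*(\w+)\s*(?:,\s*#(-?\d+)\s*)?\]\s*$"
)


def ldrd_overlaps(line):
    """Return True if the base register of an ldrd is also a destination."""
    m = _LDRD.match(line)
    if m is None:
        raise ValueError(f"not a supported ldrd form: {line!r}")
    rt, rt2, base, _imm = m.groups()
    return base in (rt, rt2)


print(ldrd_overlaps("ldrd r2, r3, [r1, #8]"))  # original example line
print(ldrd_overlaps("ldrd r1, r2, [r1, #8]"))  # suggested replacement
```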

mkannwischer · 2025-01-10T09:22:36Z

examples/opt/armv7m/armv7m_simple0_opt_m7.s

+        str r4, [r0]                    // ........................*.....
+        asrs r2, r6, #1                 // .........................*....
+        str r2, [r0, #4]                // .........................*.... // @slothy:writes=a
+        ldm r0, {r0,r1,r2,r3}           // ..........................*...


Your output shows that you have not turned on fusion, so the code you have committed isn't actually used.
You need to add a call to fusion_region to the corresponding example in example.py.

mkannwischer · 2025-01-14T02:28:29Z

example.py

@@ -670,6 +670,7 @@ def core(self,slothy):
        slothy.config.variable_size=True
        slothy.config.inputs_are_outputs = True
        slothy.optimize(start="start", end="end")
+        slothy.fusion_region("start", "end", ssa=False)


You need to call fusion_region before optimize; otherwise it does not help SLOTHY find a better solution.

mkannwischer · 2025-01-14T02:31:43Z

slothy/targets/arm_v7m/arch_v7m.py

@@ -1486,7 +1486,7 @@ def make(cls, src):
        obj.increment = None
        obj.pre_index = 0
        obj.addr = obj.args_in[0]
-        obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
+        #obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra


Please remove these lines rather than commenting them out.
We also need to test whether this affects any other examples in SLOTHY.

For that please make sure you have a clean copy of SLOTHY, and then run

python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7

This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m and attach them to this PR.
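
For the zipping step, a short sketch using Python's standard library could look as follows. The function name zip_outputs and the archive name armv7m_opt are assumptions; only the directory examples/opt/armv7m comes from the comment above.

```python
import shutil
from pathlib import Path


def zip_outputs(src_dir="examples/opt/armv7m", archive_base="armv7m_opt"):
    """Zip the contents of src_dir and return the path of the archive.

    archive_base is the archive path without the ".zip" suffix
    (hypothetical default name).
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"output directory not found: {src}")
    # make_archive appends ".zip" and returns the resulting file name
    return shutil.make_archive(archive_base, "zip", root_dir=str(src))
```

Calling zip_outputs() from the repository root after the run has finished writes armv7m_opt.zip to the current directory.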

* This commit simplifies the Dilithium iNTT naive implementations, reverting modifications to the code originally taken from pqm4 that were only introduced to accommodate shortcomings of SLOTHY.
* We can also enable the fixup in more cases due to switching of the loop type and use of the `before` tag, which is done here too. This aids performance.

fixed the register overlap

0a9ac4e

mkannwischer requested changes Jan 9, 2025

mkannwischer changed the title ~~fixed the register overlap~~ Armv7-M: Allow register overlap in ldm + ldrd Jan 9, 2025

mkannwischer reviewed Jan 9, 2025

mkannwischer reviewed Jan 9, 2025

add example for overlap

bc18688

mkannwischer requested changes Jan 10, 2025

mkannwischer added the needs-ci label Jan 10, 2025

fixed the example

3c8dd2f

mkannwischer requested changes Jan 14, 2025

mkannwischer and others added 8 commits January 14, 2025 11:46

M7: Fix Keccak M7 code size

1825961

Always run CI

7ce4998

Previously, CI would only run if the PR was created by us or labeled with needs-ci. This restriction is not needed, as we are using GitHub's runners anyway. This commit enables CI for all PRs and also runs it on the main branch.

CM7: Simplify Kyber basemuls naive

89e54ee

* This commit simplifies the Kyber basemul naive implementations, reverting modifications to the code originally taken from pqm4 that were only introduced to accommodate shortcomings of SLOTHY.

CM7: Simplify Dilithium 769 NTT code

ec8112c

CM7: Re-enable address offset fixup for Dilithium NTT

da0c503

fixed the example.2

66fab54

fixed.3 & add part of examples

02dd6fe
