kern: rework ARMv8-M context switch, fixing a bug? #1963

cbiffle · 2024-12-26T04:24:57Z

I was playing with reducing ARMv8-M context switch time, by looking into the same sort of changes to MPU loading that I recently did on v6/7.

I noticed that we only ever bitwise-OR values into the MAIR registers. In other words, the first task that gets activated loads its exact attributes into MAIR for each region; the second task combines its attributes using bitwise OR; and so on until the registers contain the bitwise OR of all regions in all tasks. (Note that these are the cacheability/shareability attributes, not the access permissions, which are set correctly. The main implication of this would be accidentally caching device memory or breaking DMA.)

That seemed bad, so I removed it and reworked the MPU loading loop. Now we build up the MAIR contents in a local before writing them -- no more read-modify-write.
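To make the difference concrete, here is a minimal sketch of the two loading patterns. It is not the kernel's code: `FakeMair`, `RegionAttrs`, `load_with_or`, and `load_fresh` are hypothetical names, the registers are modeled as plain integers rather than memory-mapped hardware, and eight regions are assumed. The attribute bytes used in the note below (`0xFF` for Normal write-back cacheable, `0x44` for Normal non-cacheable) follow the ARMv8-M attribute encoding as I understand it, and are not necessarily the values the kernel uses.

```rust
/// Hypothetical stand-in for the two 32-bit MAIR registers. A real kernel
/// would write memory-mapped hardware registers instead.
#[derive(Default, Debug, PartialEq)]
pub struct FakeMair {
    pub regs: [u32; 2],
}

/// One attribute byte per region; eight regions is an assumption.
pub type RegionAttrs = [u8; 8];

/// Buggy pattern: read-modify-write with OR. Bits set while loading an
/// earlier task are never cleared, so attributes accumulate across tasks.
pub fn load_with_or(mair: &mut FakeMair, attrs: &RegionAttrs) {
    for (i, &attr) in attrs.iter().enumerate() {
        // Region i occupies byte (i % 4) of register (i / 4).
        mair.regs[i / 4] |= u32::from(attr) << ((i % 4) * 8);
    }
}

/// Fixed pattern: build both words in a local that starts at zero, then
/// overwrite the registers, so nothing carries over from the previous task.
pub fn load_fresh(mair: &mut FakeMair, attrs: &RegionAttrs) {
    let mut words = [0u32; 2];
    for (i, &attr) in attrs.iter().enumerate() {
        words[i / 4] |= u32::from(attr) << ((i % 4) * 8);
    }
    mair.regs = words;
}
```

Loading a task whose regions are all `0xFF` and then one whose regions are all `0x44` leaves `0xFFFF_FFFF` in both registers with `load_with_or`, so the second task's memory stays cacheable; with `load_fresh` the registers end up as `0x4444_4444`, which is what the second task asked for.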

Halting the processor in task code, I was able to observe a meaningful distinction in MAIR contents: previously tasks were unable to set memory as device (whose encoding needs bits left clear that OR-ing can only ever set), and now they can. The fact that this hasn't bitten us speaks to the relative simplicity of our current ARMv8-M parts!

I don't have detailed measurements of the code, but this knocks 32 bytes off the routine, which is likely a performance win too.


aapoalas · 2024-12-26T13:32:37Z

sys/kern/src/arch/arm_m.rs

+        // the same index as the region. This lets us treat MAIR as an array
+        // corresponding to the regions.
+        //
+        // We unfortunately can't do this at compile time, because regions can


thought: This seems to call for a TaskDescExt kind of thing, possibly stored out-of-band in a separate array, which could then contain the MAIR0 and MAIR1 values precomputed.

Right now, the mair field for each region takes up 4 bytes (due to padding) while storing only a single byte of data. Even under the unrealistic assumption that each task uses only two unique regions on average, the mair fields already add 8 bytes of data to flash. I assume something like 4-5 unique regions is a more reasonable average, meaning 16-20 bytes are added per task to store, ultimately, only 4-5 bytes of data. Moving this data to be task-specific would duplicate data but still reduce flash size by removing the padding.
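The padding arithmetic can be checked with a small sketch. The struct layouts below are assumptions made for illustration (two 32-bit register images plus a one-byte attribute); the real region descriptor may have different fields. `RegionWithMair`, `RegionWithoutMair`, `TaskMairs`, and `inline_mair_cost` are hypothetical names.

```rust
use std::mem::size_of;

/// Hypothetical region descriptor with the attribute byte stored inline.
/// With 4-byte alignment, the single `mair` byte costs a full 4 bytes.
#[repr(C)]
pub struct RegionWithMair {
    pub rbar: u32,
    pub rlar: u32,
    pub mair: u8,
}

/// The same descriptor with the attribute byte moved elsewhere.
#[repr(C)]
pub struct RegionWithoutMair {
    pub rbar: u32,
    pub rlar: u32,
}

/// Hypothetical per-task record holding precomputed MAIR0 and MAIR1.
#[repr(C)]
pub struct TaskMairs {
    pub mair0: u32,
    pub mair1: u32,
}

/// Flash bytes spent on inline `mair` fields for a number of unique regions.
pub fn inline_mair_cost(unique_regions: usize) -> usize {
    unique_regions * (size_of::<RegionWithMair>() - size_of::<RegionWithoutMair>())
}
```

Under these assumptions the inline field costs 4 bytes per region (12 bytes versus 8), so two unique regions per task break even with an 8-byte `TaskMairs` record, and four or five regions cost 16 to 20 bytes against the same 8.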

hawkw

This feels like it deserves a review from @labbott or someone else who knows the deep Cortex-M lore, but I was happy to give it a look anyway. Commented on a few things, but nothing major --- I'll leave that to someone who actually knows more about what these registers do :)

hawkw · 2024-12-26T04:30:26Z

sys/kern/src/arch/arm_m.rs

+/// the hardware, which improves code generation. We do not actually rely on the
+/// in-memory representation of this struct otherwise.


Thanks for documenting that!

hawkw · 2024-12-26T20:03:17Z

sys/kern/src/arch/arm_m.rs

+    /// This is the contents of the RLAR register with the enable bit set. We
+    /// can write it with the enable bit set directly, since we disable the MPU
+    /// before doing so.


Nitpicky, feel free to ignore me: this comment kinda feels like it's written in dialogue with the comment that was there previously, which stated that we set the enable bit separately, and now it's saying that we don't do that any more. Which is great, but...it feels like there's some missing context around how the MPU works on v8-M. ~~Is the context here that the enable bit in RLAR cannot be set while the MPU is enabled? It could be nice to say that a little more explicitly.~~

Edit: Oh, I see that this is discussed later on in a comment in compute_region_extension_data. That rationale makes sense...but, I feel like it might be worth moving some of that discussion here? Otherwise, this comment doesn't stand up that well on its own, and it's the one a reader is likelier to encounter first.

This is, admittedly, a nitpick, since the rationale is clearly documented, just...a little bit later.

hawkw · 2024-12-26T22:30:07Z

sys/kern/src/arch/arm_m.rs

+        mpu.mair[0].write(u32::from_le_bytes(mairs[..4].try_into().unwrap()));
+        mpu.mair[1].write(u32::from_le_bytes(mairs[4..].try_into().unwrap()));


hm, the try_into() is a bit of a bummer here, since these are always going to be fixed-size arrays, and it would be nice to be able to reason about that statically and avoid checking that they actually are 4 bytes at runtime. but, i guess the alternative is to declare two 4-byte arrays and make the for loop above do some extra logic to reason about which one it's in, which also feels kind of sad. i dunno...
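One possible way to sidestep the fallible conversion, sketched here under the assumption that `mairs` is a `[u8; 8]`: read all eight bytes as one little-endian `u64` and split it into two words. `split_mairs` and `split_mairs_try_into` are hypothetical helper names, and this is only an illustration of the idea, not a claim about what the PR should do.

```rust
use std::convert::TryInto;

/// Splits eight attribute bytes into (MAIR0, MAIR1) with no fallible
/// slice-to-array conversion: the whole array is read as one little-endian
/// u64, then the low and high halves are taken.
pub fn split_mairs(mairs: [u8; 8]) -> (u32, u32) {
    let combined = u64::from_le_bytes(mairs);
    // Truncating cast keeps the low 32 bits; the shift exposes the high 32.
    (combined as u32, (combined >> 32) as u32)
}

/// The slice-based form from the diff, kept for comparison.
pub fn split_mairs_try_into(mairs: [u8; 8]) -> (u32, u32) {
    (
        u32::from_le_bytes(mairs[..4].try_into().unwrap()),
        u32::from_le_bytes(mairs[4..].try_into().unwrap()),
    )
}
```

Both helpers return the same pair for any input. Whether the `u64` form generates better code than the `try_into()` form is something I have not measured; an optimizing compiler can often prove the slice lengths and drop the check, but that is not guaranteed.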

cbiffle requested a review from labbott December 26, 2024 04:24

cbiffle force-pushed the cbiffle/armv8m-mair-bug branch from da02fc1 to 69bebe8 Compare December 26, 2024 04:27

