-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve context switch / MPU loading performance #1960
Conversation
Looks like I get to run around adjusting all the kernel size allocations. I'll push a commit for that here in a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments but LGTM
#[cfg(armv8m)] | ||
#[derive(Copy, Clone, Debug, Serialize, Deserialize)] | ||
pub struct RegionDescExt { | ||
rlar_disabled: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add add comment explaining this is the RLAR with everything except the enable
bit set? It took me a bit to figure out that was what rlar_disabled
meant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had to go convince myself that we need rlar_disabled
so I could explain it ... I think we actually don't, but I'll leave it in in the interest of not breaking things. I will add an explanatory comment and a TODO.
// Turn off the MPU. | ||
// | ||
// Safety: this has no actual memory safety implications, except for | ||
// potentially exposing the kernel to a NULL dereference that succeeds. | ||
unsafe { | ||
mpu.ctrl.write(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish ARMv7m gave a sample sequence for the MPU. IIRC the reason we turn off/on on v8m is because that was the sequence specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed.
For those following along, @labbott did manage to track down this: https://github.com/ARM-software/m-profile-user-guide-examples/tree/main/Memory_model/rtos_context_switch
That is, unfortunately, wrong, and were we to follow that example we would recreate the crash previously fixed by #1193.
sys/kern/src/arch/arm_m.rs
Outdated
@@ -73,6 +73,7 @@ | |||
use core::arch::{self, global_asm}; | |||
use core::sync::atomic::{AtomicBool, AtomicPtr, AtomicU32, Ordering}; | |||
|
|||
use serde::{Deserialize, Serialize}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a little weird to be bringing these into the kernel, mostly because those seem much higher level. I don't have a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me remind myself where this is actually used nowadays, maybe we can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it's required here because the (cross-platform) RegionDesc
in kern::descs
derives Serialize
and Deserialize
, and the RegionDescExt
struct must also to not break that. I don't actually see any obvious places where we serialize or deserialize RegionDesc
s --- I wonder if this is a historical artifact. Did we once use these types in the build code when deserializing an application's config file? It looks like we now use a different type in build_kconfig
:
hubris/build/kconfig/src/lib.rs
Lines 96 to 118 in 874c3b8
/// Description of one memory region. | |
/// | |
/// A memory region spans a range of physical addresses, and applies access | |
/// permissions to whatever lies in that range. Despite our use of the term | |
/// "memory" here, the region may not describe RAM -- ROM and memory-mapped | |
/// peripherals are also described by memory regions. | |
/// | |
/// A memory region can be used by multiple tasks. This is mostly used to have | |
/// tasks share a no-access region (often index 0) in unused region slots, but | |
/// you could also use it for shared peripheral or RAM access. | |
#[derive(Copy, Clone, Debug, Serialize, Deserialize)] | |
pub struct RegionConfig { | |
/// Address of start of region. The platform likely has alignment | |
/// requirements for this; it must meet them. (For example, on ARMv7-M, it | |
/// must be naturally aligned for the size.) | |
pub base: u32, | |
/// Size of region, in bytes. The platform likely has alignment requirements | |
/// for this; it must meet them. (For example, on ARMv7-M, it must be a | |
/// power of two greater than 16.) | |
pub size: u32, | |
/// Flags describing what can be done with this region. | |
pub attributes: RegionAttributes, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news! I can remove it, and will push the commit shortly.
// else but this routine. Here we must convert between the two -- and | ||
// quickly, because this is called on every context switch. | ||
// | ||
// The image-generation tools check at build time that region sizes are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth unifying where this check occurs later to remove some code from big scary xtask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could actually move the checks into this function, now that we expect it to be evaluated only in const context.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We could actually do the check from the const fn nowadays if we wanted. Followup maybe.
67e0371
to
df6d5f6
Compare
Just to bring visibility, I took a look at splitting the I'm not convinced it is the best choice, but I'm fairly convinced it's not an asinine choice. It may not be worth bringing into this PR, but maybe it would be worth a follow-up? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
df6d5f6
to
414087f
Compare
1ea8043
to
1dbd28c
Compare
Neat, I'm happy to take a look at that @aapoalas. It may be a little while, schedules are starting to get weird due to the encroaching holidays. But I can test the cache effects of that pretty easily, I think. |
This reduces the cost of the `apply_memory_protection` phase of context switches by computing the descriptors in a const fn and sneaking them into the RegionDesc struct via a new field. This causes MPU load on ARMv7-M (Cortex-M4) to finish in 45% the time it previously took, which in turn knocks about 266 cycles off IPC-related syscalls.
This alters the v6/v7 routine to switch off the MPU before updating it, and switch it back on after, like v8 does. This has allowed me to use cheaper code to do the update. That, combined with some field ordering changes to optimize for access order in the generated code, cuts the cycles spent in this routine by roughly half.
1dbd28c
to
80b5188
Compare
After discovering that MPU loading was a significant fraction of the time we were spending in the kernel, I took a pass over it. The primary changes here are:
const fn
so they can be evaluated at compile time.const fn
s, so that the MPU code is no longer being evaluated at runtime.RegionDesc
internal kernel structure has grown a new field to hold unspecified architecture specific data, which for our current architectures happens to be a pair of words to be loaded into the MPU (RASR/RBAR or RBAR/RLAR depending on tyrannosaur version).