Add kipc::find_faulted_task. #1931
The branch was force-pushed from 47735cb to d6a6e67.
doc/kipc.adoc (outdated)
==== Preconditions

The `starting_index` must be a valid index for this system.
Nit: we also allow the last valid index + 1, which is guaranteed to return 0.
sys/kern/src/kipc.rs (outdated)
for i in index..tasks.len() {
    if let TaskState::Faulted { .. } = tasks[i].state() {
        let response_len =
            serialize_response(&mut tasks[caller], response, &(i as u32))?;
        tasks[caller]
            .save_mut()
            .set_send_response_and_length(0, response_len);
        return Ok(NextTask::Same);
    }
}

let response_len =
    serialize_response(&mut tasks[caller], response, &0_u32)?;
tasks[caller]
    .save_mut()
    .set_send_response_and_length(0, response_len);
Ok(NextTask::Same)
Take it or leave it, but this could be written more concisely as
let i = tasks[index..]
    .iter()
    .position(|task| matches!(task.state(), TaskState::Faulted { .. }))
    .map(|i| i + index) // relative -> global task index
    .unwrap_or(0);
let response_len =
    serialize_response(&mut tasks[caller], response, &(i as u32))?;
tasks[caller]
    .save_mut()
    .set_send_response_and_length(0, response_len);
return Ok(NextTask::Same);
Into it, thanks.
Rejiggered this a little to use `enumerate` and `find`, to ensure that we don't get a silly overflow check added to the `i + index` expression.
Heh, amusingly, that totally broke it, due to a brain-o on my part. Switched back to your method. :-)
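The kernel-side search can be modelled outside the kernel to check that the original loop and the suggested `position` rewrite agree, including the `starting_index == tasks.len()` case from the doc nit. This is a minimal sketch under stated assumptions: `TaskState` here is a hypothetical two-variant stand-in, not the kernel's type, and the functions return a `usize` instead of serializing a response. The thread does not say what went wrong with the `enumerate`/`find` attempt, so `enumerate_after_slice` only illustrates one easy mistake in this kind of rewrite; it is not a reconstruction of the author's change. The sketch also makes no claim about which form avoids an overflow check in generated code.

```rust
/// Hypothetical stand-in for the kernel's task state: only
/// "faulted or not" matters for this search.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TaskState {
    Healthy,
    Faulted,
}

fn is_faulted(s: &TaskState) -> bool {
    matches!(s, TaskState::Faulted)
}

/// Shape of the original loop: first faulted task at or after `index`,
/// or 0 if none (0 is the supervisor, so it can mean "not found").
fn via_loop(states: &[TaskState], index: usize) -> usize {
    for i in index..states.len() {
        if is_faulted(&states[i]) {
            return i;
        }
    }
    0
}

/// Shape of the suggested `position` version: search the tail slice,
/// then convert the slice-relative position to a global index.
fn via_position(states: &[TaskState], index: usize) -> usize {
    states[index..]
        .iter()
        .position(is_faulted)
        .map(|i| i + index) // relative -> global task index
        .unwrap_or(0)
}

/// Illustrative pitfall: enumerating *after* slicing yields
/// slice-relative indices, which are wrong whenever `index > 0`.
fn enumerate_after_slice(states: &[TaskState], index: usize) -> usize {
    states[index..]
        .iter()
        .enumerate()
        .find(|(_, s)| is_faulted(s))
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Enumerating *before* skipping keeps the indices global.
fn enumerate_then_skip(states: &[TaskState], index: usize) -> usize {
    states
        .iter()
        .enumerate()
        .skip(index)
        .find(|(_, s)| is_faulted(s))
        .map(|(i, _)| i)
        .unwrap_or(0)
}

fn main() {
    use TaskState::*;
    let states = [Healthy, Faulted, Healthy, Faulted, Healthy];

    // Every valid start, plus `len` itself (the "last valid index + 1"
    // case), gives the same answer from all three correct versions.
    for start in 1..=states.len() {
        let expected = via_loop(&states, start);
        assert_eq!(via_position(&states, start), expected);
        assert_eq!(enumerate_then_skip(&states, start), expected);
    }
    assert_eq!(via_loop(&states, states.len()), 0);

    // Starting at 2, the next fault is task 3, but the
    // slice-relative version reports 1.
    assert_eq!(via_loop(&states, 2), 3);
    assert_eq!(enumerate_after_slice(&states, 2), 1);
}
```

Slicing with `tasks[index..]` is fine for `index == tasks.len()` (it yields an empty slice) but would panic for anything larger, which is why the documented precondition on `starting_index` matters for the `position` form.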
Overall, this is lovely, but it looks like the binding in userlib is actually incorrect! It's using the KIPC number for `ReadTaskStatus` instead of the one for the new IPC!
Should be easy to fix but I wanted to flag it now before this merges :)
The "task ID or zero" return value is represented as an `Option<NonZeroUsize>`
in the Rust API, so a typical use of this kipc looks like:

[source,rust]
----
let mut next_task = 1; // skip supervisor
while let Some(fault) = kipc::find_faulted_task(next_task) {
    let fault = usize::from(fault);
    // do things with the faulted task

    next_task = fault + 1;
}
----
This is lovely :)
sys/userlib/src/kipc.rs (outdated)
let mut response = 0_u32;
let (_, _) = sys_send(
    TaskId::KERNEL,
    Kipcnum::ReadTaskStatus as u16,
uhh...this looks like the wrong `Kipcnum` variant?
Suggested change:
-    Kipcnum::ReadTaskStatus as u16,
+    Kipcnum::FindFaultedTask as u16,
Good catch! I've mostly tested this with exhubris where I used the right number. The perils of a fork!
The branch was force-pushed from d6a6e67 to 38771da.
looks great!
The branch was force-pushed from 45ae34f to 2b1ddde.
This resolves a four-year-old TODO in El Jefe asking for a way to process faulted tasks without making so many kipcs.

The original supervisor kipc interface was, by definition, designed before we knew what we were doing. Now that we have some miles on the system, some things are more clear:

1. The supervisor doesn't use the TaskState data to make its decisions.
2. The TaskState data is pretty expensive to serialize/deserialize, and produces code containing panic sites.
3. Panic sites in the supervisor are bad, since it is not allowed to panic.

The new find_faulted_task operation can detect all N faulted tasks using N+1 kipcs, instead of one per potentially faulted task, and the request and response messages are trivial to serialize (one four-byte integer each way). This has allowed me to write (out-of-tree) "minisuper," a supervisor in 256 bytes that cannot panic.

In-tree, this has the advantage of knocking 33% off Jefe's flash size and reducing statically-analyzable max stack depth by 20%.
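The N+1 figure can be checked by running the typical-use loop from the doc/kipc.adoc excerpt above against a mock. This is a minimal sketch: `MockKernel` and its `find_faulted_task` method are hypothetical stand-ins that search an in-memory table instead of making a real kipc, and they exist only to count calls. With two faulted tasks out of seven non-supervisor tasks, the loop makes three calls, where querying each task individually would take seven.

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Hypothetical mock of the kernel side of `find_faulted_task`: it
/// searches an in-memory table instead of real task state and counts calls.
struct MockKernel {
    faulted: Vec<bool>, // index 0 is the supervisor
    calls: Cell<usize>,
}

impl MockKernel {
    /// Same contract as the kipc: first faulted task at or after `start`,
    /// with 0 ("none found") mapped to `None`.
    fn find_faulted_task(&self, start: usize) -> Option<NonZeroUsize> {
        self.calls.set(self.calls.get() + 1);
        let i = self
            .faulted
            .iter()
            .enumerate()
            .skip(start)
            .find(|&(_, &f)| f)
            .map(|(i, _)| i)
            .unwrap_or(0);
        NonZeroUsize::new(i)
    }
}

/// The typical-use loop from the kipc docs, collecting faulted indices.
fn collect_faults(kernel: &MockKernel) -> Vec<usize> {
    let mut found = Vec::new();
    let mut next_task = 1; // skip supervisor
    while let Some(fault) = kernel.find_faulted_task(next_task) {
        let fault = usize::from(fault);
        // A real supervisor would act on the faulted task here.
        found.push(fault);
        next_task = fault + 1;
    }
    found
}

fn main() {
    let kernel = MockKernel {
        faulted: vec![false, false, true, false, false, true, false, false],
        calls: Cell::new(0),
    };
    assert_eq!(collect_faults(&kernel), vec![2, 5]);
    // N = 2 faults cost N + 1 = 3 calls; one status query per
    // non-supervisor task would cost 7.
    assert_eq!(kernel.calls.get(), 3);
}
```

When the last task is the faulted one, `next_task = fault + 1` equals the task count, which is exactly the "last valid index + 1" input the doc review says must return 0, so the final call terminates the loop.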
The branch was force-pushed from 2b1ddde to 123b749.