signpost: Avoid updating into nothing #444
/// Sets 'tries left' to 1 on the inactive partition to represent a
/// potentially valid image, but does not change the priority.
/// **does not write to the disk**.
pub fn mark_inactive_valid(&mut self) {
    let mut inactive_flags = self.gptprio(self.inactive());
    inactive_flags.set_tries_left(1);
    self.set_gptprio(self.inactive(), inactive_flags);
}
This should do nothing if successful=true, I think.
I actually looked at the definition of set_tries_left() and yes! We could change this to 2 instead?
I re-read this in the morning and maybe not; these are exclusive aren't they?
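The suggestion in this thread could be sketched roughly as follows. This is a minimal, self-contained model and not signpost's actual code: `Flags` and `mark_valid_unless_successful` are hypothetical names, and the early return on `successful` is the reviewer's proposal rather than what the PR does.

```rust
/// Hypothetical stand-in for the per-partition GPT priority flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Flags {
    pub priority: u8,
    pub tries_left: u8,
    pub successful: bool,
}

/// Variant of `mark_inactive_valid` that leaves a partition alone if it is
/// already marked successful. Otherwise it sets tries-left to 1 and does not
/// touch the priority.
pub fn mark_valid_unless_successful(flags: &mut Flags) {
    if flags.successful {
        // Assumption from the review comment: nothing to do in this case.
        return;
    }
    flags.tries_left = 1;
}
```

Whether the guard is needed depends on whether `successful` and a non-zero tries-left can ever be set together, which is the open question in the comment above.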
ensure!(
    inactive_flags.tries_left() > 0,
    error::InactiveNotValid {
        inactive: &self.os_disk
    }
);
Should be a successful check here, too.
We're adding complexity here so I'd like to make sure I understand.
Wouldn't |
Right, this adds the latter of those, checking if it's OK. What I'm trying to avoid here is a user running |
(Updated to remove a related |
I don't see anything wrong, but I also don't feel like I'm smart enough to understand what's going on. I think the interactions between our many commands and the multiple flags on multiple partitions are complex, and therefore likely to have unanticipated interactions, and this is our most critical code path. I think it'd be nice to reflect the state more clearly in fewer places; I'm not sure what to recommend without a better understanding. That said, I'm approving because I don't see any immediate issues and the experts like the flow, but I'd definitely recommend more documentation overall.
let mut inactive_flags = self.gptprio(self.inactive());
ensure!(
    inactive_flags.priority() != 2 && !inactive_flags.successful(),
Why do we care whether inactive was successful? It seems like the priority check should be enough?
This may be related to my primary confusion about this PR. We can't actually validate the inactive partition, we can only check that it has some markers from the user, so in essence we're funneling them through a different process we believe to be more indicative of their intent. Is that a fair characterization?
Add two new commands to signpost:
- mark-inactive-valid, used to mark the inactive partition as having a potentially valid image but not for boot.
- cancel-upgrade, which reverts the changes of upgrade-to-inactive.

In addition upgrade-to-inactive makes two initial checks; whether the inactive partition has been marked valid with mark-inactive-valid, and whether the inactive partition has already been marked for boot. The first is to avoid accidentally booting into a blank partition, while the second is a warning that makes it more clear that invoking upgrade-to-inactive twice does not revert the change.

Signed-off-by: Samuel Mendoza-Jonas <[email protected]>
Updated with the above suggestions, and changing the first |
I also did a quick write-up of how these changes fit into signpost/updog - it got a bit long but hopefully it makes it more clear what we're adding here while avoiding changing the update process. If it reads well enough I can look into folding it into some docs as well.

GPT flags

There are currently three attributes associated with a "side"; priority, tries-left, and a "successful" marker. These are distinct and don't overlap in the partition header.

Current commands/methods:
Commands/methods after changes:
How changes add safety without borking current behaviour

Previously upgrade-to-inactive unconditionally set the inactive side to be the next boot target. The new commit introduces mark-inactive-valid which sets inactive.tries_left to 1. This does not by itself mark the side ready for booting as the active partition will still have the higher priority. However upgrade-to-inactive now has a way to check that the inactive side has a good chance of booting into something, before giving it the higher priority.

Cancel-upgrade is a new command which clears the inactive side flags and restores the active side's priority. Previously there wasn't an obvious way to reverse the effect of upgrade-to-inactive.

How this affects updog / manual signpost

This avoids a scenario where the user runs For example, trying to upgrade without having written an update:
Trying to mark for upgrade twice:
Cancel-upgrade shouldn't be part of any normal usage; it provides a way to back out if the user was using signpost directly and accidentally marked something undesirable for upgrade. Cancelling the upgrade:
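The flow described in this write-up can be modelled in a few lines. The sketch below is a simplified, in-memory illustration and not the signpost implementation: `Flags`, `State`, `UpgradeError`, and the `InactiveAlreadyMarked` variant are hypothetical names, the order of the two checks is assumed, and the concrete priority values (2 for the next boot target, 1 for the fallback) are assumptions based on the `priority() != 2` check in the diff.

```rust
/// Hypothetical stand-in for the three GPT attributes of one side.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Flags {
    pub priority: u8,
    pub tries_left: u8,
    pub successful: bool,
}

/// Hypothetical in-memory model of the active and inactive sides.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct State {
    pub active: Flags,
    pub inactive: Flags,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UpgradeError {
    /// The inactive side was never marked with mark-inactive-valid.
    InactiveNotValid,
    /// The inactive side is already marked for boot (hypothetical name).
    InactiveAlreadyMarked,
}

impl State {
    /// Sets tries-left to 1 on the inactive side; priority is unchanged,
    /// so the active side remains the boot target.
    pub fn mark_inactive_valid(&mut self) {
        self.inactive.tries_left = 1;
    }

    /// Makes the inactive side the next boot target, after the two checks.
    pub fn upgrade_to_inactive(&mut self) -> Result<(), UpgradeError> {
        // Mirrors the diff: proceed only if priority != 2 && !successful.
        if self.inactive.priority == 2 || self.inactive.successful {
            return Err(UpgradeError::InactiveAlreadyMarked);
        }
        // Refuse to boot into a side that was never marked valid.
        if self.inactive.tries_left == 0 {
            return Err(UpgradeError::InactiveNotValid);
        }
        self.inactive.priority = 2;
        self.active.priority = 1; // assumption: active becomes the fallback
        Ok(())
    }

    /// Clears the inactive side's flags and restores the active priority.
    pub fn cancel_upgrade(&mut self) {
        self.inactive = Flags::default();
        self.active.priority = 2; // assumption: 2 is the boot-target value
    }
}
```

In this model, calling `upgrade_to_inactive` on a blank inactive side fails with `InactiveNotValid`, a second call after a successful one fails with `InactiveAlreadyMarked`, and `cancel_upgrade` returns the pair to its starting state.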
This is the piece I'm not sure I agree with. The only reason we believe it has a better chance of booting is that the user said so, but the user was already saying so by running I'm OK with this if you think it's clearer - I know I don't represent all users - but I'd like to see us consider alternate models that could reduce the number of commands. (Perhaps by hiding some of the partition flags; I'm not sure we need to represent everything so directly to the user.) |
I agree, it took me a bit of staring at Signpost before I felt like I had a good handle on what's happening. Adding this extra command doesn't help in that regard, but on the other hand my interpretation of a In normal usage everything is done via |
What if signpost is just a library + CLI utility for modifying the three fields on the boot partitions, and the logic for what the bits should actually be moves to updog? |
That could work, and give the user a single interface for these kinds of changes. w.r.t this PR, the question appears to be whether introducing |
Has anyone had thoughts about alternative structures/workflows that could make this verification clearer, or tried changing signpost+updog per #444 (comment) ? My approval stands, in case I'm alone in finding it confusing. Not sure if @iliana's stands after her comment about signpost+updog or if she'd prefer that change. |
My approval stands but I'd appreciate an issue being opened to track the proposal of moving signpost-workflows into updog and turning signpost into a bit twiddling library/tool. |
Issue #, if available:
N/A
Description of changes:
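Add two new commands to signpost:
- mark-inactive-valid, used to mark the inactive partition as having a potentially valid image but not for boot.
- cancel-upgrade, which reverts the changes of upgrade-to-inactive.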
In addition upgrade-to-inactive makes two initial checks; whether the
inactive partition has been marked valid with mark-inactive-valid, and
whether the inactive partition has already been marked for boot. The
first is to avoid accidentally booting into a blank partition, while the
second is a warning that makes it more clear that invoking
upgrade-to-inactive twice does not revert the change.
Signed-off-by: Samuel Mendoza-Jonas [email protected]
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.