-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1907333: daemon: Move rollback removal to update loop #2297
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be problematic where updated OS doesn't boot fine. Admin will no longer be able to manually rollback which I think can be the case in disconnected install.
I was thinking maybe we can keep the previous code and just ignore the error when deleting rollback failed? Keeping rollback deployment is anyway not a concern other than it provides additional free space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is a concern. That said, OpenShift doesn't really support rollbacks in general, and even specifically for the MCO doing it will trigger issues related to #1190 - basically the MCO will go degraded.
We don't really have a strong story for e.g. "The new kernel in 4.6.X broke booting on this type of hardware" other than "pause the update and wait for a fixed kernel to arrive and then update".
The problem with that is that if we e.g. just log errors we have no real visibility into failures (though I guess we could add an alert) and:
The original motivation for doing this is (very briefly) mentioned in the PR:
#2220
It's basically: compliance tooling that e.g. adjusts kernel parameters for security wants to avoid a system (accidentally) booting into a non-compliant configuration. If the rollback deployment is there it's an easy down-arrow away for an admin at the grub prompt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to clarify, this will be removing the rollback (current) deployment when there is an osupdate, during the update cycle of the MCD. If something fails below (e.g. updating kargs), we also call a
removePendingDeployment
. Would that remove both deployments from rpm-ostree? i.e. if we somehow fail there, would we be able to recover at all?We also run an explicit defer of
applyOSChanges
to roll back to the previous deployment, maybe we should modify that code just in case something was hitting that before.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually three deployments potentially; "pending", "booted", "rollback". rpm-ostree will never let you remove the booted deployment. There's no API to do it in rpm-ostree, and multiple layers of safeguards against it (e.g. this code), in addition to the entire codebase being designed to avoid mutating the running system.
Does that address the concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, sorry I got the terms confused.
In this case wouldn't the rollback deployment being removed here get pushed out anyways once we update and reboot? Assuming nothing went wrong the
booted
becomesrollback
, thepending
becomesbooted
, and therollback
being removed here would be gone anyways?In a firstboot scenario, this code also runs but fresh nodes would only have a
booted
deployment, and updates to apending
deployment via the firstboot systemd unit. There wouldn't be arollback
to remove there, so if there are security concerns the firstboot would linger as a rollback and doesn't get removed until a further update.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes you're totally right 😄 This won't have the intended effect indeed.
Blah...the architectural problem we have here is there's no "sync loop" in the MCD outside of trying to perform an update; the closest we have is a few ad-hoc goroutines like the ssh monitor.