Feature Request: Make it possible to pin the target release #947
Comments
Hey @sgohl, thanks for the feature request. Our expert on this topic will be back next week, so we'll probably discuss this in next week's meeting.
This might be similar to coreos/zincati#245 and coreos/zincati#540.
@sgohl thanks for the report. At its heart this looks like an interesting RFE, but it probably needs to be refined and scoped a bit before it can turn into a viable implementation. I'd start by putting aside the initial […]

Taking a step back, it would be useful to get a better view of the actual problem and the environment you have at hand. If that is the case, we should probably drill down into how cluster coordination is performed in your environment: specifically, whether a central coordinator pushes live signals to all nodes, or each node individually pulls fresh details from a coordinator.
Hi, and many thanks for your interest in this case!
Yes, my setup spans multiple datacenters with lots of single nodes, loosely coupled node groups, and many clusters with a variable node count each (nodes join and leave; think of CloudFormation): all in all about 600-1000 VMs and bare-metal machines, almost all of them running Fedora CoreOS.

This is closely related to no longer having a manual update mechanism. Besides the many ephemeral and testing machines on different streams with always/immediate updates, and the more highly available nodes with scheduled updates (such as staging systems), the problem starts with HA systems that have no update schedule at all, because not even 10 seconds of downtime is acceptable and some applications need manual preparation and intervention. Schedules and FleetLock are far from enough.

I really miss instant updates as they worked on the old CoreOS; I would pay to have them back. You may think this goes against the concept because it causes systems to fall behind, but the opposite is the case. If I could trigger an immediate update with a forced reboot, I could for example
Auto-updates are nice for systems where high availability is not a big concern. Some applications need seconds to minutes to return to full availability, for example because health checks need time to drain backends, or because schedulers need to move services and prepare/pull images on a new host right before a Zincati update.
Even if we put it in place via Ignition? We do this anyway with lots of other files, and the update strategy is already modified via Ignition.
Yes, that would be very static if you see it as just this. I'd then add a Consul watcher service, or another simple approach, to make it centrally manageable.
❤️
Out of curiosity, couldn't we have something like a Cincinnati proxy application whose purpose is to "lie" about what the current release is? :D Unfortunately, this page (https://github.com/coreos/zincati/blob/main/docs/development/cincinnati/protocol.md) […] But if we had a web app acting as a proxy server that intercepts the response from the "real" Cincinnati server, we could build a web app to pin specific servers to a release (with an optional expiration) and change the release value on the fly before relaying the response back to our Zincati client.
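The core of such a proxy could be a filter applied to the upstream graph before it is relayed to Zincati. This is a minimal sketch, not an existing tool: it assumes the Cincinnati graph JSON consists of a `nodes` list (each node carrying a `version`) and an `edges` list of `[from_index, to_index]` pairs into that list, and that versions are dot-separated integers. The names `filter_graph` and `parse_version` are hypothetical, and the HTTP fetching, per-server pinning and expiration logic are left out.

```python
import json


def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: versions are dot-separated integers, e.g. "34.20210904.3.0".
    return tuple(int(part) for part in version.split("."))


def filter_graph(graph: dict, pinned: str) -> dict:
    """Return a copy of a Cincinnati-style graph without nodes newer than `pinned`.

    Edges are [from_index, to_index] pairs into the node list, so the kept
    nodes are re-indexed and any edge touching a dropped node is discarded.
    """
    limit = parse_version(pinned)
    new_index: dict[int, int] = {}  # old node index -> new node index
    nodes = []
    for old, node in enumerate(graph["nodes"]):
        if parse_version(node["version"]) <= limit:
            new_index[old] = len(nodes)
            nodes.append(node)
    edges = [
        [new_index[src], new_index[dst]]
        for src, dst in graph["edges"]
        if src in new_index and dst in new_index
    ]
    # Keep any other top-level fields the upstream response may carry.
    return {**graph, "nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # Placeholder payloads; real nodes carry upstream-provided values.
    upstream = {
        "nodes": [
            {"version": "34.20210808.3.0", "payload": "commit-a", "metadata": {}},
            {"version": "34.20210821.3.0", "payload": "commit-b", "metadata": {}},
            {"version": "34.20210904.3.0", "payload": "commit-c", "metadata": {}},
        ],
        "edges": [[0, 1], [1, 2], [0, 2]],
    }
    pinned_graph = filter_graph(upstream, "34.20210821.3.0")
    print(json.dumps(pinned_graph["edges"]))  # [[0, 1]]
```

Dropping the nodes (rather than only their incoming edges) means a client never sees the withheld release at all; re-indexing is needed because edges refer to positions in the node list.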
Yeees, I love CoreOS 🥇
* Support Dell IPMI power commands
  On Dell servers, `ipmi power (off|on|reset)` returns errors when the host is in a state that doesn't allow the requested transition. Issue two commands (on + off) instead of reset, and ignore any power-off errors to get past those validation errors.
* Set the EFI boot order after installing RHCOS in UPI/UEFI/PXE scenarios
  Some servers' firmware pushes any newly detected boot options to the tail of the boot order. When other boot options are present and bootable, such a server boots from them instead of the new one. As a (temporary?) workaround, we manually add the boot option. NOTE: it's assumed that old OSes' boot options are removed from the boot options list during the wipe operations. xrefs: https://bugzilla.redhat.com/show_bug.cgi?id=1997805 coreos/fedora-coreos-tracker#946 coreos/fedora-coreos-tracker#947
Describe the enhancement
To ensure a stable and consistent landscape, it would be very helpful to pin Fedora CoreOS to a specific version that Zincati is allowed to upgrade to, but not beyond.
Otherwise, any system could end up with any version installed, which gets even more complicated with update schedules and different wariness settings.
If I encounter a bug in a new release, I want to prevent all systems from upgrading to that version.
Rollback is not sufficient in this case, because it cannot be done preventively and only works machine by machine.
I wish this could be made possible with a drop-in file in /etc/zincati/config.d, like this:
[…]
If this setting does not exist, nothing changes for anyone.
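The requested behaviour could be sketched like this. It is a minimal illustration of the proposed semantics, not Zincati's actual update logic: it assumes the pinned version has already been read from the drop-in (the config snippet is missing above, so no key name is assumed here) and that versions are dot-separated integers. `select_target` and `parse_version` are hypothetical names. Among the available targets it picks the newest one that is newer than the booted version but not newer than the pin; with no pin, behaviour is unchanged.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: Fedora CoreOS-style versions made of dot-separated integers.
    return tuple(int(part) for part in version.split("."))


def select_target(
    booted: str, available: list[str], pinned: Optional[str] = None
) -> Optional[str]:
    """Return the newest allowed update target, or None if there is none."""
    current = parse_version(booted)
    # Only releases newer than the booted one are update candidates.
    candidates = [v for v in available if parse_version(v) > current]
    if pinned is not None:
        # Hypothetical pin: never go past the pinned version.
        limit = parse_version(pinned)
        candidates = [v for v in candidates if parse_version(v) <= limit]
    return max(candidates, key=parse_version, default=None)


if __name__ == "__main__":
    available = ["34.20210821.3.0", "34.20210904.3.0", "34.20210919.3.0"]
    print(select_target("34.20210808.3.0", available))  # 34.20210919.3.0
    print(select_target("34.20210808.3.0", available, "34.20210904.3.0"))  # 34.20210904.3.0
    print(select_target("34.20210904.3.0", available, "34.20210904.3.0"))  # None
```

Because the pin is an upper bound rather than an exact version, nodes that are behind still catch up to the pinned release, which matches the goal of a consistent fleet.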
And yes, I understand that you want every system to be as up to date as possible, but it's not your head on the line when systems crash, so please let us decide for ourselves and keep control over our lifecycle management :)
System details
Additional information
n/a
Edit: So that you don't reject this outright, I could imagine a compromise: say, a limit of x releases that we are allowed to skip, or a warning printed in the MOTD, or something like that.
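One way to read this compromise is: once a pin holds back more than x newer releases, produce a warning (which could then be surfaced in the MOTD). The following is a minimal sketch of that reading only; `max_skipped`, `pin_warning`, `withheld_releases` and the message text are all hypothetical, versions are assumed to be dot-separated integers, and how the warning reaches the MOTD is left open.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: dot-separated integer versions such as "34.20210904.3.0".
    return tuple(int(part) for part in version.split("."))


def withheld_releases(pinned: str, available: list[str]) -> list[str]:
    """Releases newer than the pin, i.e. the ones the pin is holding back, oldest first."""
    limit = parse_version(pinned)
    return sorted((v for v in available if parse_version(v) > limit), key=parse_version)


def pin_warning(pinned: str, available: list[str], max_skipped: int) -> Optional[str]:
    """Return a warning once more than `max_skipped` releases are held back, else None."""
    held = withheld_releases(pinned, available)
    if len(held) <= max_skipped:
        return None
    return (
        f"Update pin {pinned} is holding back {len(held)} releases "
        f"(limit {max_skipped}); newest is {held[-1]}."
    )


if __name__ == "__main__":
    available = ["34.20210821.3.0", "34.20210904.3.0", "34.20210919.3.0"]
    print(pin_warning("34.20210821.3.0", available, max_skipped=2))  # None
    print(pin_warning("34.20210821.3.0", available, max_skipped=1))
```

A stricter variant of the same check could lift the pin instead of only warning, which would cap how far a pinned node can fall behind.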