
Feature Request: Make it possible to pin the target release #947

Open
sgohl opened this issue Sep 1, 2021 · 6 comments

Comments

sgohl commented Sep 1, 2021

Describe the enhancement

To ensure a stable and consistent working landscape, it would be very helpful to pin Fedora CoreOS to a specific version: the highest release that Zincati is allowed to upgrade to.

Otherwise, systems may end up on arbitrary versions, which gets even more complicated with update schedules and different wariness settings.

If I encounter a bug in a new release, I want to prevent all systems from upgrading to that version.
Rollback is not sufficient in this case because it cannot be done preventively, and only machine by machine.

I wish this could be made possible with a drop-in file in /etc/zincati/config.d, like this:

[updates]
max_target_release = "34.20210808.3.0"

If this setting does not exist, nothing will be changed for anyone.
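For illustration, the gate such a setting implies is just a field-by-field numeric comparison of FCOS release strings. A minimal sketch (note that `max_target_release` is only proposed here, not an existing Zincati option):

```python
# Hypothetical sketch of the gate a "max_target_release" setting implies.
# FCOS release strings like "34.20210808.3.0" order correctly when each
# dot-separated field is compared numerically.

def parse_release(release):
    """Split an FCOS release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in release.split("."))

def allowed_target(candidate, max_target):
    """True if the candidate release does not exceed the pinned maximum."""
    if max_target is None:  # no pin configured: behave exactly as today
        return True
    return parse_release(candidate) <= parse_release(max_target)

print(allowed_target("34.20210808.3.0", "34.20210808.3.0"))  # True
print(allowed_target("34.20210903.3.0", "34.20210808.3.0"))  # False
```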

And yes, I understand that you actually want every system to be as up to date as possible, but it is not your head on the line when systems crash, so please let us decide on our own and keep control over our lifecycle management :)

System details

  • Bare Metal/QEMU/AWS/GCP/etc. -> any
  • Fedora CoreOS version -> any

Additional information
n/a

Edit: To keep this from being rejected outright, I could imagine a compromise: say, a limit of x releases that we are allowed to skip, or a warning printed in the motd, or something along those lines...

dustymabe (Member) commented Sep 1, 2021

Hey @sgohl, thanks for the feature request. Our expert on this topic will be back next week, so we'll probably discuss this in next week's meeting.

dustymabe added the meeting topics for meetings label Sep 1, 2021
travier (Member) commented Sep 1, 2021

Might be similar to coreos/zincati#245 & coreos/zincati#540

lucab (Contributor) commented Sep 16, 2021

@sgohl thanks for the report. This looks like an interesting RFE at its heart, but it possibly needs to be refined/scoped a bit in order to turn it into a viable implementation.

I'd start by putting aside the initial config.d proposal for now. Among other things, the content in /etc gets versioned with OS deployment so upgrades/rollbacks are going to wreak havoc with any kind of rolling/live data.

Taking a step back, it would be useful to get a better view on the actual problem and the surrounding environment you have at hand.
It sounds like you are trying to steer updates through a fleet of nodes (i.e. not handling a single machine), am I reading this right?
And you are looking for a mechanism to obtain homogeneous OS versions, correct? Plus some kind of oracle / canary system to select viable update targets?

If that is the case, we should probably drill down on how cluster coordination is performed in your environment. Specifically, whether there is a central coordinator pushing live signals to all nodes, or whether each node is individually pulling fresh details from a coordinator.

lucab removed the meeting topics for meetings label Sep 16, 2021
sgohl (Author) commented Sep 17, 2021

Hi, and many thanks for your interest in this case!

steer updates through a fleet of nodes (i.e. not handling a single machine), am I reading this right?
And you are looking for a mechanism to obtain homogeneous OS versions, correct? Plus some kind of oracle / canary system to select viable update targets?

Yes, my case is multiple datacenters with lots of single nodes, loosely coupled node groups, and many clusters with a variable node count each (nodes join and leave, think of CloudFormation); all in all about 600-1000 VMs and bare-metal machines, almost all of them Fedora CoreOS.
I mainly want all "important" nodes to run a specific, pinned CoreOS release, to avoid hitting a known bug on auto-updating nodes one by one and then doing rollbacks (god, no). Edit: I am still largely using Docker Swarm, which I suppose will effectively be deprecated by cgroups v2 in the future; since I have not migrated to k8s yet, this is definitely something I'm afraid of ^^

This is closely related to no longer having a manual update mechanism.

Besides the many ephemeral and testing machines on different streams with always/immediate updates, and the more highly available nodes with scheduled updates such as staging systems, the problem starts with HA systems that have no update schedule at all, because not even 10 seconds of downtime is acceptable and some applications need manual preparation and intervention. Schedules and FleetLock are not nearly enough.

The instant update from the old CoreOS is sorely missed; I would pay to have it back. You may think this would be a bad idea that goes against the concept, because it causes systems to be older, but no, the opposite is the case. If I could fire an immediate update with a forced reboot, I could, for example:

  • control when an organized group of nodes is updated, at any time I want (again, schedules are not sufficient)
  • let system users choose the best time for an update themselves, which will happen more often than never, because right now it is off the table
  • include this in host provisioning scripts (which I can't, because Zincati is so asynchronous and won't reboot with an open tty or running rpm-ostree actions, etc.); I spent hours and hours trying to solve this with lock files and checks, but without success (always a chicken-and-egg situation)
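On the provisioning chicken-and-egg problem: one way to avoid racing rpm-ostree is to poll `rpm-ostree status --json` and only proceed once no transaction is in flight. A rough sketch, assuming the `transaction` field in the JSON output is null or absent when the daemon is idle (worth verifying against your rpm-ostree version):

```python
import json
import subprocess
import time

def rpm_ostree_idle(status):
    """True when parsed `rpm-ostree status --json` output shows no
    transaction in flight (assumes the "transaction" key is null or
    absent when the daemon is idle)."""
    return not status.get("transaction")

def wait_until_idle(poll_seconds=5):
    """Block until rpm-ostree reports idle (needs a CoreOS host to run)."""
    while True:
        raw = subprocess.check_output(["rpm-ostree", "status", "--json"])
        if rpm_ostree_idle(json.loads(raw)):
            return
        time.sleep(poll_seconds)

# The parsing half can be exercised without a CoreOS host:
print(rpm_ostree_idle({"transaction": None}))              # True
print(rpm_ostree_idle({"transaction": ["upgrade", "-"]}))  # False
```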

Auto-updates are nice for systems where high availability is not a big concern.

Some applications need seconds to minutes to return to full availability: health checks need time to drain backends, schedulers need to move services and prepare/pull images on the new host right before a Zincati update, and so on.
Intelligence is needed where we don't have any, so we have to do certain things manually; these are just real-world issues. High availability is simply more important.

the content in /etc gets versioned with OS deployment so upgrades/rollbacks are going to wreak havoc with any kind of rolling/live data.

Even if we put it there via Ignition? We already do this with lots of other files anyway, and the update strategy is already modified via Ignition.

putting aside the initial config.d proposal

Yes, that would be very static if you see it as just that. I would then add a Consul watcher service, or another simple approach, to make it centrally manageable.

dustymabe (Member) commented

all in all about 600-1000 vms and bare-metals - almost everything Fedora CoreOS

❤️

sgohl (Author) commented Sep 21, 2021

Out of curiosity, couldn't we have something like a Cincinnati proxy application for the purpose of "lying" about what the current release is? :D

Unfortunately, this page (https://github.com/coreos/zincati/blob/main/docs/development/cincinnati/protocol.md)
is not really helpful about what a request should look like.

But if we had a web app acting as a proxy server that intercepts the response from the "real" Cincinnati server, we could build a web app to pin specific servers to a release (with optional expiration) and change the release value on the fly before relaying the response back to our Zincati client.
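The graph-rewriting core of such a proxy could be a pure function over the Cincinnati graph JSON. A sketch, assuming the graph shape of a `nodes` list (each node carrying a `version`) plus `edges` as index pairs; the HTTP proxy plumbing around it is left out:

```python
# Sketch of the rewrite step a pinning proxy would apply to a Cincinnati
# update graph before relaying it to Zincati. Assumed graph shape:
#   {"nodes": [{"version": "..."}, ...], "edges": [[from_idx, to_idx], ...]}

def release_key(version):
    """Turn an FCOS release string into a numerically comparable tuple."""
    return tuple(int(p) for p in version.split("."))

def pin_graph(graph, max_release):
    """Drop every node newer than max_release and remap the edge indices."""
    keep = [i for i, node in enumerate(graph["nodes"])
            if release_key(node["version"]) <= release_key(max_release)]
    remap = {old: new for new, old in enumerate(keep)}
    return {
        "nodes": [graph["nodes"][i] for i in keep],
        "edges": [[remap[a], remap[b]] for a, b in graph["edges"]
                  if a in remap and b in remap],
    }

graph = {
    "nodes": [{"version": "34.20210725.3.0"},
              {"version": "34.20210808.3.0"},
              {"version": "34.20210903.3.0"}],
    "edges": [[0, 1], [1, 2]],
}
pinned = pin_graph(graph, "34.20210808.3.0")
print([n["version"] for n in pinned["nodes"]])  # ['34.20210725.3.0', '34.20210808.3.0']
print(pinned["edges"])                          # [[0, 1]]
```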

❤️

Yes, I love CoreOS 🥇

aleskandro added a commit to aleskandro/openshift-release that referenced this issue Feb 23, 2023
Some servers' firmware pushes any newly detected boot option to the tail of the boot order.
When other boot options are present and bootable, such a server will boot from them instead of the new one.
As a (temporary?) workaround, we manually add the boot option.
NOTE: it's assumed that old OSes' boot options are removed from the boot options list during the wipe operations.
 xrefs: https://bugzilla.redhat.com/show_bug.cgi?id=1997805
        coreos/fedora-coreos-tracker#946
        coreos/fedora-coreos-tracker#947
openshift-merge-robot pushed a commit to openshift/release that referenced this issue Feb 23, 2023
* Support Dell IPMI power commands

On Dell servers, `ipmi power (off|on|reset)` returns errors when the host is in a state that doesn't allow the requested transition. We enforce two commands (on + off) instead of reset, and ignore any power-off errors to sidestep those validation errors.

* Set the efi boot order after installing RHCOS in UPI/UEFI/PXE scenarios

Some servers' firmware pushes any newly detected boot option to the tail of the boot order.
When other boot options are present and bootable, such a server will boot from them instead of the new one.
As a (temporary?) workaround, we manually add the boot option.
NOTE: it's assumed that old OSes' boot options are removed from the boot options list during the wipe operations.
 xrefs: https://bugzilla.redhat.com/show_bug.cgi?id=1997805
        coreos/fedora-coreos-tracker#946
        coreos/fedora-coreos-tracker#947