-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow installer to include/exclude components based on user select install solution #784
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @derekwaynecarr |
5f325f0
to
145856a
Compare
the CVO overrides as part of rendering+editing manifests. | ||
|
||
The proposed UX is to make this a first class part of the install config api with the implementation | ||
being arguments supplied to the CVO to filter the user-selected manifests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is more involved in the installer implementation of this than simply passing along arguments to the CVO.
- Presumably the installer would need to know which addons can be excluded so that the installer can do validation on the user input.
- Some addons may require the installer to omit manifests that it would normally create. For example, if the baremetal-operator addons were omitted, then the installer would need to omit the manifest that configures the metal provisioning.
- There would probably be an expectation by the user that the installer would make some opinionated decisions on behalf of the user about which addons to exclude. Again with the baremetal-operator addon, my expectation as a user would be that the installer would configure the exclusions automatically so that the baremetal-operator addon was excluded when using a platform other than "baremetal".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
maybe? specifying things to exclude that don't actually exclude anything isn't a great UX but i dunno that it's a requirement that we tell you your exclusion didn't match anything.
-
not sure i follow this one. I'd expect all related manifests to be annotated such that they get excluded as a group. Could the manifest that configures the metal provisioning not also be annotated? Is it instantiated by the CVO or by the installer itself?
-
i'd classify that as a potential future goal/possibility, but not a requirement here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the same topic - will the user be able to say exclude bm-operator but install bm platform? If not, how the installer will know to block it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to cover these sort of scenarios here:
https://github.com/openshift/enhancements/pull/784/files#diff-4b412132f6f354bd03403cc85e104b445d5ff581abebae3d0b6fb69c58f094e7R120-R122
so we can discuss if that's "sufficient/acceptable" or not, but the position of this EP was to punt on such things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe? specifying things to exclude that don't actually exclude anything isn't a great UX but i dunno that it's a requirement that we tell you your exclusion didn't match anything.
I think it will be a bad experience if the user will have a typo. Maybe, we can find a way for the CVO to verify it during bootstrap and fail early. kinda cvo-render --verify-components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. not sure i follow this one. I'd expect all related manifests to be annotated such that they get excluded as a group. Could the manifest that configures the metal provisioning not also be annotated? Is it instantiated by the CVO or by the installer itself?
It is ultimately added to the cluster via CVO. So the manifest could have the annotation. The installer would need to add that annotation since the installer creates the manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the same topic - will the user be able to say exclude bm-operator but install bm platform? If not, how the installer will know to block it?
Seems like that is the scope of this proposal, but as @staebler mentions it implies some additional validation in the installer, e.g taking the baremetal example - it's reasonable to do create manifests
or create install-configs
with the baremetal SLO disabled (perhaps to deploy via UPI or assisted where those components may not be required), but a user should not be allowed to do create cluster
with that configuration, because deployment of any worker hosts will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have more recent requirements to be able to install the metal tools on platforms other than baremetal
, so that's probably no longer a good example of something we would expect the installer to exclude automatically.
That said, I think we can ignore the automated guidance angle, at least in a first iteration. We can make it fancier/smarter later, but the first version should have simple rules: The default guidance is to install everything. If you want less than everything, you tell the CVO and it leaves out the things you don't want.
b) enable/disable specific configurations such as "HA", where components could contribute multiple | ||
deployment definitions for different configurations and then the installer/CVO would select the correct | ||
one based on the chosen install configuration (HA vs single node) instead of having components read/reconcile | ||
the infrastructure resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be useful to expand the scope to include opt-in to optional components, for example see the discussion on #747 where we'd like to make the kubernetes-nmstate operator optionally available on any platform (and probably enabled by default on some where it's commonly used, e.g baremetal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now i want to constrain this to "how do we make the set of things the CVO currently installs be dynamic instead of static"
mechanisms for having OLM-based components be a first class part of an install experience is going to be a follow-on item.
of that component explicitly opting into a particular (presumably well tested) configuration/topology | ||
[here](https://github.com/openshift/enhancements/pull/200#discussion_r375837903). The position of this EP is that | ||
we should only recommend the exclusion of fully independent "addon" components that are not depended on by | ||
other components. Further the assumption is that it will be reasonable to tell a customer who disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that removing the CBO from hypershift clusters is desirable, but I think there's a misunderstanding that we need to address.
Most of the examples given above are "easy" because nothing else depends on having them present. That's not true for the baremetal, unfortunately. The whole reason it's in the payload at all is that it underpins IPI deployment and the machine-api. As hardys points out, when the platform is baremetal
, if the implementation isn't present, the cluster won't form and deployment will fail in a not obvious way. (Either the installer would try to create BareMetalHost resources using a CRD that isn't installed and installation would time out, or the installer would not try to create BareMetalHost resources and then the nodes and machines wouldn't be linked and nodes wouldn't be allowed to join the cluster as it is forming.)
Maybe baremetal is a one-off case, and we can do the validation for it in the place where we validate the other platform inputs. Can we identify other similar cases, though? Do we need something more complex than annotations to support cross-component dependencies? Or are there similar places in the installer where we could put dependency-specific validation checks for other cases, one at a time as they come up?
the infrastructure resource. | ||
|
||
2. How does the admin enable a component post-install if they change their mind about what components | ||
they want enabled? Do we need/want to allow this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we produce OLM versions of the add-ons, and let them use OLM to install the components later if they aren't present from the start?
If we do that, could we move the add-ons out of the release payload and always manage them via OLM, even if it means the installer defaults to generating the OLM manifests to trigger installing the add-on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's plausible, but olm versions of things that are deemed "essential" brings its own set of challenges:
- disconnected installs (you need to mirror the olm catalog also, or we need to include an olm catalog in the payload)
- independent versioning/release cadence and ensuring version-matrix compatibility
- install ordering (no olm component can possibly come up until OLM itself comes up). - likely not a concern for things we have deemed "Addons" but just noting it.
these are definitely things we're going to be starting to explore (looks like cert-manager may be the first candidate) but at least right now it raises more questions than it answers, for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a better post-install solution is to just let the user modify the CVO resource that records the list of skipped components? Then when the CVO reconciles, it would (un)install components based on the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then when the CVO reconciles, it would (un)install components based on the change.
that would be a substantially new capability for the CVO (and also requires the operators themselves get a chance to clean up any changes they've made to the cluster directly). The other things in this EP are fairly small/incremental changes from what the CVO does today, but removal would be a big new capability.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
a) turn on/off groups of components as defined by "solutions" (e.g. a "headless" solution | ||
which might turn off the console but also some other components). This is what CLUSTER_PROFILES | ||
sort of enable, but there seems to be reluctance to expand the cluster profile use case to include | ||
these sorts of things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the issue with profiles is managing the proliferation of definitions of groupings of individual components as we try to predict what users will or won't prioritize including in their clusters. We may have the same challenge with defining solutions or configurations. Starting with the low level components individually will give users complete control, and we can layer some sort of grouping concept on top later if we find it necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point the EP is directed towards this approach, basically grouping resources at the second-level-operator level (technically people can annotate the resources however they want, but the expected usage will be to annotate component-related resources with a common name that the admin can then opt in/out of).
slightly more complicated is the case where 2 or more components have interdependencies and what we do if the admin only disables one of them. I'm not sure we've identified any examples of this that need to be disable-able in the first pass, though.
the infrastructure resource. | ||
|
||
2. How does the admin enable a component post-install if they change their mind about what components | ||
they want enabled? Do we need/want to allow this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a better post-install solution is to just let the user modify the CVO resource that records the list of skipped components? Then when the CVO reconciles, it would (un)install components based on the change.
the cluster and used to filter applied resources during upgrades also. My understanding is today | ||
this is handled with CLUSTER_PROFILES and EXCLUDE_ANNOTATIONS by setting the env vars on the CVO | ||
pod, but if we want to allow the set to be changed (see (2), we need a more first class config | ||
resource that is admin editable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a new component have been filtered out if the user didn't know to do so because it didn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the new component (or new resource for an existing component) uses the same "component name" that was part of the initial filter list during install.
|
||
4) After installing a cluster, change the set of addons that are included/excluded. | ||
Newly included addon resources should be created, newly excluded ones should be | ||
deleted. (Do we want to support this?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it cost us anything extra to support it, other than the testing effort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allowing previously excluded components to be added to the cluster is relatively easy.
allowing previously included components to be removed is hard, per the other comment thread (CVO doesn't know how to clean stuff up today, and can't be sure it knows everything an operator may have done)
And allowing it at all means we need a user editable config resource that the CVO watches to update its filter list. If you can't change it after install time, you don't need any of that.
the CVO overrides as part of rendering+editing manifests. | ||
|
||
The proposed UX is to make this a first class part of the install config api with the implementation | ||
being arguments supplied to the CVO to filter the user-selected manifests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have more recent requirements to be able to install the metal tools on platforms other than baremetal
, so that's probably no longer a good example of something we would expect the installer to exclude automatically.
That said, I think we can ignore the automated guidance angle, at least in a first iteration. We can make it fancier/smarter later, but the first version should have simple rules: The default guidance is to install everything. If you want less than everything, you tell the CVO and it leaves out the things you don't want.
new PR: |
No description provided.