
Enable Kubernetes NMstate by default for selected platforms #747

Closed
wants to merge 1 commit

Conversation

yboaron

@yboaron yboaron commented Apr 21, 2021

Network interface configuration is a frequent requirement for some platforms, particularly baremetal.
In recent releases work was completed (see [1]) to enable Kubernetes NMstate (optionally via OLM) so that declarative configuration of secondary network interfaces is possible.

To improve the user experience, it is desirable to enable the NMState API by default on selected platforms where such configuration is common.
Having Kubernetes NMstate deployed by the CVO will allow, for example, secondary NIC configuration via NodeNetworkConfigurationPolicy manifests at install time.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md
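
For illustration only, a NodeNetworkConfigurationPolicy for a secondary NIC might look like the sketch below; the apiVersion, interface name, and addressing are assumptions and will vary by environment and kubernetes-nmstate version.

```yaml
# Illustrative sketch of an NNCP that configures a secondary NIC (eth1) with a
# static IPv4 address on all worker nodes. Interface name, IP, and apiVersion
# are assumptions, not values taken from this enhancement.
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-eth1-static
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.111.20
              prefix-length: 24
```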

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dcbw after the PR has been reviewed.
You can assign the PR to them by writing /assign @dcbw in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@markmc
Contributor

markmc commented Apr 22, 2021

I don't think we can justify this simply on the basis that installing an optional operator is "inconvenient and cumbersome"

In https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md I think we acknowledge that this configuration falls under the domain of the MCO, although we seem to be making a distinction about "only affecting post-kubelet". It also discusses only enabling the option C integration on bare-metal. I admit I'm confused.

If we do see this configuration in the domain of the MCO, and if we would like this capability to be enabled by default on all clusters, then I'd like to see the problem framed as an extension of the previous enhancement - e.g. considering the option of kubernetes-nmstate as an MCO operand.

@hardys
Contributor

hardys commented Apr 22, 2021

I don't think we can justify this simply on the basis that installing an optional operator is "inconvenient and cumbersome"

Agreed, we can perhaps be more specific about the user impact here. IIUC, one of the reasons is so that secondary network configuration may be provided via NNCP manifests during initial deployment, vs. requiring an optional add-on API and post-deploy configuration - @yboaron @celebdor do we have any more examples of the requirements driving this?

In https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md I think we acknowledge that this configuration falls under the domain of the MCO, although we seem to be making a distinction about "only affecting post-kubelet". It also discusses only enabling the option C integration on bare-metal. I admit I'm confused.

I believe option C was implemented as a compromise: the MCO is supposed to own all persistent config, but it lacks the features required to enable NIC configuration (particularly Machine-specific config, the ability to apply some NIC config without rebooting the box, some kind of declarative API (i.e. not just writing NM keyfiles), the ability to roll back a bad config, etc.).

The issue then, given that this kind of configuration is very common for CNV/baremetal, is that the implemented API isn't available by default in those environments.

If we do see this configuration in the domain of the MCO, and if we would like this capability to be enabled by default on all clusters, then I'd like to see the problem framed as an extension of the previous enhancement - e.g. considering the option of kubernetes-nmstate as an MCO operand.

Agreed, perhaps framing the problem as "deployed via CVO" is missing the bigger picture, which is that we've got a large API gap for on-premises users - we need a standard way to describe NIC config (for both secondary and control-plane networks, although currently the nmstate/NNCP integration only enables the former).

@yboaron yboaron changed the title Add Kubernetes nmstate operator to CVO Enable Kubernetes NMstate by default for selected platforms Apr 26, 2021
@yboaron yboaron requested review from hardys and markmc April 26, 2021 13:43
@romfreiman

Just to verify - how many instances will it run with? Any effect on single-node OpenShift?

@cgwalters
Member

Does nmstate have any support for e.g. wireguard? That could be useful even in cloud platforms.

@yboaron yboaron force-pushed the k8s_nmstate_cvo branch from 7f0ceb5 to 91e8770 May 2, 2021 08:23
@phoracek
Member

phoracek commented May 3, 2021

Does nmstate have any support for e.g. wireguard? That could be useful even in cloud platforms.

It does not support it now. We could open an RFE on nmstate to see if it is something they would consider adding.

@derekwaynecarr
Member

Just recording some questions that I think will help frame the enhancement for use in future discussions.

Justification for why this should run everywhere (on-prem, cloud, edge) in all delivery models (self-managed, hosted).

This would be important to emphasize if we expand this to more than "selected platforms"

Is the signal/noise ratio by having this API/controller everywhere worth the potential confusion for an SRE in a particular infra, location, or delivery model when debugging an issue?

I am assuming the API would only be installed on selected platforms as a result of this enhancement.

What existing SLO could deliver the content and not confuse the user? e.g. could Cluster Network Operator deliver this content optionally based on its own local config like storage/csi?

This enhancement is asking for the capability to be versioned with OCP (in the release payload), but not enabled on all platforms. We need to evaluate which second level meta-domain operator makes the most sense to enable this, but I am against adding a net new second level operator.

What is the resource budget for this API/controller?

To enable it by default, we should understand its impact to the control plane:

  • write rate to the API server
  • number of clients that write to the resource (is it per node? is there a single controller co-located with api-server?)

To enable it by default, we should understand its impact to each worker:

  • what does it watch?
  • per node daemon resource cost (cpu/memory)
  • per node daemon priority (is it node critical?)
  • per node authority to write to the corresponding resource (think node restriction admission plugin in k8s as example)
  • any unique backup/restore semantics introduced by the component

Its future impact to other topologies (single node or externalized control planes)

For single node

  • understanding what namespace hosts the component, and ensuring it fits in the cpuset isolation for mgmt components

For externalized control planes

  • is this needed only for the 'first' mgmt cluster and its workers?
  • is this needed for every worker in all agnostic clusters, and therefore should be hosted with an externalized control plane because it's needed to support any 'join worker' flow.

is it ok to deliver the content in a release payload, but enable it via some other configuration knob?

What is the knob to turn on/off the capability? This is related to the SLO chosen to deliver this new meta-operator and what configuration it reads to know to turn it on. For example, if this is for selected platforms, the component would read the Infrastructure CR to determine if it should enable it at all, but is there another knob that would allow opt-in/out?
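
For reference, "reading the Infrastructure CR" here would presumably mean inspecting the platform type reported on the cluster-scoped Infrastructure object, roughly as in the illustrative excerpt below (field values vary per cluster):

```yaml
# Illustrative excerpt of the cluster-scoped Infrastructure CR that a component
# could inspect to decide whether the capability should be enabled on this platform.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  platformStatus:
    type: BareMetal   # other values include AWS, Azure, GCP, VSphere, None, ...
```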

@yboaron yboaron force-pushed the k8s_nmstate_cvo branch from 91e8770 to e82cd67 May 18, 2021 07:29
@openshift-ci
Contributor

openshift-ci bot commented May 18, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dcbw after the PR has been reviewed.
You can assign the PR to them by writing /assign @dcbw in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yboaron yboaron force-pushed the k8s_nmstate_cvo branch from e82cd67 to 7cf8dd1 May 18, 2021 07:33
@yboaron
Author

yboaron commented May 18, 2021

Just recording some questions that I think will help frame the enhancement for use in future discussions.

Justification for why this should run everywhere (on-prem, cloud, edge) in all delivery models (self-managed, hosted).

This would be important to emphasize if we expand this to more than "selected platforms"

Is the signal/noise ratio by having this API/controller everywhere worth the potential confusion for an SRE in a particular infra, location, or delivery model when debugging an issue?

I am assuming the API would only be installed on selected platforms as a result of this enhancement.

What existing SLO could deliver the content and not confuse the user? e.g. could Cluster Network Operator deliver this content optionally based on its own local config like storage/csi?

This enhancement is asking for the capability to be versioned with OCP (in the release payload), but not enabled on all platforms. We need to evaluate which second level meta-domain operator makes the most sense to enable this, but I am against adding a net new second level operator.

What is the resource budget for this API/controller?

To enable it by default, we should understand its impact to the control plane:

  • write rate to the API server
  • number of clients that write to the resource (is it per node? is there a single controller co-located with api-server?)

To enable it by default, we should understand its impact to each worker:

  • what does it watch?
  • per node daemon resource cost (cpu/memory)
  • per node daemon priority (is it node critical?)
  • per node authority to write to the corresponding resource (think node restriction admission plugin in k8s as example)
  • any unique backup/restore semantics introduced by the component

Its future impact to other topologies (single node or externalized control planes)

For single node

  • understanding what namespace hosts the component, and ensuring it fits in the cpuset isolation for mgmt components

For externalized control planes

  • is this needed only for the 'first' mgmt cluster and its workers?
  • is this needed for every worker in all agnostic clusters, and therefore should be hosted with an externalized control plane because it's needed to support any 'join worker' flow.

is it ok to deliver the content in a release payload, but enable it via some other configuration knob?

What is the knob to turn on/off the capability? This is related to the SLO chosen to deliver this new meta-operator and what configuration it reads to know to turn it on. For example, if this is for selected platforms, the component would read the Infrastructure CR to determine if it should enable it at all, but is there another knob that would allow opt-in/out?


Thank you for raising these questions; a Q&A section was added to the enhancement.

@bengland2

just a comment - OCS could make use of nmstate for MTU, bonding, a bunch of things that would significantly improve its performance if this was part of the base system. @obnoxxx


Now that the support status of NMState is moving from tech-preview to fully supported, the main consideration is how to conditionally enable this only on platforms where it's likely to be needed.

If we can enable Kubernetes NMstate as part of the release payload, we can improve the user experience on platforms where it is required - for example, allowing NodeNetworkConfigurationPolicy to be provided via manifests at install time via openshift-installer.
Contributor

In #753 we are addressing a similar need with the performance-addon-operator. In that case, we're exploring the option of running the operator's render command outside of the installer between the create manifests and create cluster phases. Could a similar approach be used with kubernetes-nmstate?

Contributor

Yes, I think that could work. If the needed containers are in the release payload, then even if we require an "advanced" interface at the installer phase it's an improvement on the current process where everything has to happen day 2 via OLM. @yboaron would you agree?

Author

Yep, that sounds like a good direction.

Contributor

We won't be adding the PAO containers to the release payload. Users who want to use workload partitioning will download the PAO separately (perhaps running it via podman).


### Goals

- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms.
Contributor

After reading through this, it's not entirely clear what the "desired platforms" are. Is it just baremetal IPI?

Contributor

The initial target is just the baremetal platform, to support both CNV and general baremetal cases, but per the discussion above with @bengland2 there are other potential use-cases, e.g. OCS, which wouldn't necessarily be specific to baremetal?

So I'd like to see this enabled by default for baremetal, with a way to opt-in for other platforms.

The other option that you mention above is to add the components to the release payload but not enable them by default for any platform, then require some "advanced" customization step, e.g. a manifest, to enable them (on any platform).

Member

For OpenShift Virtualization it would also be important to have a way to opt in on other platforms.

Member

I'd certainly want this on baremetal, None, vSphere, and potentially ovirt. Really, anywhere we're likely to encounter static network configuration which may need to be managed day 2.


### Goals

- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms.


We state the 'desired platforms' all over this document but don't list what the desired platforms are.

Who has this list?

Member

Yes, let's get a list into the enhancement. Here's my suggestion: #747 (comment)


**Q:** Why should the NMState capability run everywhere (on-prem, cloud, edge) in all delivery models (self-managed, hosted)?

**A:** If it were supported/possible, it would be better to deploy the nmstate capability only on the platforms that require it.


Can we elaborate more on 'why it's not possible'? I find this to be an answer without context.


If we can enable Kubernetes NMstate as part of the release payload, we can improve the user experience on platforms where it is required - for example, allowing NodeNetworkConfigurationPolicy to be provided via manifests at install time via openshift-installer.

Additionally, having Kubernetes NMstate as part of the release payload will allow uniformity between the various platforms (CNV, IPI-Baremetal).


Do these two operators run the risk of conflicting with one another because of this approach?

@bengland2

bengland2 commented Jun 3, 2021 via email




Additionally, having Kubernetes NMstate as part of the release payload will allow uniformity between the various platforms (CNV, IPI-Baremetal).

Currently, CNV uses the [Hyperconverged Cluster Operator - HCO](https://github.com/kubevirt/hyperconverged-cluster-operator) to deploy a [custom operator - CNAO](https://github.com/kubevirt/cluster-network-addons-operator#nmstate); CNAO installs Kubernetes NMstate, among other network-related items.
Contributor

Why doesn't this approach also use OLM to install the operator?

Member

@phoracek phoracek Jun 11, 2021

It does. HCO and CNAO are both deployed through OLM. kubernetes-nmstate, however, is deployed directly by CNAO, without OLM. The reason is that we want granular control over the lifecycle without a dependency on OLM, as the upstream project targets vanilla Kubernetes environments too. It is worth mentioning that CNAO in this case deploys kubernetes-nmstate directly. The nmstate operator was introduced only recently to allow migration of kubernetes-nmstate under OpenShift - hence the dedicated operator decoupled from the rest of CNAO's networking.


**A:** For the tech-preview phase we have a lightweight controller that runs only a single-replica deployment and the NMState CRD definition;
creating an NMState CR instance triggers the full API controller deployment.
With this approach, we shouldn't have the full NMState API everywhere.
Contributor

We would, though, because the CRD is installed and can be triggered. Is that appropriate for all platforms and deployment architectures?
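
For context, the "NMState CR instance" referred to above is a single cluster-scoped object; creating it is what would trigger the full kubernetes-nmstate deployment. A minimal sketch follows (the exact apiVersion depends on the kubernetes-nmstate release):

```yaml
# Minimal sketch of the NMState CR whose creation triggers the full deployment
# (handlers, webhook, etc.). The apiVersion shown is an assumption and may differ.
apiVersion: nmstate.io/v1beta1
kind: NMState
metadata:
  name: nmstate
```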

### Goals

- Provide NMState APIs by default on desired platforms, [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms.
- Allow using NMState APIs at install-time, where NIC config is a common requirement.
Contributor

Might be a dumb question, but will install-time configuration be persisted via the MCO (compared to the ephemeral option C above, that is)? Otherwise, wouldn't it be lost after the first reboot that happens on all RHCOS nodes?

Member

Install-time configuration would be persisted via the kubernetes-nmstate CR. For day 1 we would pull the config out of that and apply it directly to the image, then for day 2 the operator would take over and allow more dynamic changes.

Contributor

I think I misunderstood the exact boundaries of responsibilities. Is the MCO just writing a config setup for the NMstate operator to use? Or is the MCO not involved at all in this flow?

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 30d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label Sep 21, 2021
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 28, 2021
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Oct 5, 2021
@openshift-ci
Contributor

openshift-ci bot commented Oct 5, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
