Add Kubernetes nmstate operator to cvo
yboaron committed Apr 29, 2021
1 parent 7962498 commit 4f5171c
Showing 1 changed file with 215 additions and 0 deletions.
215 changes: 215 additions & 0 deletions enhancements/network/kubernetes-nmstate-operator-to-cvo.md
@@ -0,0 +1,215 @@
---
title: add-kubernetes-nmstate-operator-to-cvo
authors:
- "@bcrochet"
- "@yboaron"
- "@hardys"
reviewers:
- "@cgwalters"
- "@derekwaynecarr"
- "@russellb"
approvers:
- TBD
creation-date: 2021-03-16
last-updated: 2021-04-23
status: provisional
see-also:
- "/enhancements/machine-config/mco-network-configuration.md"
---

# Add Kubernetes NMState Operator to CVO

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Enable Kubernetes NMstate by default for selected platforms.

Network interface configuration is a frequent requirement for some platforms, particularly baremetal.
In recent releases [work was completed](/enhancements/machine-config/mco-network-configuration.md)
to enable Kubernetes NMstate (optionally via OLM) so that declarative configuration
of secondary network interfaces is possible.

To improve the user experience, it is desirable to enable the NMState API by default on selected platforms
where such configuration is common.

## Motivation

In previous discussion it was noted that we may [not want NMState enabled on all platforms](https://github.com/openshift/enhancements/pull/161#discussion_r433303754) and also that [all tech-preview features must require opt-in](https://github.com/openshift/enhancements/pull/161#discussion_r433315534).

Now that the support status of NMState is moving from tech-preview to fully supported, the main consideration is how to conditionally enable it only on platforms where it is likely to be needed.

If we can enable Kubernetes NMstate as part of the CVO payload, we can improve the user experience on platforms where it is required, for example by allowing NodeNetworkConfigurationPolicy resources to be provided as manifests at install-time via openshift-installer.

Additionally, shipping Kubernetes NMstate as part of the CVO payload will allow uniformity between the various platforms (CNV, IPI-Baremetal).

Currently, CNV uses the [Hyperconverged Cluster Operator (HCO)](https://github.com/kubevirt/hyperconverged-cluster-operator) to deploy a [custom operator, the Cluster Network Addons Operator (CNAO)](https://github.com/kubevirt/cluster-network-addons-operator#nmstate). CNAO installs Kubernetes NMstate, among other network-related components.
IPI-Baremetal, by contrast, uses a [new operator](https://github.com/openshift/kubernetes-nmstate/blob/master/manifests/kubernetes-nmstate-operator.package.yaml) to install Kubernetes NMstate using OLM.

### Goals

- Provide NMState APIs by default on desired platforms (OpenShift Virtualization, IPI-Baremetal). The [NetworkManager configuration should be updated](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c-design) to support this capability on these platforms.
- Allow using NMState APIs at install-time, where NIC config is a common requirement.

### Non-Goals

- Make persistent networking changes to nodes. That should be handled by the Machine Config Operator (see [this link](https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/mco-network-configuration.md#option-c) for more details).
- Provide an API for controlplane network configuration via openshift-install (although in future a common API would be desirable for controlplane and secondary network interfaces).

## Proposal

### User Stories

#### Story 1

As a user of Baremetal IPI I want to provide NodeNetworkConfigurationPolicy resources via manifests at install-time, to simplify
my deployment workflow and avoid additional post-deploy steps.

I'd like to avoid dependencies on additional registries in disconnected environments and enable core platform functions
such as network configuration directly from the release payload.
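
As a rough illustration of this story, the manifest below sketches the kind of NodeNetworkConfigurationPolicy a user might supply together with the other install-time manifests. It is a minimal sketch and not part of this proposal: the policy name, the `eth1` interface and the worker node selector are hypothetical, and the `nmstate.io/v1beta1` API version is an assumption that should be checked against the version actually served by the cluster.

```yaml
# Hypothetical example: the name, node selector, and interface below are
# illustrative assumptions, not values defined by this enhancement.
apiVersion: nmstate.io/v1beta1   # assumed API version; verify on the target cluster
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-eth1-dhcp         # hypothetical policy name
spec:
  # Apply the policy only to worker nodes.
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  # Declarative NMState description of the desired interface state.
  desiredState:
    interfaces:
      - name: eth1               # hypothetical secondary NIC
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: true
```

DHCP is used here rather than a static address so that the same policy can safely match every selected node; per-node static addressing would need one policy per node or a narrower node selector.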

#### Story 2

TODO - can we add a CNV story around simplification and maintenance overhead?

### Risks and Mitigations

This enhancement exposes a method to modify host networking. If an applied configuration breaks connectivity to the API, it will be rolled back automatically.

## Design Details

### Open Questions [optional]

1.

### Test Plan

The operator will be tested with an E2E test suite that also runs upstream. There is also a unit test suite.

### Graduation Criteria

**Note:** *Section not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. Initial proposal
should keep this high-level with a focus on what signals will be looked at to
determine graduation.

Consider the following in developing the graduation criteria for this
enhancement:

- Maturity levels
  - [`alpha`, `beta`, `stable` in upstream Kubernetes][maturity-levels]
  - `Dev Preview`, `Tech Preview`, `GA` in OpenShift
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, GA), regardless of how the functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

**Examples**: These are generalized examples to consider, in addition
to the aforementioned [maturity levels][maturity-levels].

#### Dev Preview -> Tech Preview

- Kubernetes NMState Operator is currently available in the Red Hat Catalog as Tech Preview

#### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Conduct load testing
- E2E testing

#### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

### Upgrade / Downgrade Strategy

If applicable, how will the component be upgraded and downgraded? Make sure this
is in the test plan.

Consider the following in developing an upgrade/downgrade strategy for this
enhancement:

- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade in order to keep previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade in order to make use of the enhancement?

Upgrade expectations:

- Each component should remain available for user requests and
workloads during upgrades. Ensure the components leverage best practices in handling [voluntary disruption](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/). Any exception to this should be
identified and discussed here.
- Micro version upgrades - users should be able to skip forward versions within a
minor release stream without being required to pass through intermediate
versions - i.e. `x.y.N->x.y.N+2` should work without requiring `x.y.N->x.y.N+1`
as an intermediate step.
- Minor version upgrades - you only need to support `x.N->x.N+1` upgrade
steps. So, for example, it is acceptable to require a user running 4.3 to
upgrade to 4.5 with a `4.3->4.4` step followed by a `4.4->4.5` step.
- While an upgrade is in progress, new component versions should
continue to operate correctly in concert with older component
versions (aka "version skew"). For example, if a node is down, and
an operator is rolling out a daemonset, the old and new daemonset
pods must continue to work correctly even while the cluster remains
in this partially upgraded state for some time.

Downgrade expectations:

- If an `N->N+1` upgrade fails mid-way through, or if the `N+1` cluster is
misbehaving, it should be possible for the user to rollback to `N`. It is
acceptable to require some documented manual steps in order to fully restore
the downgraded cluster to its previous state. Examples of acceptable steps
include:
  - Deleting any CVO-managed resources added by the new version. The
    CVO does not currently delete resources that no longer exist in
    the target version.

### Version Skew Strategy

How will the component handle version skew with other components?
What are the guarantees? Make sure this is in the test plan.

Consider the following in developing a version skew strategy for this
enhancement:

- During an upgrade, we will always have skew among components, how will this impact your work?
- Does this enhancement involve coordinating behavior in the control plane and
in the kubelet? How does an n-2 kubelet without this feature available behave
when this feature is used?
- Will any other components on the node change? For example, changes to CSI, CRI
or CNI may require updating that component before the kubelet.

## Implementation History

- Kubernetes NMState Operator is being built by ART.
- Kubernetes NMState Operator is available in the Red Hat Catalog as Tech Preview.

## Drawbacks

1.

## Alternatives

1. CNV continues to install Kubernetes NMstate as it does today.

## Infrastructure Needed [optional]

At a minimum, for the e2e test suite, the worker nodes of the system under test (SUT) would need 2 additional NICs available.
