A controller that connects the maintenance-controller with Cluster API.
Cluster API provides a way to declare, create, and manage Kubernetes clusters.
By integrating with the maintenance-controller, the automation can be extended to the maintenance processes, providing a more comprehensive cluster lifecycle management experience.
The maintenance-controller enables easy and structured maintenance processes for the nodes in a Kubernetes cluster.
When integrated with the Cluster API, the maintenance of `Machine` objects can be orchestrated and aligned with deployments in workload clusters.
A `Machine` object that should adhere to a maintenance-controller deployment in the workload cluster needs to carry the `runtime-extension-maintenance-controller.cloud.sap/enabled: "true"` label.
For consistency, this is best done on `MachineDeployment` resources.
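As a rough illustration, the label can be placed on the machine template of a MachineDeployment so that the Machines created from it carry it. The sketch below is plain Python working on a manifest held as a dict. The helper name `enable_maintenance` and the trimmed example manifest are hypothetical, and putting the label under `spec.template.metadata.labels` is an assumption about how the label reaches the Machines.

```python
import copy

ENABLED_LABEL = "runtime-extension-maintenance-controller.cloud.sap/enabled"


def enable_maintenance(machine_deployment: dict) -> dict:
    """Return a copy of the manifest with the enabled label on its machine template."""
    result = copy.deepcopy(machine_deployment)
    template = result.setdefault("spec", {}).setdefault("template", {})
    # Assumption: labels on the machine template end up on the created Machines.
    labels = template.setdefault("metadata", {}).setdefault("labels", {})
    labels[ENABLED_LABEL] = "true"
    return result


# Hypothetical, heavily trimmed MachineDeployment manifest.
example = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineDeployment",
    "metadata": {"name": "workers"},
    "spec": {"template": {"metadata": {"labels": {}}}},
}
patched = enable_maintenance(example)
```

The helper copies its input, so the original manifest stays unchanged and the patched copy can be serialized and applied separately.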
As soon as a machine is marked as enabled for runtime-extension-maintenance-controller, it is annotated with `pre-drain.delete.hook.machine.cluster.x-k8s.io/maintenance-controller: runtime-extensions-maintenance-controller` to stop the cleanup logic of Cluster API on machine deletion.
Instead, the corresponding `Node` object in the workload cluster is marked with the `runtime-extension-maintenance-controller.cloud.sap/machine-deleted: "true"` label.
This label should be used to trigger the maintenance-controller in the workload cluster.
The `runtime-extension-maintenance-controller.cloud.sap/approve-deletion: "true"` label should be used by the maintenance-controller to notify the runtime-extension that the node is ready to be removed.
Once the approve-deletion label is attached, the pre-drain hook is removed, which allows machine deletion to continue.
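The deletion handshake described above can be sketched as a single reconcile pass over in-memory objects. This is only an illustrative model, not the controller's code: the function name `reconcile_deletion`, the dict layout, and the `deleting` flag (standing in for a Machine that is being deleted) are hypothetical. The label and annotation strings are the ones named in the text.

```python
PREFIX = "runtime-extension-maintenance-controller.cloud.sap"
ENABLED_LABEL = PREFIX + "/enabled"
MACHINE_DELETED_LABEL = PREFIX + "/machine-deleted"
APPROVE_DELETION_LABEL = PREFIX + "/approve-deletion"
PRE_DRAIN_HOOK = "pre-drain.delete.hook.machine.cluster.x-k8s.io/maintenance-controller"
HOOK_OWNER = "runtime-extensions-maintenance-controller"


def reconcile_deletion(machine: dict, node: dict) -> None:
    """Apply one pass of the deletion handshake, mutating both objects in place."""
    if machine["labels"].get(ENABLED_LABEL) != "true":
        return  # Machine has not opted in, nothing to do.
    if not machine["deleting"]:
        # Enabled machines get the hook so Cluster API waits before draining.
        machine["annotations"][PRE_DRAIN_HOOK] = HOOK_OWNER
        return
    if node["labels"].get(APPROVE_DELETION_LABEL) == "true":
        # The maintenance-controller approved, so deletion may continue.
        machine["annotations"].pop(PRE_DRAIN_HOOK, None)
    else:
        # Signal the pending deletion to the maintenance-controller.
        node["labels"][MACHINE_DELETED_LABEL] = "true"


machine = {"labels": {ENABLED_LABEL: "true"}, "annotations": {}, "deleting": False}
node = {"labels": {}}
reconcile_deletion(machine, node)  # attaches the pre-drain hook
machine["deleting"] = True
reconcile_deletion(machine, node)  # marks the node as machine-deleted
node["labels"][APPROVE_DELETION_LABEL] = "true"
reconcile_deletion(machine, node)  # removes the pre-drain hook
```

In a real deployment the approve-deletion label would be set by the maintenance-controller in the workload cluster, not by hand as in this walk-through.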
Sometimes a machine needs a longer maintenance outside of the Cluster API machine lifecycle.
To initiate such a maintenance, a `Machine` object can be labelled with `runtime-extension-maintenance-controller.cloud.sap/maintenance: requested`.
This places the `runtime-extension-maintenance-controller.cloud.sap/machine-maintenance: "true"` label on the respective node, which the maintenance-controller can act upon.
The `runtime-extension-maintenance-controller.cloud.sap/approve-maintenance: "true"` label should be used by the maintenance-controller to notify the runtime-extension that the node is ready to be maintained.
This sets the label on the `Machine` object to `runtime-extension-maintenance-controller.cloud.sap/maintenance: approved`.
A different controller needs to start the actual maintenance procedure once the maintenance of a machine is approved.
Such a procedure depends on the infrastructure provider backing the machine.
The maintenance can be stopped by removing the `runtime-extension-maintenance-controller.cloud.sap/maintenance` label.
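The maintenance label flow can be modelled as a small state transition on two label maps. The following is a minimal sketch under stated assumptions: `reconcile_maintenance` is a hypothetical name, and removing the node label once the machine label is gone is an assumed cleanup step that the text does not describe.

```python
PREFIX = "runtime-extension-maintenance-controller.cloud.sap"
MAINTENANCE_LABEL = PREFIX + "/maintenance"
MACHINE_MAINTENANCE_LABEL = PREFIX + "/machine-maintenance"
APPROVE_MAINTENANCE_LABEL = PREFIX + "/approve-maintenance"


def reconcile_maintenance(machine_labels: dict, node_labels: dict) -> None:
    """Apply one pass of the maintenance handshake, mutating both label maps."""
    state = machine_labels.get(MAINTENANCE_LABEL)
    if state is None:
        # Assumption: a stopped maintenance also clears the marker on the node.
        node_labels.pop(MACHINE_MAINTENANCE_LABEL, None)
        return
    if state == "requested":
        # Tell the maintenance-controller that a maintenance is wanted.
        node_labels[MACHINE_MAINTENANCE_LABEL] = "true"
        if node_labels.get(APPROVE_MAINTENANCE_LABEL) == "true":
            # The maintenance-controller agreed, record that on the Machine.
            machine_labels[MAINTENANCE_LABEL] = "approved"


machine_labels = {MAINTENANCE_LABEL: "requested"}
node_labels = {}
reconcile_maintenance(machine_labels, node_labels)  # node gets machine-maintenance
node_labels[APPROVE_MAINTENANCE_LABEL] = "true"
reconcile_maintenance(machine_labels, node_labels)  # machine becomes approved
```

Once the value is `approved`, the sketch leaves the labels alone, mirroring the point that a different controller takes over the actual maintenance procedure.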
The runtime-extension ships with an optional integration for Metal3, which is enabled by the `--enable-metal3-maintenance` CLI flag.
It adds a reboot annotation to the `BareMetalHost` objects backing machines that are approved for maintenance.
This integration does not drain nodes.
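A rough model of the Metal3 integration is sketched below. Several parts are assumptions: the text does not name the exact annotation, so `reboot.metal3.io` (the generic Metal3 reboot annotation) is used as a stand-in, and the machine-to-host mapping via matching dict keys as well as the function name `annotate_hosts` are purely hypothetical.

```python
MAINTENANCE_LABEL = "runtime-extension-maintenance-controller.cloud.sap/maintenance"
# Assumption: the exact annotation key used by the integration is not stated.
REBOOT_ANNOTATION = "reboot.metal3.io"


def annotate_hosts(machines: dict, hosts: dict) -> list:
    """Annotate hosts of approved machines and return the affected machine names.

    machines maps a machine name to its labels, hosts maps the same name to
    the annotations of the backing BareMetalHost (hypothetical layout).
    """
    annotated = []
    for name, labels in sorted(machines.items()):
        if labels.get(MAINTENANCE_LABEL) != "approved":
            continue  # Only approved maintenances are acted upon.
        if name in hosts:
            hosts[name][REBOOT_ANNOTATION] = ""
            annotated.append(name)
    return annotated


machines = {
    "worker-a": {MAINTENANCE_LABEL: "approved"},
    "worker-b": {MAINTENANCE_LABEL: "requested"},
}
hosts = {"worker-a": {}, "worker-b": {}}
touched = annotate_hosts(machines, hosts)
```

Nothing in this sketch drains the node, which matches the note above: draining has to be handled elsewhere before the maintenance is approved.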
A Helm chart is available here.