Fixes linkerd/linkerd2#11073

This fixes the issue of injected pods that cannot acquire a proper network config because `linkerd-cni` and/or the cluster's network CNI haven't fully started. Such pods are left in a permanent crash loop, and once the CNI is ready they must be restarted externally, which is what this controller does.

The controller, `linkerd-cni-repair-controller`, watches events on pods in the current node. When it finds a pod that has been injected, is in a terminated state, and whose `linkerd-network-validator` container exited with code 95, it deletes the pod so it can restart with a proper network config.

The controller is to be deployed as an additional container in the `linkerd-cni` DaemonSet (addressed in linkerd/linkerd2#11699).

This exposes two custom counter metrics: `linkerd_cni_repair_controller_queue_overflow` (in the spirit of the destination controller's `endpoint_updates_queue_overflow`) and `linkerd_cni_repair_controller_deleted`.
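
The deletion criteria described above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the controller's actual code: `PodView`, `ContainerState`, `needs_repair`, and `pods_to_delete` are hypothetical names, and the real controller would read the equivalent fields from the Kubernetes API rather than from plain structs.

```rust
// Hypothetical, simplified view of a container's status.
#[derive(Debug, Clone)]
struct ContainerState {
    name: String,
    // Some(code) if the container has terminated, None if it has not.
    terminated_exit_code: Option<i32>,
}

// Hypothetical, simplified view of a pod (not a real Kubernetes type).
#[derive(Debug, Clone)]
struct PodView {
    name: String,
    node_name: String,
    // Whether the pod has been injected with the Linkerd proxy.
    injected: bool,
    containers: Vec<ContainerState>,
}

// Container name and exit code taken from the commit description.
const VALIDATOR_CONTAINER: &str = "linkerd-network-validator";
const VALIDATOR_FAILURE_EXIT_CODE: i32 = 95;

// Returns true when the pod runs on `current_node`, has been injected, and
// its network validator container terminated with exit code 95.
fn needs_repair(pod: &PodView, current_node: &str) -> bool {
    pod.injected
        && pod.node_name == current_node
        && pod.containers.iter().any(|c| {
            c.name == VALIDATOR_CONTAINER
                && c.terminated_exit_code == Some(VALIDATOR_FAILURE_EXIT_CODE)
        })
}

// Collects the names of the pods that would be deleted so they can restart.
fn pods_to_delete<'a>(pods: &'a [PodView], current_node: &str) -> Vec<&'a str> {
    pods.iter()
        .filter(|p| needs_repair(p, current_node))
        .map(|p| p.name.as_str())
        .collect()
}
```

In this sketch a pod is left alone if any one of the conditions fails: it is on another node, it was not injected, or the validator exited with a different code. Each name returned by `pods_to_delete` would correspond to one pod deletion, which is presumably what `linkerd_cni_repair_controller_deleted` counts.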