From ed65537184656597a9b52c7584de6b12811b4ffa Mon Sep 17 00:00:00 2001
From: Richard Wall
Date: Tue, 12 Mar 2024 19:30:53 +0000
Subject: [PATCH 1/2] Explain why to use priorityClassName: system-cluster-critical in production

Signed-off-by: Richard Wall
---
 .spelling                                     |   1 +
 content/docs/installation/best-practice.md    | 100 ++++++++++++++++++
 .../best-practice/values.best-practice.yaml   |   2 +
 3 files changed, 103 insertions(+)

diff --git a/.spelling b/.spelling
index 56bcae48472..001da11806f 100644
--- a/.spelling
+++ b/.spelling
@@ -288,6 +288,7 @@ YAMLs
 accessors
 acme-dns
 ad-hoc
+add-ons
 allowlist
 alrs
 analyse
diff --git a/content/docs/installation/best-practice.md b/content/docs/installation/best-practice.md
index 49064ed226d..b5de2e05a59 100644
--- a/content/docs/installation/best-practice.md
+++ b/content/docs/installation/best-practice.md
@@ -327,6 +327,106 @@ cainjector:
 > You must increase the `replicaCount` of each Deployment to more than the `minAvailable` value,
 > otherwise the PodDisruptionBudget will prevent you from draining cert-manager Pods.
 
+### Priority Class Name
+
+The Kubernetes blog: [Protect Your Mission-Critical Pods From Eviction With `PriorityClass`](https://kubernetes.io/blog/2023/01/12/protect-mission-critical-pods-priorityclass/) says:
+> Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
+
+You should treat the cert-manager pods as mission-critical,
+because cert-manager provides an API to the applications running on your platform.
+Therefore, you need to protect the cert-manager Pods from [preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption), if a Kubernetes node becomes starved of resources.
+Other lower priority Pods on that node should be preempted before the cert-manager Pods.
+You can achieve this by setting the `priorityClassName` of the cert-manager Pods to a high priority Priority Class.
+
+Consider an application on your platform with a Helm chart which includes a cert-manager Certificate resource.
+It has a Pod that mounts the desired TLS serving certificate into the application Pod.
+The cert-manager webhook service must be available before you deploy the Helm chart,
+because otherwise the Certificate resource in the Helm chart will not be accepted by the K8S API server.
+The cert-manager controller must be running,
+because only when it has reconciled the Certificate resource will the Secret be created,
+and only when the Secret has been created will the application Pod be able to start up.
+
+#### Which `priorityClassName` should I use?
+
+Most Kubernetes clusters will come with two builtin priority class names:
+`system-cluster-critical` and `system-node-critical`,
+which are used for Kubernetes core components, [which can also be used for critical add-ons](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/).
+
+We recommend that you use `priorityClassName: system-cluster-critical` for cert-manager,
+because it is a critical add-on.
+Here are the Helm chart values:
+
+```yaml
+global:
+  priorityClassName: system-cluster-critical
+```
+
+On some clusters the [`ResourceQuota` admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#resourcequota) may be configured to [limit the use of certain priority classes to certain namespaces](https://kubernetes.io/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default).
+For example, on a GKE cluster, by default, you will only be allowed to use `priorityClassName: system-cluster-critical` for Pods in the `kube-system` namespace, because that namespace contains a `ResourceQuota` called `gcp-critical-pods`:
+
+```sh
+$ kubectl get resourcequota -n kube-system gcp-critical-pods -o yaml
+```
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  labels:
+    addonmanager.kubernetes.io/mode: Reconcile
+  name: gcp-critical-pods
+  namespace: kube-system
+spec:
+  hard:
+    pods: 1G
+  scopeSelector:
+    matchExpressions:
+    - operator: In
+      scopeName: PriorityClass
+      values:
+      - system-node-critical
+      - system-cluster-critical
+```
+
+> 📖 Read [Kubernetes PR #93121](https://github.com/kubernetes/kubernetes/pull/93121) to see how and why this was implemented.
+
+To use `priorityClassName: system-cluster-critical` in the `cert-manager` namespace you will need to create a similar `ResourceQuota`.
+Here's an example:
+
+```yaml
+# cert-manager-resourcequota.yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: cert-manager-critical-pods
+  namespace: cert-manager
+spec:
+  hard:
+    pods: 1G
+  scopeSelector:
+    matchExpressions:
+    - operator: In
+      scopeName: PriorityClass
+      values:
+      - system-node-critical
+      - system-cluster-critical
+```
+
+```sh
+kubectl apply -f cert-manager-resourcequota.yaml
+```
+
+> 📖 Read [Protect Your Mission-Critical Pods From Eviction With `PriorityClass`](https://kubernetes.io/blog/2023/01/12/protect-mission-critical-pods-priorityclass/), a Kubernetes blog post about how Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
+>
+> 📖 Read [Guaranteed Scheduling For Critical Add-On Pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/) to learn why `system-cluster-critical` should be used for add-ons that are critical to a fully functional cluster.
+>
+> 📖 Read [Limit Priority Class consumption by default](https://kubernetes.io/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default), to learn why platform administrators might restrict usage of certain high priority classes to a limited number of namespaces.
+>
+> 📖 Some examples of other critical add-ons that use the `system-cluster-critical` priority class name:
+> [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/google-gke.html),
+> [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper/pull/1282),
+> [Cilium](https://github.com/cilium/cilium/pull/13878).
+
 ## Scalability
 
 cert-manager has three long-running components: controller, cainjector, and webhook.
diff --git a/public/docs/installation/best-practice/values.best-practice.yaml b/public/docs/installation/best-practice/values.best-practice.yaml
index 7a545118724..448a0c9432d 100644
--- a/public/docs/installation/best-practice/values.best-practice.yaml
+++ b/public/docs/installation/best-practice/values.best-practice.yaml
@@ -4,6 +4,8 @@
 #
 # Read the rationale for these values in:
 # * https://cert-manager.io/docs/installation/best-practice/
+global:
+  priorityClassName: system-cluster-critical
 
 replicaCount: 2
 podDisruptionBudget:

From 22da1ab3c8f18d0102b9d447a23d8a88ff0c029f Mon Sep 17 00:00:00 2001
From: Richard Wall
Date: Fri, 15 Mar 2024 13:37:47 +0000
Subject: [PATCH 2/2] Trying to make the directions clearer and less prescriptive

Signed-off-by: Richard Wall
---
 content/docs/installation/best-practice.md | 61 +++++-----------------
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/content/docs/installation/best-practice.md b/content/docs/installation/best-practice.md
index b5de2e05a59..20a65fe1975 100644
--- a/content/docs/installation/best-practice.md
+++ b/content/docs/installation/best-practice.md
@@ -329,32 +329,23 @@ cainjector:
 
 ### Priority Class Name
 
-The Kubernetes blog: [Protect Your Mission-Critical Pods From Eviction With `PriorityClass`](https://kubernetes.io/blog/2023/01/12/protect-mission-critical-pods-priorityclass/) says:
+The reason for setting a priority class is summarized as follows in the Kubernetes blog [Protect Your Mission-Critical Pods From Eviction With `PriorityClass`](https://kubernetes.io/blog/2023/01/12/protect-mission-critical-pods-priorityclass/):
 > Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
 
-You should treat the cert-manager pods as mission-critical,
-because cert-manager provides an API to the applications running on your platform.
-Therefore, you need to protect the cert-manager Pods from [preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption), if a Kubernetes node becomes starved of resources.
-Other lower priority Pods on that node should be preempted before the cert-manager Pods.
-You can achieve this by setting the `priorityClassName` of the cert-manager Pods to a high priority Priority Class.
-
-Consider an application on your platform with a Helm chart which includes a cert-manager Certificate resource.
-It has a Pod that mounts the desired TLS serving certificate into the application Pod.
-The cert-manager webhook service must be available before you deploy the Helm chart,
-because otherwise the Certificate resource in the Helm chart will not be accepted by the K8S API server.
-The cert-manager controller must be running,
-because only when it has reconciled the Certificate resource will the Secret be created,
-and only when the Secret has been created will the application Pod be able to start up.
-
-#### Which `priorityClassName` should I use?
+If cert-manager is mission-critical to your platform,
+then set a `priorityClassName` on the cert-manager Pods
+to protect them from [preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption),
+in situations where a Kubernetes node becomes starved of resources.
+Without a `priorityClassName` the cert-manager Pods may be evicted to free up resources for other Pods,
+and this may cause disruption to any applications that rely on cert-manager.
 
 Most Kubernetes clusters will come with two builtin priority class names:
 `system-cluster-critical` and `system-node-critical`,
-which are used for Kubernetes core components, [which can also be used for critical add-ons](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/).
+which are used for Kubernetes core components.
+These [can also be used for critical add-ons](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/),
+such as cert-manager.
 
-We recommend that you use `priorityClassName: system-cluster-critical` for cert-manager,
-because it is a critical add-on.
-Here are the Helm chart values:
+We recommend using the following Helm chart values to set `priorityClassName: system-cluster-critical`, for all cert-manager Pods:
 
 ```yaml
 global:
@@ -362,36 +353,12 @@ global:
 ```
 
 On some clusters the [`ResourceQuota` admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#resourcequota) may be configured to [limit the use of certain priority classes to certain namespaces](https://kubernetes.io/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default).
-For example, on a GKE cluster, by default, you will only be allowed to use `priorityClassName: system-cluster-critical` for Pods in the `kube-system` namespace, because that namespace contains a `ResourceQuota` called `gcp-critical-pods`:
-
-```sh
-$ kubectl get resourcequota -n kube-system gcp-critical-pods -o yaml
-```
-
-```yaml
-apiVersion: v1
-kind: ResourceQuota
-metadata:
-  labels:
-    addonmanager.kubernetes.io/mode: Reconcile
-  name: gcp-critical-pods
-  namespace: kube-system
-spec:
-  hard:
-    pods: 1G
-  scopeSelector:
-    matchExpressions:
-    - operator: In
-      scopeName: PriorityClass
-      values:
-      - system-node-critical
-      - system-cluster-critical
-```
+For example, Google Kubernetes Engine (GKE) will only allow `priorityClassName: system-cluster-critical` for Pods in the `kube-system` namespace,
+by default.
 
 > 📖 Read [Kubernetes PR #93121](https://github.com/kubernetes/kubernetes/pull/93121) to see how and why this was implemented.
 
-To use `priorityClassName: system-cluster-critical` in the `cert-manager` namespace you will need to create a similar `ResourceQuota`.
-Here's an example:
+In such cases you will need to create a `ResourceQuota` in the `cert-manager` namespace:
 
 ```yaml
 # cert-manager-resourcequota.yaml