From 9d70a3acbdaae3436884e9f5b132a297324c910e Mon Sep 17 00:00:00 2001
From: g-pan
Date: Wed, 30 Aug 2023 19:17:21 -0400
Subject: [PATCH] HPCC-27610 Update Container Placements Documentation

Signed-off-by: g-pan
---
 .../ContainerizedMods/ConfigureValues.xml | 723 ++++++++++++++----
 1 file changed, 566 insertions(+), 157 deletions(-)

diff --git a/docs/EN_US/ContainerizedHPCC/ContainerizedMods/ConfigureValues.xml b/docs/EN_US/ContainerizedHPCC/ContainerizedMods/ConfigureValues.xml
index a8d5d801290..1d5a087b178 100644
--- a/docs/EN_US/ContainerizedHPCC/ContainerizedMods/ConfigureValues.xml
+++ b/docs/EN_US/ContainerizedHPCC/ContainerizedMods/ConfigureValues.xml
@@ -1072,212 +1072,621 @@ thor:

components such as Thor have multiple resources. The manager, worker, and eclagent components all have different resource requirements.

Environment Values

You can define environment variables in a YAML file. The environment values are defined under the global.env portion of the provided HPCC Systems values.yaml file. These values are specified as a list of name-value pairs, as illustrated below.

  global:
    env:
    - name: SMTPserver
      value: mysmtpserver

The global.env section of the supplied values.yaml file adds default environment variables for all components. You can also specify environment variables for individual components; refer to the schema for setting this value at the component level.

To add environment values, insert them into the customization configuration YAML file that you supply when you deploy your containerized HPCC Systems.

Environment Variables for Containerized Systems

There are several settings in environment.conf for bare-metal systems. While many environment.conf settings are not valid for containers, some can be useful. In a cloud deployment, these settings are inherited from environment variables, which are configurable through the values YAML either globally or at the component level.

Some of those variables are available for container and cloud deployments and can be set using the Helm chart. The following bare-metal environment.conf values have equivalent values which can be set for containerized instances.

  Environment.conf Value    Helm Environment Variable
  ----------------------    -------------------------
  skipPythonCleanup         SKIP_PYTHON_CLEANUP
  jvmlibpath                JAVA_LIBRARY_PATH
  jvmoptions                JVM_OPTIONS
  classpath                 CLASSPATH

The following example sets the environment variable to skip Python cleanup on the Thor component:

  thor:
    env:
    - name: SKIP_PYTHON_CLEANUP
      value: true
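The same mechanism can set any of the variables in the table above globally rather than per component. The following is a minimal sketch only; the JVM option string and classpath shown are placeholder values, not recommendations:

  global:
    env:
    - name: JVM_OPTIONS
      value: "-Xmx2g"
    - name: CLASSPATH
      value: "/opt/custom/jars/*"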
Index Build Plane

Define the indexBuildPlane value as a Helm chart option to have index files written, by default, to a different data plane. Unlike flat files, index files have different requirements: they benefit from storage with quick random access. Ordinarily, flat files and index files are both output to the defined default data plane(s). Using this option you can specify that index files are built on a separate data plane from other common files. This chart value can be supplied at the component or global level.

For example, adding the value at the global level under global.storage:

  global:
    storage:
      indexBuildPlane: myindexplane

Optionally, you could add it at the component level, as follows:

  thor:
  - name: thor
    prefix: thor
    numWorkers: 2
    maxJobs: 4
    maxGraphs: 2
    indexBuildPlane: myindexplane

When this value is set at the component level, it overrides the value set at the global level.

Pods and Nodes

One of the key features of Kubernetes is its ability to schedule pods onto nodes in the cluster. A pod is the smallest and simplest unit in the Kubernetes environment that you can create or deploy. A node is either a physical or virtual "worker" machine in Kubernetes.

The task of scheduling pods to specific nodes in the cluster is handled by the kube-scheduler. The default behavior of this component is to filter nodes based on the resource requests and limits of each container in the created pod. Feasible nodes are then scored to find the best candidate for the pod placement. The scheduler also takes into account other factors such as pod affinity and anti-affinity, taints and tolerations, pod topology spread constraints, and nodeSelector labels. The scheduler can be configured to use these different algorithms and policies to optimize pod placement according to your cluster's needs.

You can set these values in the values.yaml file, or you can place them in a separate file and have the deployment read the values from the supplied file. See the Customization Techniques section above for more information about customizing your deployment.
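As an illustrative sketch, such a stand-alone customization file could contain nothing but a placements section and be passed to Helm at deployment time. The file name, release name, and node pool label below are hypothetical:

  # placements.yaml -- supplied at deployment, for example:
  #   helm install mycluster hpcc/hpcc -f placements.yaml
  placements:
  - pods: ["all"]
    placement:
      nodeSelector:
        group: "hpcc"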
Placements

Placements is the term used by HPCC Systems for what Kubernetes refers to as scheduling (assigning pods to nodes). To avoid confusion with the HPCC Systems and ECL-specific scheduler terms, this documentation refers to Kubernetes scheduling as placements. Placements are a value in an HPCC Systems configuration which sits at a level above items such as nodeSelector, tolerations, affinity and anti-affinity, and topologySpreadConstraints.

The placement is responsible for finding the best node for a pod. Most often placement is handled automatically by Kubernetes. You can, however, constrain a pod so that it can only run on a particular set of nodes.

Placements are then used to ensure that pods or jobs that want nodes with specific characteristics are placed on those nodes.

For instance, a Thor cluster used for machine learning could be targeted at nodes with a GPU. Another job may want nodes with considerably more memory, and another more CPU.

Using placements you can configure the Kubernetes scheduler to use a "pods" list to apply settings to pods.

For example:

  placements:
  - pods: [list]
    placement:
      <supported configurations>

Placement Scope

Use pod patterns to set the scope of a placement. Each entry in the pods: [list] item can be one of the following:

Type: <component>
  Covers all pods/jobs under this type of component. This is commonly used for HPCC Systems components. For example, type:thor applies to any of the Thor components: thoragent, thormanager, thorworker, and so on.

Target: <name>
  The "name" field of each component; for HPCC Systems components this typically refers to the cluster name. For example, roxie: - name: roxie is the "roxie" target (cluster). You can also define multiple targets, each with a unique name such as "roxie", "roxie2", or "roxie-web".

Pod: <name>
  The "Deployment" metadata name, taken from the name of the array item of a type. For example, "eclwatch-", "mydali-", or "thor-thoragent". Using a regular expression is preferred, since Kubernetes uses the metadata name as a prefix and dynamically generates the full pod name, such as eclwatch-7f4dd4dd44cb-c0w3x.

Job name:
  The job name is typically a regular expression as well, since job names are generated dynamically. For example, a compile job compile-54eB67e567e could be matched with "compile-", "compile-.*", or "^compile-.*$".

All:
  Applies to all HPCC Systems components. The default placement delivered for pods is [all].

Regardless of the order in which the placements appear in the configuration, they are processed in the following order: "all", "type", "target", and then "pod"/"job".
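To illustrate the scopes, the following sketch applies a default to all components and then narrows it for specific scopes. The node pool labels are placeholders, not part of the default chart; because entries are processed in the order described above, duplicate nodeSelector keys in the later, more specific entries prevail:

  placements:
  - pods: ["all"]
    placement:
      nodeSelector:
        group: "hpcc"
  - pods: ["type:thor"]
    placement:
      nodeSelector:
        group: "thorpool"
  - pods: ["target:roxie", "^compile-.*$"]
    placement:
      nodeSelector:
        group: "servicepool"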
Mixed combinations

NodeSelector, taints and tolerations, and other values can all be placed on the same pods: [list], both per zone and per node on Azure. For example:

  placements:
  - pods: ["eclwatch","roxie-workunit","^compile-.*$","mydali"]
    placement:
      nodeSelector:
        name: npone

Node Selection

In a Kubernetes container environment, there are several ways to control which nodes your pods are scheduled onto. The recommended approaches all use label selectors to facilitate the selection. Generally, you may not need to set such constraints, as the scheduler usually does reasonably acceptable placement automatically. However, with some deployments you may want more control over specific pods.

Kubernetes uses the following methods to choose where to schedule pods:

  * nodeSelector field matching against node labels
  * Affinity and anti-affinity
  * Taints and tolerations
  * nodeName field
  * Pod topology spread constraints

Node Labels

Kubernetes nodes have labels, and Kubernetes populates a standard set of labels on all nodes in a cluster. You can also attach labels manually, which is recommended because the values of the automatically populated labels are cloud-provider specific and not guaranteed to be reliable.

Adding labels to nodes allows you to schedule pods onto nodes or groups of nodes. You can then use this functionality to ensure that specific pods only run on nodes with certain properties.

The nodeSelector

The nodeSelector is a field in the pod specification that allows you to specify a set of node labels that must be present on the target node for the pod to be scheduled there. It is the simplest form of node selection constraint. It selects nodes based on labels, but it only supports exact matches on the labels you list; to match labels with more complex expressions, you need to use node affinity.

Add the nodeSelector field to your pod specification and specify the node labels you want the target node to have. The same labels must be defined on the node pool (node group) you intend the pods to use. Kubernetes only schedules the pod onto nodes that have the labels you specify.

The following example shows the nodeSelector placed in the pods list. This example schedules "all" HPCC Systems components to use the node pool with the label group: "hpcc".

  placements:
  - pods: ["all"]
    placement:
      nodeSelector:
        group: "hpcc"

Note: The group: "hpcc" label in the placement must match the corresponding label on the node pool.

The next example shows how to prevent a Dali component from being scheduled onto a node pool of spot instances, labelled with the key spot. Because this kind of node is not always available and can be revoked, you would not want to use the spot node pool for Dali components. This is an example of how to configure a specific type of HPCC Systems component (Dali) not to use a particular node pool.

  placements:
  - pods: ["type:dali"]
    placement:
      nodeSelector:
        spot: "false"

Multiple nodeSelectors can be applied. If duplicate keys are defined, only the last one prevails.
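A placement can also select on the standard labels Kubernetes sets on every node, and a single nodeSelector can list more than one label, all of which must match. The following is a minimal sketch; the operating system and architecture values are only examples:

  placements:
  - pods: ["type:roxie"]
    placement:
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64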
Taints and Tolerations

Taints and tolerations are another kind of Kubernetes node constraint, closely related to node affinity. Only one affinity can be applied to a pod; if a pod matches multiple placement 'pods' lists, then only the last affinity definition applies.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints. Taints are the opposite -- they allow a node to repel a set of pods. One way to deploy using taints is to taint a node pool so that it repels all pods except those that tolerate the taint; such a pod can then be scheduled onto the tainted nodes while other pods are kept off.

For example, Thor workers should all be on the appropriate type of VM. If a big Thor job comes along, the taint repels any pods that do not meet the requirements from being scheduled onto those nodes.

For more information and examples of our taints, tolerations, and placements please review our developer documentation:

https://github.com/hpcc-systems/HPCC-Platform/blob/master/helm/hpcc/docs/placements.md

Taints and Tolerations Examples

The following examples illustrate how some taints and tolerations can be applied.

Kubernetes can schedule a pod onto any node pool that has no taint. In the following examples, the tolerations allow the two components to be scheduled onto the node pools tainted with these exact key/value pairs, group and gpu.

  placements:
  - pods: ["all"]
    placement:
      tolerations:
      - key: "group"
        operator: "Equal"
        value: "hpcc"
        effect: "NoSchedule"

  placements:
  - pods: ["type:thor"]
    placement:
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

Multiple tolerations can also be used. The following example has two tolerations, group and gpu.
  #The settings will be applied to all Thor and hThor pods/jobs
  - pods: ["thorworker-", "thor-thoragent", "thormanager-", "thor-eclagent", "hthor-"]
    placement:
      nodeSelector:
        app: tf-gpu
      tolerations:
      - key: "group"
        operator: "Equal"
        value: "hpcc"
        effect: "NoSchedule"
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

In this example the taints and the nodeSelector work together. The taints keep pods that do not tolerate them off the node pool, while the nodeSelector forces these pods onto the nodes that match the node label. There are therefore two constraints: one coming from the node pool (the taint) and the other from the pod (the nodeSelector).

Topology Spread Constraints

You can use topology spread constraints to control how pods are spread across your cluster among failure domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization. You can set cluster-level constraints as a default, or configure topology spread constraints for individual workloads. Topology spread constraints (topologySpreadConstraints) require Kubernetes v1.19 or later.

For more information see:

https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/

and

https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/

In the following topologySpreadConstraints example, there are two node pools labelled "hpcc=nodepool1" and "hpcc=nodepool2" respectively. The Roxie pods are scheduled evenly across the two node pools.

After deployment you can verify this by issuing the following command:

  kubectl get pod -o wide | grep roxie

The placements code:

  - pods: ["type:roxie"]
    placement:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: hpcc
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            roxie-cluster: "roxie"
Affinity and Anti-Affinity

Affinity and anti-affinity expand the types of constraints that you can define. The affinity and anti-affinity rules are still based on labels. In addition to the labels, they provide rules that guide the Kubernetes scheduler where to place pods based on specific criteria. The affinity/anti-affinity language is more expressive than simple labels and gives you more control over the selection logic.

There are two main kinds of affinity: node affinity and pod affinity.

Node Affinity

Node affinity is similar to the nodeSelector concept in that it allows you to constrain which nodes your pod can be scheduled onto based on node labels. It constrains the nodes that can receive a pod by matching the labels of those nodes. Node affinity can only be used to set positive affinities that attract pods to the node.

There is no schema check for the content of affinity. Only one affinity can be applied to a pod or job; if a pod/job matches multiple placement pods lists, then only the last affinity definition applies.

For more information, see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

There are two types of node affinity:

requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the pod unless this rule is met. This functions like the nodeSelector, but with a more expressive syntax.

preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the pod.

You can specify node affinities using the .spec.affinity.nodeAffinity field in your pod spec.

Pod Affinity

Pod affinity, or inter-pod affinity, is used to constrain the nodes that can receive a pod by matching the labels of the existing pods already running on those nodes. Pod affinity and anti-affinity can be either an attracting affinity or a repelling anti-affinity.

Inter-pod affinity works very similarly to node affinity but has some important differences. The "hard" and "soft" modes are indicated using the same requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields. However, these should be nested under the spec.affinity.podAffinity or spec.affinity.podAntiAffinity fields depending on whether you want to increase or reduce the pod's affinity.
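The following minimal sketch shows a repelling pod anti-affinity expressed through a placement. It reuses the roxie-cluster label from the topology spread constraints example and asks the scheduler not to co-locate two such Roxie pods on the same node (the kubernetes.io/hostname topology). Treat it as an illustration rather than a recommended configuration; as noted above, the content of affinity is not schema checked:

  placements:
  - pods: ["type:roxie"]
    placement:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                roxie-cluster: "roxie"
            topologyKey: "kubernetes.io/hostname"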
Affinity Example

The following code illustrates an example of affinity:

  - pods: ["thorworker-.*"]
    placement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/e2e-az-name
                operator: In
                values:
                - e2e-az1
                - e2e-az2

These "affinity" settings can also be included with the example shown in the schedulerName section that follows.

Note: Combining the "affinity" value with the "schedulerName" field is only supported in Kubernetes 1.20.0 (beta) and later versions.

schedulerName

The schedulerName field specifies the name of the scheduler that is responsible for scheduling a pod or a job. In Kubernetes, you can configure multiple schedulers with different names and profiles to run simultaneously in the cluster.

Only one "schedulerName" can be applied to any pod/job.

A schedulerName example:

  - pods: ["target:roxie"]
    placement:
      schedulerName: "my-scheduler"

  #The settings will be applied to all thor pods/jobs and the myeclccserver pod and job
  - pods: ["target:myeclccserver", "type:thor"]
    placement:
      nodeSelector:
        app: "tf-gpu"
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
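Bringing these pieces together, a single placements section can combine several of the mechanisms described above. The following sketch is illustrative only; the labels, taint key, and scheduler name are placeholders carried over from the earlier examples:

  placements:
  - pods: ["all"]
    placement:
      nodeSelector:
        group: "hpcc"
  - pods: ["type:thor"]
    placement:
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
  - pods: ["target:roxie"]
    placement:
      schedulerName: "my-scheduler"
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: hpcc
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            roxie-cluster: "roxie"

As noted in the Placement Scope section, these entries are processed in the order "all", "type", "target", and then "pod"/"job", so the more specific entries refine the settings applied by the "all" entry.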