NERC Maintenance (Taint nerc-ocp-prod GPUs and add acceleratorProfiles) - Jan 7 #849
Motivation
To prevent general non-GPU workloads from being scheduled on nodes with GPUs, we will taint the nerc-ocp-prod GPU nodes. This matters because users are billed on a per-host basis: today a user workload can land on a GPU node even when it has no GPU resources allocated, and the user is then billed for GPU usage. Tainting also keeps the GPU nodes clear of any workload that does not explicitly request GPU resources.
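As a rough illustration of why a taint keeps non-GPU workloads off these nodes, the sketch below models the standard Kubernetes taint/toleration matching rule in plain Python. It is not the scheduler's code, and the taint key and value used in the example (`nvidia.com/gpu.product`, `NVIDIA-A100-SXM4-40GB`) are hypothetical placeholders; the key actually applied to nerc-ocp-prod is not stated in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key with "Exists" matches every taint
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod can be placed only if every hard taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Hypothetical taint; the real key/value on nerc-ocp-prod may differ.
gpu_taint = Taint(
    key="nvidia.com/gpu.product",
    value="NVIDIA-A100-SXM4-40GB",
    effect="NoSchedule",
)

# A general workload carries no toleration, so the GPU node is excluded.
general_pod_ok = can_schedule([], [gpu_taint])

# A GPU workload that tolerates the taint remains schedulable there.
gpu_pod_ok = can_schedule(
    [Toleration(key="nvidia.com/gpu.product", operator="Exists", effect="NoSchedule")],
    [gpu_taint],
)
```

Under these assumptions, `general_pod_ok` is `False` and `gpu_pod_ok` is `True`, which is the behavior the maintenance is meant to achieve.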
Because the GPU nodes will be tainted, we will also add acceleratorProfiles so that RHOAI users who have GPUs allocated can select which tainted GPU type their workload lands on (e.g. A100 or V100).
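A minimal sketch of what such profiles might look like, generated here as JSON manifests from Python. The `AcceleratorProfile` kind and its `displayName`, `enabled`, `identifier`, and `tolerations` fields follow the RHOAI dashboard CRD as far as we know. The profile names, the taint key and values, and the `redhat-ods-applications` namespace are assumptions for illustration, not the values that will be deployed.

```python
import json


def accelerator_profile(name: str, display_name: str,
                        taint_key: str, taint_value: str) -> dict:
    """Build an AcceleratorProfile manifest that tolerates one GPU taint."""
    return {
        "apiVersion": "dashboard.opendatahub.io/v1",
        "kind": "AcceleratorProfile",
        "metadata": {
            "name": name,
            # Assumed namespace of the RHOAI dashboard; adjust if different.
            "namespace": "redhat-ods-applications",
        },
        "spec": {
            "displayName": display_name,
            "enabled": True,
            # Resource name that the workload requests for a GPU.
            "identifier": "nvidia.com/gpu",
            # Toleration added to workloads that select this profile.
            "tolerations": [
                {
                    "key": taint_key,
                    "operator": "Equal",
                    "value": taint_value,
                    "effect": "NoSchedule",
                }
            ],
        },
    }


# Hypothetical taint key and values, one profile per GPU type.
HYPOTHETICAL_TAINT_KEY = "nvidia.com/gpu.product"
profiles = [
    accelerator_profile("nvidia-a100", "NVIDIA A100",
                        HYPOTHETICAL_TAINT_KEY, "NVIDIA-A100-SXM4-40GB"),
    accelerator_profile("nvidia-v100", "NVIDIA V100",
                        HYPOTHETICAL_TAINT_KEY, "Tesla-V100-PCIE-32GB"),
]

manifests = [json.dumps(p, indent=2) for p in profiles]
```

Each entry in `manifests` is a JSON document that could be reviewed and then applied with `oc apply -f`. Using one profile per GPU type keeps the toleration specific, so a user who picks "NVIDIA V100" cannot land on an A100 node by accident.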
We will also fix the "None" acceleratorProfile behavior in: issue
Completion Criteria
During the Jan 7 maintenance window, the nerc-ocp-prod GPU nodes are tainted and acceleratorProfiles are added to the RHOAI installation on the nerc-ocp-prod cluster.
Description
Completion dates
Required - 2025-01-07