Note: this GPU Operator sync failure occurred twice in a row on two brand-new 4.17 clusters, but did not appear on earlier versions such as 4.15 and 4.16 during enablement testing.
When building a brand new cluster using these settings:
After the cluster is provisioned, run bootstrap.sh and choose: 6) rhoai-stable-2.13-aws-gpu
The GPU Operator in ArgoCD constantly goes out of sync, every couple of seconds:
Looking at the sync error, we see that Argo is trying to add these lines:
But the GPU Operator seems to be removing them.
Also, the monitoring console in OpenShift appears to be broken in this release:
This is probably just a recent change in the NVIDIA GPU Operator that needs to be incorporated into the kustomize templates in this project. The source for the lines that will not synchronize is at:
ai-accelerator/components/operators/gpu-operator-certified/operator/components/console-plugin/consoleplugin.yaml (line 17 in commit 39a2cf4)