Hello!
I have a 2-node setup equipped with several H100 GPUs, and I’m working on LLM training within OpenShift. The nodes’ Mellanox interfaces (Mellanox Technologies MT28908 Family [ConnectX-6]) are connected directly to each other (back-to-back), with no switch in between. This configuration works on RHEL, but it is currently not supported on OpenShift/Kubernetes via the NVIDIA Network Operator. I would greatly appreciate support for this setup. As a workaround, I'm attempting to run OpenSM from my own DaemonSet, but this is not an ideal or customer-friendly solution.