Machine: Marlowe
Pinku Surana edited this page Dec 18, 2024
Marlowe is an NVIDIA DGX H100 SuperPOD managed by Stanford's Data Science department.
The DGX nodes use InfiniBand networking, so GASNet should be configured to use the IBV (InfiniBand Verbs) conduit. The default conduit settings should work. When running multiple ranks on a single node, performance may improve if each rank is mapped to a specific HCA (host channel adapter). These settings seem to work, but YMMV:
```bash
export GASNET_IBV_LIST_PORTS=1 # Print status of all detected HCAs
export GASNET_IBV_PORTS_TYPE=HRank # Process's host-relative rank
export GASNET_IBV_PORTS_0=mlx5_0
export GASNET_IBV_PORTS_1=mlx5_3
export GASNET_IBV_PORTS_2=mlx5_4
export GASNET_IBV_PORTS_3=mlx5_5
export GASNET_IBV_PORTS_4=mlx5_6
export GASNET_IBV_PORTS_5=mlx5_9
export GASNET_IBV_PORTS_6=mlx5_10
export GASNET_IBV_PORTS_7=mlx5_11
```
The mlx5_* numbers come from this table.
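The eight per-rank exports can also be generated from a single list, which keeps the rank-to-HCA order in one place. This is a minimal sketch, not something shipped with GASNet: the helper script name, the `MARLOWE_HCAS` variable, and the sysfs sanity check are our own assumptions. It also assumes, as the comment on `GASNET_IBV_PORTS_TYPE=HRank` suggests, that the process with host-relative rank N reads `GASNET_IBV_PORTS_N`.

```bash
# Hypothetical helper (e.g. marlowe-ibv.sh). Source it rather than executing
# it so the exports land in the current shell:  . ./marlowe-ibv.sh
# Written in POSIX sh syntax, so it also works when sourced from bash.

export GASNET_IBV_LIST_PORTS=1      # print status of all detected HCAs
export GASNET_IBV_PORTS_TYPE=HRank  # select the port list by host-relative rank

# HCAs in host-relative-rank order: 1st entry -> rank 0, 2nd -> rank 1, ...
MARLOWE_HCAS="mlx5_0 mlx5_3 mlx5_4 mlx5_5 mlx5_6 mlx5_9 mlx5_10 mlx5_11"

i=0
for hca in $MARLOWE_HCAS; do   # unquoted on purpose: split the list on spaces
  export "GASNET_IBV_PORTS_${i}=${hca}"
  i=$((i + 1))
done

# Optional sanity check, assuming the usual Linux sysfs layout where each
# RDMA device appears as /sys/class/infiniband/<name>. Skipped entirely when
# that directory does not exist (e.g. a machine without InfiniBand drivers).
if [ -d /sys/class/infiniband ]; then
  for hca in $MARLOWE_HCAS; do
    [ -e "/sys/class/infiniband/$hca" ] ||
      echo "warning: HCA $hca not present on $(uname -n)" >&2
  done
fi
```

After sourcing it, `GASNET_IBV_PORTS_0` through `GASNET_IBV_PORTS_7` hold the same values as the explicit exports; to try a different mapping, reorder or edit the `MARLOWE_HCAS` list.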