Dear author,
Thank you very much for sharing this code.
The problem I am facing is that I cannot train with multiple GPUs on a single machine. Because `torch.distributed.launch` is deprecated, I have tried `CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml`, `torchrun train.py --config configs/demo.yaml`, and other launch commands. None of them starts training, and no log output is produced. Could you advise me on how to solve this problem? Thank you very much; I look forward to your reply.
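For reference, here is a minimal sketch of a check that could show whether the launcher starts the worker processes at all, independent of `train.py`. It assumes only that `torchrun` sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` in each worker's environment; the helper name `describe_worker` is hypothetical and not part of this repository.

```python
import os


def describe_worker(env=None):
    """Summarize the distributed-launch variables visible to this process."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    rank = env.get("RANK")
    local_rank = env.get("LOCAL_RANK")
    world_size = env.get("WORLD_SIZE")
    # If any of these is missing, the script was not started by the launcher.
    if rank is None or local_rank is None or world_size is None:
        return "not launched by torchrun (RANK/LOCAL_RANK/WORLD_SIZE missing)"
    visible = env.get("CUDA_VISIBLE_DEVICES", "<unset>")
    return (
        f"rank {rank}/{world_size}, local_rank {local_rank}, "
        f"CUDA_VISIBLE_DEVICES={visible}"
    )


if __name__ == "__main__":
    # flush=True so the line appears immediately even if a later step hangs.
    print(describe_worker(), flush=True)
```

Saved under a hypothetical name such as `check_env.py` and started with `torchrun --nproc_per_node 4 check_env.py`, one line per worker would suggest that the launcher itself works and that the silence comes from somewhere inside the training script.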