The model cannot be trained with multiple cards on a single machine #57

langge52 · 2023-11-03T07:35:18Z

Dear author,

Thank you very much for sharing this code.

The problem I am currently facing is that I cannot train on multiple GPUs on a single machine. Because torch.distributed.launch is deprecated, I tried the following commands, among others:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml

torchrun train.py --config configs/demo.yaml

None of them starts training, and no relevant log output is produced. Could you advise me on how to solve this? Thank you very much; I look forward to your reply.
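Since nothing is logged, one way to narrow this down might be to check whether the launcher actually starts the worker processes. Below is a minimal sketch of such a check, relying only on the RANK, LOCAL_RANK, and WORLD_SIZE environment variables that torchrun exports to each worker. The helper name `read_launcher_env` is hypothetical and not part of this repository, and where such a check would belong in train.py is an assumption, since I do not know how the script sets up distributed training.

```python
import os


def read_launcher_env(environ=None):
    """Return the rank info exported by torchrun, or None if it is absent.

    Hypothetical debugging helper, not part of the repository.
    """
    env = os.environ if environ is None else environ
    keys = ("RANK", "LOCAL_RANK", "WORLD_SIZE")
    # If any variable is missing, the process was not started by the launcher.
    if not all(key in env for key in keys):
        return None
    return {key.lower(): int(env[key]) for key in keys}


if __name__ == "__main__":
    info = read_launcher_env()
    if info is None:
        print("not started by a distributed launcher", flush=True)
    else:
        # flush=True so the line appears even if the process hangs afterwards.
        print(
            f"worker {info['rank']} of {info['world_size']}, "
            f"local rank {info['local_rank']}",
            flush=True,
        )
```

If one line per GPU is printed when this is run under the launcher, the workers are starting and the stall is more likely inside the training script itself; if nothing is printed, the problem is probably in the launch step.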

langge52 closed this as completed Nov 3, 2023

langge52 reopened this Nov 3, 2023
