Hello minGPT Team,
I recently rented a cloud service with 4 NVIDIA RTX 4090 GPUs, aiming to leverage them for training models using your chargpt.py script. However, I encountered an issue where the script seems to utilize only the memory of a single GPU (24GB), which is insufficient for my training requirements.
Given the potential of multi-GPU training to significantly reduce training time and handle larger models or datasets, I'm interested in modifying chargpt.py to support multi-GPU parallel training. Could you provide guidance or suggestions on how to achieve this? Specifically, I'm looking for advice on integrating PyTorch's DataParallel or DistributedDataParallel functionalities into the script.

I appreciate any help or pointers you can provide. Thank you for your time and for the great work on the minGPT project.
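For context, the kind of minimal change I have been experimenting with is wrapping the model in nn.DataParallel. The model and shapes below are placeholders, not the actual GPT that chargpt.py's Trainer builds; this is just a sketch of where the wrapping would go:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the GPT that chargpt.py constructs;
# the real change would wrap that model instead.
model = nn.Linear(8, 2)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# DataParallel splits each input batch across all visible GPUs and gathers
# the outputs back on device 0. With 0 or 1 GPUs it is effectively a no-op.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 8, device=device)
y = model(x)
print(y.shape)  # torch.Size([4, 2])
```

From what I have read, DistributedDataParallel launched via torchrun (one process per GPU) is the recommended approach for performance, while DataParallel is the smallest possible code change, so I would appreciate advice on which fits chargpt.py better.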
Best regards