Hello minGPT Team,
I recently rented a cloud service with 4 NVIDIA RTX 4090 GPUs, aiming to leverage them for training models using your chargpt.py script. However, I encountered an issue where the script seems to utilize only the memory of a single GPU (24GB), which is insufficient for my training requirements.
Given the potential of multi-GPU training to significantly reduce training time and handle larger models or datasets, I'm interested in modifying chargpt.py to support multi-GPU parallel training. Could you provide guidance or suggestions on how to achieve this? Specifically, I'm looking for advice on integrating PyTorch's DataParallel or DistributedDataParallel functionalities into the script.

I appreciate any help or pointers you can provide. Thank you for your time and for the great work on the minGPT project.
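For context, the kind of minimal change I have been experimenting with is wrapping the model in nn.DataParallel. The model and shapes below are placeholders, not the actual GPT that chargpt.py's Trainer builds; this is just a sketch of where the wrapping would go:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the GPT that chargpt.py constructs;
# the real change would wrap that model instead.
model = nn.Linear(8, 2)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# DataParallel splits each input batch across all visible GPUs and gathers
# the outputs back on device 0. With 0 or 1 GPUs it is effectively a no-op.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 8, device=device)
y = model(x)
print(y.shape)  # torch.Size([4, 2])
```

From what I have read, DistributedDataParallel launched via torchrun (one process per GPU) is the recommended approach for performance, while DataParallel is the smallest possible code change, so I would appreciate advice on which fits chargpt.py better.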
Best regards