About Training Epochs & GPU Memory #112

Open
user20421 opened this issue Nov 19, 2024 · 2 comments

Comments

@user20421

Thank you very much for your excellent work. I would like to build on it and try some new ideas, but I am currently running into a few issues. Could you please offer some suggestions or solutions? I would be very grateful for your help.
1. When I try to reproduce the YOLOv-S and YOLOv++-S models on a single 3080 Ti GPU following your configuration, I sometimes hit a "CUDA out of memory" error. Watching nvidia-smi, I noticed that GPU memory usage fluctuates significantly, which is uncommon in typical deep learning workloads. Could this behavior be related to the multi-scale training mode or the EMA training mode? (A small memory-logging sketch follows this list.)
2. Related to the first issue: when I attempt multi-GPU training, the program gets stuck and eventually throws a timeout error. Is there any way to enable single-machine multi-GPU training?
3. In your YOLOv-S experiments the maximum epoch is set to 7, and I noticed that the loss remains high near the end of training. I tried increasing the maximum epoch to 14, but the loss is still quite large. Does a high loss significantly impact the training results? Should I train for more epochs or reduce the learning rate to improve performance?
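For point 1, here is a minimal sketch (not from the YOLOV repository) that uses only standard `torch.cuda` statistics to log per-iteration peak memory; if the peak jumps on the iterations where the multi-scale input size is largest, that would point to multi-scale training as the cause:

```python
import torch

def log_gpu_memory(step: int, device: int = 0) -> None:
    """Print current and peak allocated GPU memory (MiB) for one training step.

    Hypothetical helper (not part of the YOLOV repo); it only uses standard
    torch.cuda statistics, so it can be dropped into any PyTorch training loop.
    """
    allocated = torch.cuda.memory_allocated(device) / 2**20
    peak = torch.cuda.max_memory_allocated(device) / 2**20
    print(f"step {step}: allocated={allocated:.0f} MiB, peak={peak:.0f} MiB")
    # Reset the peak counter so the next call reports only the next step's peak.
    torch.cuda.reset_peak_memory_stats(device)
```

Calling `log_gpu_memory(it)` at the end of every iteration makes it easy to see whether the spikes line up with changes in input resolution or with specific frames.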

@YuHengsss
Owner

For the first question: GPU memory usage is related to the number of candidate proposals. You can set a maximum limit as a hyper-parameter (300 will be fine for 11 GB of GPU memory):

self.maximal_limit = 0  # repository default; the reply above suggests ~300 for an 11 GB GPU
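For reference, a cap like this is typically applied by keeping only the highest-scoring candidate boxes before the feature-aggregation stage. The sketch below is illustrative only; the function and argument names are assumptions, not code from the YOLOV repository:

```python
import torch

def cap_proposals(proposals: torch.Tensor, scores: torch.Tensor, maximal_limit: int):
    """Keep at most `maximal_limit` proposals, ranked by confidence score.

    Illustrative sketch only: the names and call site are assumptions,
    not code from the YOLOV repository. A limit of 0 is treated as "no cap".
    """
    if maximal_limit <= 0 or scores.numel() <= maximal_limit:
        return proposals, scores
    topk_scores, topk_idx = torch.topk(scores, k=maximal_limit)
    return proposals[topk_idx], topk_scores
```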

For the second question: because of limited computational resources and my limited coding experience when I was working on video object detection, multi-GPU training is not well supported.

For the third question: some of the reported losses belong to the base detector and are not optimized (see here). In my experience on the ImageNet VID dataset, the best performance is reached after three or four epochs and decreases after that.
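As a rough illustration of that point (variable names are hypothetical, not the repo's actual code): when some loss components are only logged and excluded from the backward pass, the printed total can stay high even though the trainable part keeps decreasing.

```python
import torch

# Hypothetical illustration (names are not from the YOLOV codebase):
# the base detector's losses are reported but kept out of the backward
# pass, so the logged total stays large while the part that is actually
# optimized keeps shrinking.
base_detector_loss = torch.tensor(3.0)                  # frozen, logged only
head_param = torch.tensor(1.0, requires_grad=True)      # stand-in for the aggregation head
aggregation_loss = (head_param - 0.2) ** 2              # the only trainable term

aggregation_loss.backward()                             # gradients flow only through the head
logged_total = base_detector_loss + aggregation_loss.detach()
print(f"logged total loss: {logged_total.item():.3f}")  # dominated by the frozen term
```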

@user20421
Author

Thank you very much for your suggestions and for the outstanding work you have provided.
