For train.crl and train.mtcl, what kind of result indicates that the run has finished? #15

YYrgb · 2024-12-26T09:44:08Z

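When I run train.crl, max-epoch is 500. I have read the corresponding paper, but I still don't understand when it is appropriate to stop the run.
Likewise, when I run train.mtcl, it takes a long time. The top line shows "fold 1 epoch 0", and below it the progress bar shows the accuracy, which keeps improving slowly. I don't know whether what I'm doing is correct.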

SeongjuLee · 2024-12-27T09:41:42Z

Thank you for your interest in our work!
In the CRL stage, early stopping with a patience of 20 is applied. This means the training is terminated if there is no decrease in validation loss within 20 epochs.
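As an illustration of the stopping rule described above, here is a minimal sketch of patience-based early stopping. It is not the repository's code: the names `EarlyStopping` and `run_training` are hypothetical, and the list of validation losses stands in for a real training loop.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            # New best loss: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run_training(val_losses, max_epoch=500, patience=20):
    """Return how many epochs run before early stopping or max_epoch ends the loop.

    `val_losses` is a hypothetical stand-in for the per-epoch validation loss.
    """
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epoch], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epoch)
```

Under this sketch, a run whose best validation loss occurs at epoch 3 would stop at epoch 23, well before a max-epoch of 500.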

Regarding your second question, I anticipate the following possible situations:

You might have conducted MTCL before finishing the CRL stage. The MTCL stage should only be conducted after the CRL stage is completely finished (i.e., after training all folds).
The training appears to be running on a CPU.
Please share some screenshots; they will be helpful for identifying and resolving the issues.

YYrgb · 2024-12-27T13:13:17Z

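First of all, thank you for your reply. You are right, it is running on a CPU. For CRL, I only ran it until the result of training fold 1 was saved, and then I ran MTCL, also only until the trained network parameters for fold 1 were saved. Possibly because I am using a CPU and I halved batch_size, completing fold 1 has already taken at least 24 hours. The images below show the results of my MTCL run. The stty call in utils cannot be used on Windows, so I replaced that code with the tqdm library to display progress. The first image shows some of the information printed when MTCL had just started. The second image shows validation starting after fold 1 finished training. The third shows the model being saved at the end of validation. After that, training started again. Is this the start of the second epoch?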

Thank you again for replying in the first place. Thank you!
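For readers who hit the same stty problem on Windows, the following is a minimal sketch of a portable way to query the terminal width with the Python standard library. It assumes that the stty call in utils is only used to size a progress bar, which this thread does not confirm; the helper name `terminal_width` is hypothetical.

```python
import shutil


def terminal_width(default=80):
    """Return the terminal width in columns, or `default` if it cannot be determined."""
    columns = shutil.get_terminal_size(fallback=(default, 24)).columns
    # Guard against environments that report a zero-width terminal.
    return columns if columns > 0 else default
```

Unlike `stty size`, `shutil.get_terminal_size` works on both Windows and Unix-like systems and falls back to the given default when no terminal is attached.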

YYrgb · 2024-12-27T13:13:41Z

YYrgb · 2024-12-27T13:13:51Z

SeongjuLee · 2024-12-30T05:35:46Z

The second and third images are broken. Can you re-upload the second and third screenshots?

YYrgb · 2024-12-30T05:58:34Z

Thank you again for your reply. I will re-upload the second and third images.

SeongjuLee · 2024-12-31T07:51:37Z

In our work, we use an iteration-based training loop, which means that validation is conducted every N training iterations (not every epoch). Please refer to our paper and the "val_period" variable in the config.
In your case, "val_period" is set to 500 and the batch size is 16. Therefore, the validation step is processed at the 8000th training iteration (500*16).
Anyway, it seems that there's no issue in training and validation.
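To make the iteration-based schedule concrete, here is a minimal sketch under the assumption that "val_period" counts training iterations (batches). Under that assumption, the product 500*16 = 8000 is the number of training samples processed between two validation runs. The function names are hypothetical, and whether the repository's progress display counts iterations or samples is not confirmed here.

```python
def validation_iterations(total_iterations, val_period):
    """Return the 1-based iteration indices after which validation would run."""
    return [it for it in range(1, total_iterations + 1) if it % val_period == 0]


def samples_between_validations(val_period, batch_size):
    """Return how many training samples are processed between two validation runs."""
    return val_period * batch_size


if __name__ == "__main__":
    # Values taken from the discussion above: val_period=500, batch size 16.
    print(validation_iterations(1200, 500))
    print(samples_between_validations(500, 16))
```

With such a loop, the progress display returning to training after a validation pass does not by itself mean that a new epoch has started.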

YYrgb · 2024-12-31T08:56:54Z

Okay, thank you very much for your reply! I wish you all the best!

SeongjuLee · 2025-01-02T09:21:50Z

Thanks. Happy new year!

SeongjuLee closed this as completed Jan 2, 2025
