You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I use librimix dataset to traing DCCRN by 8gpus
I open early stop in conf
I find the model always stop in very early stage like 10 or 20 epochs
In the log, I find, the val loss is caculated by diffierent gpus and early stop is implemented only by gpu 0, which I think is the reason to very early stop, the log is as follows:
[rank: 5] Metric val_loss improved by 0.433 >= min_delta = 0.0. New best score: -11.178
[rank: 0] Metric val_loss improved by 0.333 >= min_delta = 0.0. New best score: -11.104
[rank: 7] Metric val_loss improved by 0.530 >= min_delta = 0.0. New best score: -10.551
[rank: 4] Metric val_loss improved by 0.408 >= min_delta = 0.0. New best score: -10.931
[rank: 1] Metric val_loss improved by 0.287 >= min_delta = 0.0. New best score: -10.971
[rank: 3] Metric val_loss improved by 0.415 >= min_delta = 0.0. New best score: -11.321
[rank: 2] Metric val_loss improved by 0.418 >= min_delta = 0.0. New best score: -10.858
[rank: 6] Metric val_loss improved by 0.504 >= min_delta = 0.0. New best score: -11.375
Epoch 2, global step 1587: 'val_loss' reached -11.10351 (best -11.10351),
The text was updated successfully, but these errors were encountered:
I use librimix dataset to traing DCCRN by 8gpus
I open early stop in conf
I find the model always stop in very early stage like 10 or 20 epochs
In the log, I find, the val loss is caculated by diffierent gpus and early stop is implemented only by gpu 0, which I think is the reason to very early stop, the log is as follows:
[rank: 5] Metric val_loss improved by 0.433 >= min_delta = 0.0. New best score: -11.178
[rank: 0] Metric val_loss improved by 0.333 >= min_delta = 0.0. New best score: -11.104
[rank: 7] Metric val_loss improved by 0.530 >= min_delta = 0.0. New best score: -10.551
[rank: 4] Metric val_loss improved by 0.408 >= min_delta = 0.0. New best score: -10.931
[rank: 1] Metric val_loss improved by 0.287 >= min_delta = 0.0. New best score: -10.971
[rank: 3] Metric val_loss improved by 0.415 >= min_delta = 0.0. New best score: -11.321
[rank: 2] Metric val_loss improved by 0.418 >= min_delta = 0.0. New best score: -10.858
[rank: 6] Metric val_loss improved by 0.504 >= min_delta = 0.0. New best score: -11.375
Epoch 2, global step 1587: 'val_loss' reached -11.10351 (best -11.10351),
The text was updated successfully, but these errors were encountered: