about training convergence speed #51

Open
Feynman1999 opened this issue Sep 11, 2023 · 0 comments

I used your code to reproduce the results. The validation results and training loss for the first tens of thousands of iterations are shown below. Is this convergence normal? (Trained on DIV2K, validated on Set5.)

validation:
23-09-11 12:14:06.207 - INFO: <epoch: 99, iter:   5,000> psnr: 3.3753e+01.
23-09-11 13:09:13.902 - INFO: <epoch:199, iter:  10,000> psnr: 3.4853e+01.
23-09-11 14:15:48.923 - INFO: <epoch:299, iter:  15,000> psnr: 3.6143e+01.
23-09-11 15:26:26.413 - INFO: <epoch:399, iter:  20,000> psnr: 3.6398e+01.
23-09-11 16:31:23.065 - INFO: <epoch:499, iter:  25,000> psnr: 3.6537e+01.

training loss:
........
........
........
23-09-11 15:47:40.054 - INFO: <epoch:429, iter:  21,500, lr:2.000e-04> l_forw_fit: 9.0363e+00 l_forw_ce: 7.0932e-01 l_back_rec: 5.3173e+02 
23-09-11 15:49:06.322 - INFO: <epoch:431, iter:  21,600, lr:2.000e-04> l_forw_fit: 1.1463e+01 l_forw_ce: 2.1733e+00 l_back_rec: 4.9017e+02 
23-09-11 15:50:32.917 - INFO: <epoch:433, iter:  21,700, lr:2.000e-04> l_forw_fit: 2.0284e+01 l_forw_ce: 9.2368e-01 l_back_rec: 6.0587e+02 
23-09-11 15:51:58.845 - INFO: <epoch:435, iter:  21,800, lr:2.000e-04> l_forw_fit: 1.0459e+01 l_forw_ce: 1.3879e+00 l_back_rec: 4.6550e+02 
23-09-11 15:53:25.101 - INFO: <epoch:437, iter:  21,900, lr:2.000e-04> l_forw_fit: 2.2528e+01 l_forw_ce: 1.8308e+00 l_back_rec: 7.1213e+02 
23-09-11 15:54:51.292 - INFO: <epoch:439, iter:  22,000, lr:2.000e-04> l_forw_fit: 1.9186e+01 l_forw_ce: 1.1946e+00 l_back_rec: 7.1015e+02 
23-09-11 15:56:17.214 - INFO: <epoch:441, iter:  22,100, lr:2.000e-04> l_forw_fit: 1.4027e+01 l_forw_ce: 1.2276e+00 l_back_rec: 5.7817e+02 
23-09-11 15:57:39.386 - INFO: <epoch:443, iter:  22,200, lr:2.000e-04> l_forw_fit: 1.2232e+01 l_forw_ce: 2.5495e+00 l_back_rec: 5.3580e+02 
23-09-11 15:58:53.436 - INFO: <epoch:445, iter:  22,300, lr:2.000e-04> l_forw_fit: 2.6561e+01 l_forw_ce: 5.2295e+00 l_back_rec: 5.9502e+02 
23-09-11 16:00:08.231 - INFO: <epoch:447, iter:  22,400, lr:2.000e-04> l_forw_fit: 1.1626e+01 l_forw_ce: 6.5276e-01 l_back_rec: 5.2617e+02 
23-09-11 16:01:20.911 - INFO: <epoch:449, iter:  22,500, lr:2.000e-04> l_forw_fit: 3.6728e+01 l_forw_ce: 5.2806e+00 l_back_rec: 7.5190e+02 
23-09-11 16:02:35.347 - INFO: <epoch:451, iter:  22,600, lr:2.000e-04> l_forw_fit: 9.0049e+00 l_forw_ce: 2.3042e+00 l_back_rec: 4.5918e+02 
23-09-11 16:03:49.039 - INFO: <epoch:453, iter:  22,700, lr:2.000e-04> l_forw_fit: 7.4976e+00 l_forw_ce: 1.0088e+00 l_back_rec: 4.1533e+02 
23-09-11 16:05:00.367 - INFO: <epoch:455, iter:  22,800, lr:2.000e-04> l_forw_fit: 1.1508e+01 l_forw_ce: 1.4149e+00 l_back_rec: 5.1731e+02 
23-09-11 16:06:09.921 - INFO: <epoch:457, iter:  22,900, lr:2.000e-04> l_forw_fit: 1.1961e+01 l_forw_ce: 4.2065e+00 l_back_rec: 5.1242e+02 
23-09-11 16:07:24.249 - INFO: <epoch:459, iter:  23,000, lr:2.000e-04> l_forw_fit: 1.2395e+01 l_forw_ce: 4.6911e+00 l_back_rec: 5.2116e+02 
23-09-11 16:08:38.850 - INFO: <epoch:461, iter:  23,100, lr:2.000e-04> l_forw_fit: 2.2654e+01 l_forw_ce: 6.0414e+00 l_back_rec: 6.2512e+02 
23-09-11 16:09:54.473 - INFO: <epoch:463, iter:  23,200, lr:2.000e-04> l_forw_fit: 1.6905e+01 l_forw_ce: 2.5700e+00 l_back_rec: 5.8382e+02 
23-09-11 16:11:12.018 - INFO: <epoch:465, iter:  23,300, lr:2.000e-04> l_forw_fit: 9.7896e+00 l_forw_ce: 2.4648e+00 l_back_rec: 4.5673e+02 
23-09-11 16:12:26.675 - INFO: <epoch:467, iter:  23,400, lr:2.000e-04> l_forw_fit: 2.8638e+01 l_forw_ce: 8.3577e-01 l_back_rec: 7.3611e+02 
23-09-11 16:13:43.065 - INFO: <epoch:469, iter:  23,500, lr:2.000e-04> l_forw_fit: 1.9655e+01 l_forw_ce: 5.6923e+00 l_back_rec: 6.4559e+02 
23-09-11 16:14:59.031 - INFO: <epoch:471, iter:  23,600, lr:2.000e-04> l_forw_fit: 1.7584e+01 l_forw_ce: 2.0713e+00 l_back_rec: 5.8603e+02 
23-09-11 16:16:12.780 - INFO: <epoch:473, iter:  23,700, lr:2.000e-04> l_forw_fit: 2.1075e+01 l_forw_ce: 4.7084e+00 l_back_rec: 6.8721e+02 
23-09-11 16:17:26.316 - INFO: <epoch:475, iter:  23,800, lr:2.000e-04> l_forw_fit: 8.9918e+00 l_forw_ce: 1.4385e+00 l_back_rec: 4.2553e+02 
23-09-11 16:18:39.170 - INFO: <epoch:477, iter:  23,900, lr:2.000e-04> l_forw_fit: 1.8673e+01 l_forw_ce: 2.5744e+00 l_back_rec: 6.7018e+02 
23-09-11 16:19:49.879 - INFO: <epoch:479, iter:  24,000, lr:2.000e-04> l_forw_fit: 8.5514e+00 l_forw_ce: 9.6314e-01 l_back_rec: 5.1129e+02 
23-09-11 16:21:02.620 - INFO: <epoch:481, iter:  24,100, lr:2.000e-04> l_forw_fit: 1.8538e+01 l_forw_ce: 6.1894e+00 l_back_rec: 7.0081e+02 
23-09-11 16:22:12.007 - INFO: <epoch:483, iter:  24,200, lr:2.000e-04> l_forw_fit: 1.4954e+01 l_forw_ce: 4.5996e+00 l_back_rec: 6.5795e+02 
23-09-11 16:23:19.003 - INFO: <epoch:485, iter:  24,300, lr:2.000e-04> l_forw_fit: 3.1993e+01 l_forw_ce: 2.5733e+00 l_back_rec: 6.6076e+02 
23-09-11 16:24:29.014 - INFO: <epoch:487, iter:  24,400, lr:2.000e-04> l_forw_fit: 1.6874e+01 l_forw_ce: 1.7691e+00 l_back_rec: 6.2266e+02 
23-09-11 16:25:37.750 - INFO: <epoch:489, iter:  24,500, lr:2.000e-04> l_forw_fit: 2.6306e+01 l_forw_ce: 3.2614e+00 l_back_rec: 8.1468e+02 
23-09-11 16:26:47.252 - INFO: <epoch:491, iter:  24,600, lr:2.000e-04> l_forw_fit: 2.2743e+01 l_forw_ce: 2.8730e+00 l_back_rec: 6.6198e+02 
23-09-11 16:27:55.281 - INFO: <epoch:493, iter:  24,700, lr:2.000e-04> l_forw_fit: 2.2022e+01 l_forw_ce: 5.1522e+03 l_back_rec: 5.8593e+02 
23-09-11 16:29:04.216 - INFO: <epoch:495, iter:  24,800, lr:2.000e-04> l_forw_fit: 2.1125e+01 l_forw_ce: 3.3240e+00 l_back_rec: 6.7275e+02 
23-09-11 16:30:12.514 - INFO: <epoch:497, iter:  24,900, lr:2.000e-04> l_forw_fit: 1.5764e+01 l_forw_ce: 3.6478e+00 l_back_rec: 6.5580e+02 
23-09-11 16:31:22.468 - INFO: <epoch:499, iter:  25,000, lr:2.000e-04> l_forw_fit: 2.1957e+01 l_forw_ce: 1.2598e+00 l_back_rec: 7.8794e+02 
23-09-11 16:31:23.065 - INFO: # Validation # PSNR: 3.6537e+01.
23-09-11 16:31:23.066 - INFO: Saving models and training states.
23-09-11 16:32:37.636 - INFO: <epoch:501, iter:  25,100, lr:2.000e-04> l_forw_fit: 1.5851e+01 l_forw_ce: 1.7870e+01 l_back_rec: 5.9740e+02 
23-09-11 16:33:47.202 - INFO: <epoch:503, iter:  25,200, lr:2.000e-04> l_forw_fit: 1.5613e+01 l_forw_ce: 1.1140e+00 l_back_rec: 5.6695e+02 
23-09-11 16:34:56.355 - INFO: <epoch:505, iter:  25,300, lr:2.000e-04> l_forw_fit: 1.6678e+01 l_forw_ce: 6.1646e+00 l_back_rec: 6.0877e+02 
23-09-11 16:36:08.002 - INFO: <epoch:507, iter:  25,400, lr:2.000e-04> l_forw_fit: 1.0693e+01 l_forw_ce: 5.5600e+00 l_back_rec: 5.3143e+02 
23-09-11 16:37:17.161 - INFO: <epoch:509, iter:  25,500, lr:2.000e-04> l_forw_fit: 1.8000e+01 l_forw_ce: 1.3387e+01 l_back_rec: 7.1700e+02 
23-09-11 16:38:27.538 - INFO: <epoch:511, iter:  25,600, lr:2.000e-04> l_forw_fit: 5.8641e+01 l_forw_ce: 7.0964e+00 l_back_rec: 9.0976e+02 
23-09-11 16:39:35.403 - INFO: <epoch:513, iter:  25,700, lr:2.000e-04> l_forw_fit: 1.5490e+01 l_forw_ce: 9.3917e-01 l_back_rec: 5.7312e+02 
23-09-11 16:40:45.120 - INFO: <epoch:515, iter:  25,800, lr:2.000e-04> l_forw_fit: 8.2399e+00 l_forw_ce: 6.9752e-01 l_back_rec: 4.5810e+02 
23-09-11 16:41:53.473 - INFO: <epoch:517, iter:  25,900, lr:2.000e-04> l_forw_fit: 1.1270e+01 l_forw_ce: 1.1326e+00 l_back_rec: 5.1678e+02 
23-09-11 16:43:02.744 - INFO: <epoch:519, iter:  26,000, lr:2.000e-04> l_forw_fit: 9.7699e+00 l_forw_ce: 7.1436e-01 l_back_rec: 4.6954e+02 
23-09-11 16:44:11.669 - INFO: <epoch:521, iter:  26,100, lr:2.000e-04> l_forw_fit: 2.2217e+01 l_forw_ce: 2.3241e+00 l_back_rec: 6.2240e+02 
23-09-11 16:45:20.023 - INFO: <epoch:523, iter:  26,200, lr:2.000e-04> l_forw_fit: 1.6940e+01 l_forw_ce: 1.0100e+00 l_back_rec: 6.1535e+02 
23-09-11 16:46:27.886 - INFO: <epoch:525, iter:  26,300, lr:2.000e-04> l_forw_fit: 2.4195e+01 l_forw_ce: 2.7161e+00 l_back_rec: 6.4196e+02 

The final PSNR is expected to reach 39.7. Why does convergence seem poor in these early iterations?
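The per-step losses above are noisy (for example, a single `l_forw_ce` value jumps to the order of 1e+03), so the trend is easier to judge after smoothing. Here is a minimal sketch, assuming the training log format shown above. `parse_train_lines` and `moving_median` are hypothetical helper names, not part of the repository.

```python
import re
from statistics import median

# Assumption: training lines follow the format pasted in this issue.
TRAIN_RE = re.compile(
    r"<epoch:\s*(\d+), iter:\s*([\d,]+), lr:([\d.e+-]+)>\s*"
    r"l_forw_fit: ([\d.e+-]+) l_forw_ce: ([\d.e+-]+) l_back_rec: ([\d.e+-]+)"
)


def parse_train_lines(lines):
    """Extract the iteration number and the three loss terms from each training line."""
    rows = []
    for line in lines:
        m = TRAIN_RE.search(line)
        if m is None:
            continue  # skip validation lines, save messages, and anything else
        rows.append({
            "iter": int(m.group(2).replace(",", "")),
            "l_forw_fit": float(m.group(4)),
            "l_forw_ce": float(m.group(5)),
            "l_back_rec": float(m.group(6)),
        })
    return rows


def moving_median(values, window=5):
    """Trailing median over up to `window` values; damps isolated spikes."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(median(values[start:i + 1]))
    return out


if __name__ == "__main__":
    # Hypothetical file name; point this at your own training log.
    with open("train.log") as f:
        rows = parse_train_lines(f)
    smoothed = moving_median([r["l_back_rec"] for r in rows], window=20)
    for r, s in zip(rows, smoothed):
        print(r["iter"], round(s, 2))
```

A median is used here instead of a mean so that one outlier step does not dominate the smoothed curve; a longer window gives a steadier trend at the cost of lag.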
