I'm running into what looks like gradient explosion during training; the log is below. I checked the input data and found no Inf or NaN values. Please help!
optimizer settings: {'lr': 5.46875e-05, 'weight_decay': 0.0, 'eps': 1e-08, 'betas': [0.9, 0.999]}
Use step level LR scheduler!
Set warmup steps = 86135
Set warmup steps = 0
Max WD = 0.0500000, Min WD = 0.0500000
criterion = SoftTargetCrossEntropy()
Auto resume checkpoint:
Start training for 100 epochs
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
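The `NaN or Inf found in input tensor.` messages above come from a finiteness check on values passed to the logger. In PyTorch this kind of check is usually done with `torch.isfinite` or `math.isfinite`; as a minimal, dependency-free sketch of the same idea (the helper name `has_nonfinite` is hypothetical, not from the code behind this log):

```python
import math

def has_nonfinite(values):
    """Return True if any value in a (possibly nested) list is NaN or Inf."""
    for v in values:
        if isinstance(v, (list, tuple)):
            if has_nonfinite(v):
                return True
        elif not math.isfinite(v):
            return True
    return False

print(has_nonfinite([[1.0, 2.0], [3.0, float("inf")]]))  # True
print(has_nonfinite([[1.0, 2.0], [3.0, 4.0]]))           # False
```

Note that this check fires on the *logged scalar* being non-finite (here, the `grad_norm`), which does not by itself prove the input batch contained NaN/Inf.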
Epoch: [0] [ 0/17227] eta: 3 days, 8:24:48 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9916) loss_scale: 32768.0000 (32768.0000) weight_decay: 0.0500 (0.0500) grad_norm: inf (inf) time: 16.8043 data: 14.7567 max mem: 38837
Epoch: [0] [ 10/17227] eta: 14:30:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9915) loss_scale: 16384.0000 (22341.8182) weight_decay: 0.0500 (0.0500) grad_norm: 6.5337 (inf) time: 3.0352 data: 2.0506 max mem: 39504
Epoch: [0] [ 20/17227] eta: 11:46:58 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 16384.0000 (19504.7619) weight_decay: 0.0500 (0.0500) grad_norm: 6.5185 (inf) time: 1.7482 data: 0.9025 max mem: 39504
Epoch: [0] [ 30/17227] eta: 10:44:42 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 16384.0000 (18233.8065) weight_decay: 0.0500 (0.0500) grad_norm: 6.5185 (inf) time: 1.8172 data: 1.0061 max mem: 39504
Epoch: [0] [ 40/17227] eta: 10:42:47 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (15784.5854) weight_decay: 0.0500 (0.0500) grad_norm: 6.5201 (inf) time: 2.0117 data: 1.2029 max mem: 39504
Epoch: [0] [ 50/17227] eta: 9:55:55 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (14295.8431) weight_decay: 0.0500 (0.0500) grad_norm: 6.5203 (inf) time: 1.8215 data: 1.0110 max mem: 39504
Epoch: [0] [ 60/17227] eta: 9:22:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (13295.2131) weight_decay: 0.0500 (0.0500) grad_norm: 6.5211 (inf) time: 1.4009 data: 0.5881 max mem: 39504
Epoch: [0] [ 70/17227] eta: 9:24:00 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (12576.4507) weight_decay: 0.0500 (0.0500) grad_norm: 6.5454 (inf) time: 1.6939 data: 0.8822 max mem: 39504
Epoch: [0] [ 80/17227] eta: 9:10:05 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (12035.1605) weight_decay: 0.0500 (0.0500) grad_norm: 6.5321 (inf) time: 1.7946 data: 0.9825 max mem: 39504
Epoch: [0] [ 90/17227] eta: 8:57:07 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (11612.8352) weight_decay: 0.0500 (0.0500) grad_norm: 6.5304 (inf) time: 1.5547 data: 0.7444 max mem: 39504
Epoch: [0] [ 100/17227] eta: 8:41:19 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (11274.1386) weight_decay: 0.0500 (0.0500) grad_norm: 6.5197 (inf) time: 1.4273 data: 0.6158 max mem: 39504
Epoch: [0] [ 110/17227] eta: 8:49:36 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10996.4685) weight_decay: 0.0500 (0.0500) grad_norm: 6.5197 (inf) time: 1.7466 data: 0.9347 max mem: 39504
Epoch: [0] [ 120/17227] eta: 8:39:56 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10764.6942) weight_decay: 0.0500 (0.0500) grad_norm: 6.5323 (inf) time: 1.8099 data: 1.0032 max mem: 39504
Epoch: [0] [ 130/17227] eta: 8:39:45 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10568.3053) weight_decay: 0.0500 (0.0500) grad_norm: 6.5418 (inf) time: 1.6443 data: 0.8360 max mem: 39504
Epoch: [0] [ 140/17227] eta: 8:38:59 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10399.7730) weight_decay: 0.0500 (0.0500) grad_norm: 6.5378 (inf) time: 1.8150 data: 1.0048 max mem: 39504
Epoch: [0] [ 150/17227] eta: 8:31:00 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10253.5629) weight_decay: 0.0500 (0.0500) grad_norm: 6.5324 (inf) time: 1.6078 data: 0.7990 max mem: 39504
Epoch: [0] [ 160/17227] eta: 8:27:28 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10125.5155) weight_decay: 0.0500 (0.0500) grad_norm: 6.5337 (inf) time: 1.5137 data: 0.7066 max mem: 39504
Epoch: [0] [ 170/17227] eta: 8:23:23 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10012.4444) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.5844 data: 0.7751 max mem: 39504
Epoch: [0] [ 180/17227] eta: 8:13:24 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9911.8674) weight_decay: 0.0500 (0.0500) grad_norm: 6.5229 (inf) time: 1.3548 data: 0.5438 max mem: 39504
Epoch: [0] [ 190/17227] eta: 8:13:37 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9821.8220) weight_decay: 0.0500 (0.0500) grad_norm: 6.5277 (inf) time: 1.4622 data: 0.6526 max mem: 39504
Epoch: [0] [ 200/17227] eta: 8:13:39 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9740.7363) weight_decay: 0.0500 (0.0500) grad_norm: 6.5277 (inf) time: 1.7664 data: 0.9588 max mem: 39504
Epoch: [0] [ 210/17227] eta: 8:11:11 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9667.3365) weight_decay: 0.0500 (0.0500) grad_norm: 6.5234 (inf) time: 1.6697 data: 0.8599 max mem: 39504
Epoch: [0] [ 220/17227] eta: 8:06:56 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9600.5792) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.4999 data: 0.6853 max mem: 39504
Epoch: [0] [ 230/17227] eta: 8:12:38 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9539.6017) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.8146 data: 0.9983 max mem: 39504
Epoch: [0] [ 240/17227] eta: 8:10:12 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9483.6846) weight_decay: 0.0500 (0.0500) grad_norm: 6.5209 (inf) time: 1.8819 data: 1.0682 max mem: 39504
Epoch: [0] [ 250/17227] eta: 8:12:05 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9432.2231) weight_decay: 0.0500 (0.0500) grad_norm: 6.5226 (inf) time: 1.7399 data: 0.9249 max mem: 39504
Epoch: [0] [ 260/17227] eta: 8:09:03 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9384.7050) weight_decay: 0.0500 (0.0500) grad_norm: 6.5342 (inf) time: 1.7050 data: 0.8913 max mem: 39504
Epoch: [0] [ 270/17227] eta: 8:05:59 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9340.6937) weight_decay: 0.0500 (0.0500) grad_norm: 6.5350 (inf) time: 1.4748 data: 0.6655 max mem: 39504
Epoch: [0] [ 280/17227] eta: 8:06:40 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9299.8149) weight_decay: 0.0500 (0.0500) grad_norm: 6.5294 (inf) time: 1.6392 data: 0.8288 max mem: 39504
Epoch: [0] [ 290/17227] eta: 8:01:43 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9261.7457) weight_decay: 0.0500 (0.0500) grad_norm: 6.5184 (inf) time: 1.5291 data: 0.7187 max mem: 39504
Epoch: [0] [ 300/17227] eta: 8:03:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9226.2060) weight_decay: 0.0500 (0.0500) grad_norm: 6.5322 (inf) time: 1.6084 data: 0.7993 max mem: 39504
Epoch: [0] [ 310/17227] eta: 8:00:20 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9192.9518) weight_decay: 0.0500 (0.0500) grad_norm: 6.5246 (inf) time: 1.6614 data: 0.8532 max mem: 39504
Epoch: [0] [ 320/17227] eta: 8:01:15 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9161.7695) weight_decay: 0.0500 (0.0500) grad_norm: 6.5191 (inf) time: 1.5946 data: 0.7818 max mem: 39504
Epoch: [0] [ 330/17227] eta: 8:01:08 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9132.4713) weight_decay: 0.0500 (0.0500) grad_norm: 6.5284 (inf) time: 1.7839 data: 0.9708 max mem: 39504
Epoch: [0] [ 340/17227] eta: 7:59:40 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (9104.8915) weight_decay: 0.0500 (0.0500) grad_norm: 6.5283 (inf) time: 1.6464 data: 0.8362 max mem: 39504
Epoch: [0] [ 350/17227] eta: 7:59:37 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9078.8832) weight_decay: 0.0500 (0.0500) grad_norm: 6.5145 (inf) time: 1.6499 data: 0.8373 max mem: 39504
Epoch: [0] [ 360/17227] eta: 7:55:51 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9054.3158) weight_decay: 0.0500 (0.0500) grad_norm: 6.5145 (inf) time: 1.4962 data: 0.6837 max mem: 39504
Epoch: [0] [ 370/17227] eta: 7:54:49 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9031.0728) weight_decay: 0.0500 (0.0500) grad_norm: 6.5290 (inf) time: 1.4254 data: 0.6160 max mem: 39504
Epoch: [0] [ 380/17227] eta: 7:53:45 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9009.0499) weight_decay: 0.0500 (0.0500) grad_norm: 6.5321 (inf) time: 1.5879 data: 0.7780 max mem: 39504
Epoch: [0] [ 390/17227] eta: 7:55:09 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (8988.1535) weight_decay: 0.0500 (0.0500) grad_norm: 6.5472 (inf) time: 1.7530 data: 0.9418 max mem: 39504
Epoch: [0] [ 400/17227] eta: 7:54:21 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (8968.2993) weight_decay: 0.0500 (0.0500) grad_norm: 6.5198 (inf) time: 1.7701 data: 0.9575 max mem: 39504
Epoch: [0] [ 410/17227] eta: 7:51:48 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8949.4112) weight_decay: 0.0500 (0.0500) grad_norm: 6.5082 (inf) time: 1.4883 data: 0.6721 max mem: 39504
Epoch: [0] [ 420/17227] eta: 7:52:38 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8931.4204) weight_decay: 0.0500 (0.0500) grad_norm: 6.5235 (inf) time: 1.6048 data: 0.7914 max mem: 39504
Epoch: [0] [ 430/17227] eta: 7:52:53 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8914.2645) weight_decay: 0.0500 (0.0500) grad_norm: 6.5302 (inf) time: 1.8107 data: 0.9991 max mem: 39504
Epoch: [0] [ 440/17227] eta: 7:50:55 lr: 0.000000 min_lr: 0.000000 loss: 5.9911 (5.9914) loss_scale: 8192.0000 (8897.8866) weight_decay: 0.0500 (0.0500) grad_norm: 6.5264 (inf) time: 1.5970 data: 0.7861 max mem: 39504
Epoch: [0] [ 450/17227] eta: 7:49:04 lr: 0.000000 min_lr: 0.000000 loss: 5.9911 (5.9914) loss_scale: 8192.0000 (8882.2350) weight_decay: 0.0500 (0.0500) grad_norm: 6.5222 (inf) time: 1.4259 data: 0.6114 max mem: 39504
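Two patterns in this log are worth reading carefully. First, `loss_scale` drops 32768 → 16384 → 8192 over the first steps: this is the standard dynamic loss-scaling behavior in mixed-precision training, where the scale is halved (and the step skipped) each time an overflowing gradient is detected. Second, in `grad_norm: 6.5337 (inf)` the current value is finite and small; only the parenthesized running statistic is `inf`, because the overflow at step 0 poisons the average permanently. A minimal sketch of the halving policy, assuming the usual backoff/growth scheme (the class name `DynamicLossScaler` and the growth interval are illustrative, not the exact implementation behind this log):

```python
class DynamicLossScaler:
    """Sketch of dynamic loss scaling: halve the scale when an overflow
    (inf/nan gradient) is detected, grow it back after a stretch of
    stable steps."""
    def __init__(self, init_scale=32768.0, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._stable_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            self.scale /= 2.0        # skip this step and back off
            self._stable_steps = 0
        else:
            self._stable_steps += 1
            if self._stable_steps >= self.growth_interval:
                self.scale *= 2.0    # try a larger scale again
                self._stable_steps = 0

scaler = DynamicLossScaler()
for overflow in (True, True):        # two overflows, as in the log's first steps
    scaler.update(overflow)
print(scaler.scale)  # 8192.0
```

Since the scale stabilizes at 8192 and subsequent `grad_norm` values stay around 6.5, the log is consistent with a few recoverable fp16 overflows at startup rather than an ongoing gradient explosion.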