
General Training Questions (train / val curve, batch size, iter size) #8

Closed
ShengyuH opened this issue Nov 4, 2019 · 7 comments

ShengyuH commented Nov 4, 2019

Hi Chris,

With the same optimizer and learning rate schedule, I got the training curves below. Compared to your implementation, I simply removed the data augmentation part; is the chromatic data augmentation really that important in this case? By the way, I only have one 1080Ti, so I cap the maximum number of points at 0.6M and the batch size then varies from iteration to iteration. Have you compared with SparseConv in terms of training time on this semantic segmentation task? I played with it once and remember it being much faster.

[image: train/val curves]
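For context, this is roughly how I cap the batch by point count (a rough sketch, not the repo's code; `budget_batches` and `num_points_fn` are names I made up for illustration):

```python
# Rough sketch of point-budget batching (illustrative only, not the repo's code).
# Scenes are packed into one batch until the total point count would exceed
# `max_points` (~0.6M on a single 1080Ti), so the batch size varies per iteration.
def budget_batches(scene_indices, num_points_fn, max_points=600_000):
    """Yield lists of scene indices whose total point count stays under the budget."""
    batch, total = [], 0
    for idx in scene_indices:
        n = num_points_fn(idx)  # point count of this scene
        if batch and total + n > max_points:
            yield batch
            batch, total = [], 0
        batch.append(idx)
        total += n
    if batch:
        yield batch
```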

chrischoy commented Nov 4, 2019

I don't understand the question exactly, but

Data augmentation is pretty significant.

I have not compared with SparseConv. However, in some earlier versions I had removed the optimized CUDA kernels, and they were reintroduced in a recent version (see the Change Log).

It should be pretty fast now.
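For reference, chromatic augmentation on point-cloud colors usually boils down to something like the sketch below (a random global color shift plus per-point jitter); the function name and parameter values here are illustrative, not the exact ones used in this repo.

```python
import numpy as np

def chromatic_augment(colors, translation_ratio=0.1, jitter_std=0.01):
    """Randomly shift and jitter per-point RGB colors (illustrative defaults).

    colors: (N, 3) float array with RGB values in [0, 1].
    """
    # Global chromatic translation: one random offset applied to every point.
    offset = (np.random.rand(1, 3) - 0.5) * 2.0 * translation_ratio
    colors = colors + offset
    # Per-point chromatic jitter: independent Gaussian noise on each point.
    colors = colors + np.random.randn(*colors.shape) * jitter_std
    return np.clip(colors, 0.0, 1.0)
```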

ShengyuH commented Nov 4, 2019

After 50k iterations, the validation average IoU is still around 10% short of your pre-trained model. So my question is: how much does data augmentation help on this task? Do you have any numbers quantifying the contribution of the data augmentation?

Thanks for your quick reply.

chrischoy commented Nov 4, 2019

Are you saying that at 50k iterations the mIoU is 10% short of 72%, which is the result of training for 120k?

The batch size is very important for keeping the training stable. Note that the learning rate is really high, so if you can't train on a GPU with more VRAM:

1. Use the --iter_size option to average the gradients over multiple iterations (see the sketch at the end of this comment). The name comes from Caffe, where gradients are accumulated iter_size times before an update.

2. Reduce the learning rate accordingly to control the training speed. However, the stability of the gradient (controlled by batch_size) is more important than a smaller learning rate.

The chromatic augmentation is pretty important, but I don't have a numerical comparison.
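To make the --iter_size behavior concrete, here is a minimal sketch of gradient accumulation in PyTorch (illustrative, not the repo's trainer; `train_one_epoch` is a made-up name). The loss is divided by iter_size so that the accumulated gradients form an average over iter_size mini-batches before each optimizer step.

```python
import torch

def train_one_epoch(model, criterion, optimizer, data_loader, iter_size=4):
    """Average gradients over `iter_size` mini-batches before each update,
    mimicking a batch that is `iter_size` times larger on limited VRAM.
    (Illustrative sketch, not the repo's training loop.)"""
    model.train()
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(data_loader):
        loss = criterion(model(inputs), labels) / iter_size  # scale for averaging
        loss.backward()               # gradients accumulate in parameter .grad buffers
        if (i + 1) % iter_size == 0:
            optimizer.step()          # one update with the averaged gradient
            optimizer.zero_grad()
```

The effective batch size is then the per-iteration batch size times iter_size, which is what keeps the high learning rate stable.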

ShengyuH commented Nov 4, 2019

I revisited SparseConv; they don't have chromatic augmentation. I will add it to my implementation and check the results, and I will also try to stabilize the gradient. Could you share your train/val loss curves if you still have them?

chrischoy commented Nov 4, 2019

This was a shorter 60k-iteration run with conv1_kernel_size=3, but the trend is pretty much the same.
The score is the overall accuracy.

0-10k: https://pastebin.com/eej7Vg7D
10k-20k: https://pastebin.com/X04fhLYW
20k-30k: https://pastebin.com/qPJEpKU0
30k-40k: https://pastebin.com/H20w7W5K
40k-50k: https://pastebin.com/5fbCnQxp
50k-60k: https://pastebin.com/kHeYE1d7

ShengyuH commented Nov 4, 2019

Thank you so much!

The loss and overall accuracy plotted from the provided logs:

[image: loss curves]

[image: overall accuracy curves]

@ShengyuH ShengyuH closed this as completed Nov 4, 2019
@chrischoy chrischoy changed the title Training Issue General Training Questions (train / val curve, batch size, iter size) Nov 4, 2019
@chrischoy
Oh wow thanks for the graphs!
