
General Training Questions (train / val curve, batch size, iter size) #8

Closed
ShengyuH opened this issue Nov 4, 2019 · 7 comments

ShengyuH commented Nov 4, 2019

Hi Chris,

With the same optimizer and learning rate schedule, I got the training curves below. Compared to your implementation, I simply removed the data augmentation part; is the chromatic data augmentation really that important in this case? By the way, I only have one 1080Ti, so I cap the maximum number of points at 0.6M and the batch size then varies from iteration to iteration. Have you compared with SparseConv in terms of training time on this semantic segmentation task? I played with it once and remember it being much faster.

[image: train/val curves]
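For context, this is roughly how I cap the batch by point count (a rough sketch, not the repo's code; `budget_batches` and `num_points_fn` are names I made up for illustration):

```python
# Rough sketch of point-budget batching (illustrative only, not the repo's code).
# Scenes are packed into one batch until the total point count would exceed
# `max_points` (~0.6M on a single 1080Ti), so the batch size varies per iteration.
def budget_batches(scene_indices, num_points_fn, max_points=600_000):
    """Yield lists of scene indices whose total point count stays under the budget."""
    batch, total = [], 0
    for idx in scene_indices:
        n = num_points_fn(idx)  # point count of this scene
        if batch and total + n > max_points:
            yield batch
            batch, total = [], 0
        batch.append(idx)
        total += n
    if batch:
        yield batch
```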

chrischoy commented Nov 4, 2019

I don't understand the question exactly, but

Data augmentation is pretty significant.

I have not compared with SparseConv. However, in some earlier versions I had removed the optimized CUDA kernels, and they were reintroduced in a recent version (see the Change Log).

It should be pretty fast now.
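For reference, chromatic augmentation on point-cloud colors usually boils down to something like the sketch below (a random global color shift plus per-point jitter); the function name and parameter values here are illustrative, not the exact ones used in this repo.

```python
import numpy as np

def chromatic_augment(colors, translation_ratio=0.1, jitter_std=0.01):
    """Randomly shift and jitter per-point RGB colors (illustrative defaults).

    colors: (N, 3) float array with RGB values in [0, 1].
    """
    # Global chromatic translation: one random offset applied to every point.
    offset = (np.random.rand(1, 3) - 0.5) * 2.0 * translation_ratio
    colors = colors + offset
    # Per-point chromatic jitter: independent Gaussian noise on each point.
    colors = colors + np.random.randn(*colors.shape) * jitter_std
    return np.clip(colors, 0.0, 1.0)
```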

ShengyuH commented Nov 4, 2019

After 50k iterations, the validation average IoU is still around 10% short of your pre-trained model. So my question is: how much does data augmentation help on this task? Do you have any numbers quantifying the contribution of the data augmentation?

Thanks for your quick reply.

chrischoy commented Nov 4, 2019

Are you saying that at 50k iterations the mIoU is 10% short of 72%, which is the result of training for 120k?

The batch size is very important for keeping the training stable. Note that the learning rate is really high, so if you can't train on a GPU with more VRAM:

1. Use the --iter_size option to average the gradients over multiple iterations (see the sketch at the end of this comment). The name comes from Caffe, where gradients are accumulated iter_size times before an update.

2. Reduce the learning rate accordingly to control the training speed. However, the stability of the gradient (controlled by batch_size) is more important than a smaller learning rate.

The chromatic augmentation is pretty important, but I don't have a numerical comparison.
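To make the --iter_size behavior concrete, here is a minimal sketch of gradient accumulation in PyTorch (illustrative, not the repo's trainer; `train_one_epoch` is a made-up name). The loss is divided by iter_size so that the accumulated gradients form an average over iter_size mini-batches before each optimizer step.

```python
import torch

def train_one_epoch(model, criterion, optimizer, data_loader, iter_size=4):
    """Average gradients over `iter_size` mini-batches before each update,
    mimicking a batch that is `iter_size` times larger on limited VRAM.
    (Illustrative sketch, not the repo's training loop.)"""
    model.train()
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(data_loader):
        loss = criterion(model(inputs), labels) / iter_size  # scale for averaging
        loss.backward()               # gradients accumulate in parameter .grad buffers
        if (i + 1) % iter_size == 0:
            optimizer.step()          # one update with the averaged gradient
            optimizer.zero_grad()
```

The effective batch size is then the per-iteration batch size times iter_size, which is what keeps the high learning rate stable.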

ShengyuH commented Nov 4, 2019

I revisited SparseConv; they don't have chromatic augmentation. I will add it to my implementation and check the results, and I will also try to stabilize the gradient. Could you share your train/val loss curves if you still have them?

chrischoy commented Nov 4, 2019

This was a shorter 60k-iteration run with conv1_kernel_size=3, but the trend is pretty much the same.
The score is the overall accuracy.

0-10k: https://pastebin.com/eej7Vg7D
10k-20k: https://pastebin.com/X04fhLYW
20k-30k: https://pastebin.com/qPJEpKU0
30k-40k: https://pastebin.com/H20w7W5K
40k-50k: https://pastebin.com/5fbCnQxp
50k-60k: https://pastebin.com/kHeYE1d7

ShengyuH commented Nov 4, 2019

Thank you so much!

The loss and overall accuracy plotted from the provided logs:

[image: loss curves]

[image: overall accuracy curves]

@ShengyuH ShengyuH closed this as completed Nov 4, 2019
@chrischoy chrischoy changed the title Training Issue General Training Questions (train / val curve, batch size, iter size) Nov 4, 2019
@chrischoy
Oh wow thanks for the graphs!
