General Training Questions (train / val curve, batch size, iter size) #8
I don't understand the question exactly, but data augmentation is pretty significant. I have not compared with SparseConv. However, in some earlier versions I removed the optimized CUDA kernels, and I reintroduced them in a recent version (see the Change Log). It should be pretty fast now.
After 50k iterations, the validation average IoU still has around 10% to go compared to your pre-trained model. So my question is: how much does data augmentation help in this task? Do you have more numbers on the contribution of the data augmentation part? Thanks for your quick reply.
Are you saying that at 50k the mIoU is 10% short of 72%, which is trained for 120k? The batch size is very important for making the training stable. Note that the learning rate is really high, so if you can't fit a large batch on your GPU's VRAM, increase the iter size to accumulate gradients over multiple iterations and emulate a larger effective batch.
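As a rough sketch, gradient accumulation with an iter size looks like the following; the function and argument names here are illustrative, not the repo's actual training loop:

```python
def train_with_iter_size(model, criterion, optimizer, loader, iter_size=8):
    """Accumulate gradients over `iter_size` small batches before each
    optimizer step, emulating one large batch on limited VRAM.
    (Illustrative sketch; the actual training code may differ.)"""
    model.train()
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(loader):
        loss = criterion(model(inputs), labels)
        # Scale the loss so the accumulated gradient matches the mean
        # over the effective (large) batch rather than the sum.
        (loss / iter_size).backward()
        if (i + 1) % iter_size == 0:
            optimizer.step()
            optimizer.zero_grad()
```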
The chromatic augmentation is pretty important, but I don't have a numerical comparison.
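For concreteness, a minimal sketch of chromatic augmentation of this kind (a random per-cloud color translation plus per-point jitter); the function names and parameter values are illustrative, not the repo's exact transforms:

```python
import numpy as np

def chromatic_translation(colors, ratio=0.1):
    """Shift all RGB values of one point cloud by a single random offset.
    `colors` is an (N, 3) float array in [0, 255]; `ratio` is illustrative."""
    offset = (np.random.rand(1, 3) - 0.5) * 255 * 2 * ratio
    return np.clip(colors + offset, 0, 255)

def chromatic_jitter(colors, std=0.05):
    """Add independent Gaussian noise to every point's color."""
    noise = np.random.randn(*colors.shape) * 255 * std
    return np.clip(colors + noise, 0, 255)
```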
I revisited SparseConv; they don't have chromatic augmentation. I will add this part to my implementation to check the results, and I will try to stabilize the gradient. Can you share your train/val loss curve if you still have it?
This is from a shorter 60k-iteration run with conv1_kernel_size set to 3, but the trend is pretty much the same. Iterations 0-10k: https://pastebin.com/eej7Vg7D
Oh wow thanks for the graphs! |
Hi Chris,
With the same optimizer and learning rate schedule, I got training curves like this. Compared to your implementation, I simply removed the data augmentation part; is the chromatic data augmentation really that important in this case? By the way, I only have one 1080Ti, so I cap the maximum number of points at 0.6M, which means the batch size varies from iteration to iteration. Have you compared with SparseConv in terms of training time on this semantic segmentation task? I played with it once and I remember it was much faster.
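In case it matters, my batching is roughly the following sketch; the tuple layout and the 0.6M budget are specific to my single-GPU setup:

```python
def batch_by_point_budget(clouds, max_points=600_000):
    """Yield variable-size batches whose total point count stays under
    `max_points`. `clouds` is an iterable of (coords, feats, labels)
    tuples; the names and the 0.6M budget are illustrative."""
    batch, total = [], 0
    for cloud in clouds:
        n = len(cloud[0])  # number of points in this cloud
        if batch and total + n > max_points:
            yield batch
            batch, total = [], 0
        batch.append(cloud)
        total += n
    if batch:
        yield batch
```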