Hi,

Thanks for the excellent example of using DistributedDataParallel in PyTorch; it is very easy to understand and much better than the PyTorch docs.
One important bit that is missing is making the gradient descent truly stochastic in the distributed case. According to the PyTorch docs, `set_epoch` must be called on the sampler at the start of each epoch to achieve this. Otherwise, the data points are sampled in the same order in every epoch, with no shuffling (remember, the `DataLoader` is constructed with `shuffle=False`). I have also discovered that it is very important to set the epoch to the same value in every worker; otherwise some data points may be visited multiple times and others not at all.
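For concreteness, here is a minimal sketch of the fix. It assumes `torch.distributed.init_process_group` has already been called as in your example; the toy dataset, batch size, and loop body are placeholders of my own, not taken from the gist:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset; in practice this is the real training set.
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))

# DistributedSampler shuffles by default, seeded by its epoch counter.
# The DataLoader must be built with shuffle=False because the sampler
# now owns the sample order.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=32, sampler=sampler, shuffle=False)

for epoch in range(10):
    # Without this call, every epoch reuses the same permutation.
    # Every rank must pass the same epoch value, or the per-rank shards
    # overlap: some samples get visited twice, others never.
    sampler.set_epoch(epoch)
    for inputs, targets in loader:
        pass  # forward/backward/optimizer step as in your example
```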
I hope all this makes sense. I think that future readers will benefit from the addition I am proposing. Once again, thanks for the excellent doc.