
Call set_epoch on DistributedSampler #5

Open
tanhevg opened this issue Jul 17, 2020 · 0 comments

tanhevg commented Jul 17, 2020

Hi,

thanks for the excellent example of using DistributedDataParallel in PyTorch; it is very easy to understand and is much better than the PyTorch docs.

One important bit that is missing is making the gradient descent truly stochastic in the distributed case. According to the PyTorch docs, set_epoch must be called on the sampler at the start of each epoch to achieve this. Otherwise, the data points are sampled in the same order every epoch, with no reshuffling (remember, the DataLoader is constructed with shuffle=False). I have also discovered that it is very important to set the epoch to the same value in every worker; otherwise some data points may be visited multiple times and others not at all. See the sketch below.
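A minimal sketch of what I mean (names like `dataset`, `num_epochs` and the batch size are just placeholders for whatever the tutorial already sets up on each worker):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Placeholders: `dataset` and `num_epochs` come from the existing setup.
sampler = DistributedSampler(dataset)  # shuffle=True is the default
loader = DataLoader(dataset, batch_size=32, sampler=sampler, shuffle=False)

for epoch in range(num_epochs):
    # Must be the same epoch value on every rank, so all workers draw from
    # the same permutation and each sample is visited exactly once per epoch.
    sampler.set_epoch(epoch)
    for batch in loader:
        ...  # forward / backward / optimizer.step() as usual
```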

I hope all this makes sense. I think that future readers will benefit from the addition I am proposing. Once again, thanks for the excellent doc.
