
Shuffling wav files in dataloader does not ensure that all the training files are checked in each epoch #73

Open
dkatsiros opened this issue Nov 16, 2020 · 2 comments


@dkatsiros

As a result, the model is trained on only N*M utterances per epoch rather than the whole training set. This affects convergence, as well as possible extensions of the code (e.g. early stopping).

where:
N = number of speakers per batch,
M = number of utterances per speaker per batch, following the referenced paper.

shuffle(wav_files)
wav_files = wav_files[0:self.utterance_number]
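One possible fix is to keep a shuffled queue of each speaker's files and draw M utterances from it per batch, reshuffling only once the queue is exhausted, so every file is visited before any is repeated. The sketch below is illustrative only; the class and parameter names (`SpeakerSampler`, `utterance_number`) are assumptions, not the repository's actual API:

```python
import random

class SpeakerSampler:
    """Hypothetical sketch: draw utterance_number files per batch while
    cycling through all of a speaker's files before reshuffling."""

    def __init__(self, wav_files, utterance_number, seed=0):
        self.wav_files = list(wav_files)
        self.utterance_number = utterance_number
        self.rng = random.Random(seed)
        self.queue = []  # remaining files in the current shuffled pass

    def next_batch(self):
        batch = []
        while len(batch) < self.utterance_number:
            if not self.queue:
                # Start a new pass over all files in a fresh random order.
                self.queue = self.wav_files[:]
                self.rng.shuffle(self.queue)
            batch.append(self.queue.pop())
        return batch

# Usage: with 10 files and M = 4, three draws (12 pops >= 10) are
# guaranteed to visit every file at least once.
sampler = SpeakerSampler([f"utt{i}.wav" for i in range(10)], utterance_number=4)
seen = set()
for _ in range(3):
    seen.update(sampler.next_batch())
assert seen == {f"utt{i}.wav" for i in range(10)}
```

By contrast, the original `shuffle` + slice re-samples independently each epoch, so a given file can go unseen for many epochs with nonzero probability.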

@dkatsiros dkatsiros changed the title Shuffling wav files in dataloader does not ensure that all the training files are checked at each epoch Shuffling wav files in dataloader does not ensure that all the training files are checked in each epoch Nov 16, 2020
@dkatsiros
Author

For the TIMIT dataset, where M = 9 (I think), the dataloader may be OK. The issue appears in large datasets such as VoxCeleb1 or VoxCeleb2, where M > 50.

@dkatsiros
Author

@HarryVolek Can you check this, please? If that is the case, I will open a PR.
