
Shuffling wav files in dataloader does not ensure that all the training files are checked in each epoch #73

Open
dkatsiros opened this issue Nov 16, 2020 · 2 comments


@dkatsiros

As a result, the model is trained on only N*M utterances per epoch rather than the whole training set. This affects convergence, as well as possible extensions of the code (e.g. early stopping).

where:
N = number of speakers per batch,
M = number of utterances per speaker per batch, following the referenced paper.

shuffle(wav_files)
wav_files = wav_files[0:self.utterance_number]
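One possible fix is to keep a shuffled queue of each speaker's files and draw M utterances from it per batch, reshuffling only once the queue is exhausted, so every file is visited before any is repeated. The sketch below is illustrative only; the class and parameter names (`SpeakerSampler`, `utterance_number`) are assumptions, not the repository's actual API:

```python
import random

class SpeakerSampler:
    """Hypothetical sketch: draw utterance_number files per batch while
    cycling through all of a speaker's files before reshuffling."""

    def __init__(self, wav_files, utterance_number, seed=0):
        self.wav_files = list(wav_files)
        self.utterance_number = utterance_number
        self.rng = random.Random(seed)
        self.queue = []  # remaining files in the current shuffled pass

    def next_batch(self):
        batch = []
        while len(batch) < self.utterance_number:
            if not self.queue:
                # Start a new pass over all files in a fresh random order.
                self.queue = self.wav_files[:]
                self.rng.shuffle(self.queue)
            batch.append(self.queue.pop())
        return batch

# Usage: with 10 files and M = 4, three draws (12 pops >= 10) are
# guaranteed to visit every file at least once.
sampler = SpeakerSampler([f"utt{i}.wav" for i in range(10)], utterance_number=4)
seen = set()
for _ in range(3):
    seen.update(sampler.next_batch())
assert seen == {f"utt{i}.wav" for i in range(10)}
```

By contrast, the original `shuffle` + slice re-samples independently each epoch, so a given file can go unseen for many epochs with nonzero probability.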

@dkatsiros dkatsiros changed the title Shuffling wav files in dataloader does not ensure that all the training files are checked at each epoch Shuffling wav files in dataloader does not ensure that all the training files are checked in each epoch Nov 16, 2020
@dkatsiros
Author

For the TIMIT dataset, where M = 9 (I think), the dataloader may be OK. The issue appears in large datasets such as VoxCeleb1 or VoxCeleb2, where M > 50.

@dkatsiros
Author

@HarryVolek Can you check this, please? If that is the case, I will open a PR.
