noisyspeech_synthesizer.py always slices from the start of the noise array #25

rg1990 · 2023-06-26T16:44:02Z

In noisyspeech_synthesizer.py, an array of audio samples is read from a noise file (line 78). On line 81, a slice of the noise array is taken from index 0 to len(clean):

noise = noise[0:len(clean)]

Because the slice always starts at index 0, when the clean speech arrays are all roughly the same length (~16000 samples, as in the speech commands case), the number of unique noise arrays we ever see is equal to the number of noise files.

Even if we have one noise file with 10 hours of audio, we may only ever make use of the first 1 second of this data.

It would be better to pick a random starting index within the noise array from which to take the slice. For example (assuming len(noise) >= len(clean)):
start_idx = np.random.randint(low=0, high=len(noise)-len(clean)+1)
noise = noise[start_idx : start_idx+len(clean)]
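A slightly fuller sketch of the same idea, written as a standalone helper. The function name `random_noise_segment`, the `rng` parameter, and the tiling of noise clips shorter than the clean signal are all my assumptions for illustration; they are not taken from noisyspeech_synthesizer.py, which may already handle short noise files differently.

```python
import numpy as np

def random_noise_segment(noise, target_len, rng=None):
    """Return `target_len` samples of `noise` starting at a random offset.

    Hypothetical helper, not part of the existing script.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(noise) < target_len:
        # Assumption: repeat short noise clips until they cover the target.
        reps = int(np.ceil(target_len / len(noise)))
        noise = np.tile(noise, reps)
    # integers() excludes the upper bound, so the +1 keeps the last
    # valid start index (len(noise) - target_len) reachable.
    start_idx = int(rng.integers(0, len(noise) - target_len + 1))
    return noise[start_idx:start_idx + target_len]
```

Line 81 could then become something like `noise = random_noise_segment(noise, len(clean))`. Passing a seeded generator, e.g. `np.random.default_rng(0)`, would keep the synthesized dataset reproducible.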
