Formally speaking, the shape of the variable `y_cut_mask` created here may not match the shape of the variable `y_cut` in the last dimension (which is always `out_size` for `y_cut`).
To see why, take a look at the function `sequence_mask`, which is invoked to create `y_cut_mask`. Since the parameter `max_length` is not provided, the length dimension of the mask gets size `max(length)` (see here). Thus, if all sequences in a batch passed to `GradTTS.forward(...)` are shorter than `out_size`, the last dimension of `y_cut_mask` will not match the last dimension of `y_cut`.
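For reference, the helper looks roughly like this (paraphrased from memory; the exact code in the repository may differ slightly):

```python
import torch

def sequence_mask(length, max_length=None):
    # When max_length is None, the mask width defaults to the longest
    # length in the batch -- which can be smaller than out_size.
    if max_length is None:
        max_length = length.max()
    x = torch.arange(int(max_length), dtype=length.dtype, device=length.device)
    return x.unsqueeze(0) < length.unsqueeze(1)
```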
An easy experiment exposes the issue: start training GradTTS with `batch_size==1`. In that case, as soon as a sequence shorter than `out_size` is encountered, training fails with a shape mismatch.
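A minimal sketch of the failure, using the `sequence_mask` shown above (the concrete numbers are mine for illustration: 80 mel bins, `out_size=172`, one sequence of 150 frames):

```python
import torch

out_size = 172                                  # fixed crop width used in training
y_cut_lengths = torch.LongTensor([150])         # batch_size == 1, shorter than out_size

y_cut = torch.zeros(1, 80, out_size)            # last dim is always out_size (172)
y_cut_mask = sequence_mask(y_cut_lengths).unsqueeze(1)  # last dim is max(length) == 150

# Broadcasting 172 against 150 fails:
masked = y_cut * y_cut_mask                     # RuntimeError: 172 vs. 150 at dim 2
```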
The fix I suggest is elementary: pass the parameter `max_length=out_size` when calling `sequence_mask` here.
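In code, the one-line change would look roughly like this (the surrounding `.unsqueeze(1).to(y_mask)` chain is quoted from memory and may not match the repository exactly):

```python
# before: mask width follows max(y_cut_lengths), not out_size
y_cut_mask = sequence_mask(y_cut_lengths).unsqueeze(1).to(y_mask)

# after: mask width is pinned to out_size, matching y_cut's last dimension
y_cut_mask = sequence_mask(y_cut_lengths, max_length=out_size).unsqueeze(1).to(y_mask)
```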
Moreover, we had better skip cropping the mel entirely when all sequences in a batch passed to `GradTTS.forward(...)` are shorter than `out_size`. Concretely, I suggest adding the condition `y_max_length > out_size` here.
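A sketch of the guard (the variable names mirror those in the issue text; treat the exact surrounding code as an assumption):

```python
# Crop only when cropping is both requested and possible; otherwise the
# full-length y, attn and y_mask are already consistent with each other.
if out_size is not None and y_max_length > out_size:
    ...  # existing random-crop logic producing y_cut, attn_cut, y_cut_mask
```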