Hi,
My colleagues and I have faced the problem of NaNs appearing in the CTC loss, and of training being interrupted by too many infinite gradients, when training Zipformer on our data.
After some debugging we found the reason for this behaviour, and I would like to share the finding so it can be fixed in all related recipes.
In the training script there is a piece of code which filters out utterances whose pronunciations (token sequences) are too long to be aligned with their feature sequences (icefall/egs/librispeech/ASR/zipformer/train.py, line 1326 at commit f84270c):
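For reference, the check at that location looks roughly like the sketch below. This is written from memory, not copied from the repository: the function name `remove_short_and_long_utt`, the duration thresholds, and the surrounding logging may differ in the current code; `sp` is the SentencePiece processor and `c` is a lhotse Cut.

```python
def remove_short_and_long_utt(c):
    # Drop utterances that are too short or too long overall.
    if c.duration < 1.0 or c.duration > 20.0:
        return False
    # T is the number of acoustic frames left after the Zipformer's
    # convolutional subsampling of the input features.
    T = ((c.num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(c.supervisions[0].text, out_type=str)
    # Only the raw (non-blank) token count is compared against T here.
    if T < len(tokens):
        return False
    return True
```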
This check counts only the non-blank tokens produced by the SentencePiece tokenizer. However, when two or more identical tokens occur in a row, they must be separated by a blank token for CTC alignment, so even when the condition above is satisfied the CTC loss computation can still fail. We therefore suggest fixing it as follows:
```python
T = ((cut.num_frames - 7) // 2 + 1) // 2
tokens = sp.encode(cut.supervisions[0].text, out_type=str)
num_tokens = len(tokens)
# Each pair of identical adjacent tokens needs a blank between them,
# so it consumes one extra frame in any valid CTC alignment.
for i in range(1, len(tokens)):
    if tokens[i] == tokens[i - 1]:
        num_tokens += 1
if T < num_tokens:
    return False  # the utterance cannot be aligned; filter it out
```
After this correction no NaNs appear in the CTC loss anymore.
It seems that a similar bug exists in most of the training scripts related to CTC-based training.
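To make the failure mode concrete, here is a small self-contained illustration (not from the issue itself; it just calls `torch.nn.CTCLoss` directly). A target with one adjacent repeated token cannot be aligned when T only equals the raw token count, and the loss comes out infinite:

```python
import torch

torch.manual_seed(0)
vocab_size = 5                          # index 0 is the CTC blank
targets = torch.tensor([[1, 2, 2, 3]])  # one adjacent repeat: (2, 2)
target_lengths = torch.tensor([4])
ctc = torch.nn.CTCLoss(blank=0)

# T = 4 equals the raw token count but is one frame short of the
# minimum alignable length (4 tokens + 1 blank for the repeat = 5),
# so the loss is infinite; T = 5 gives a finite loss.
for T in (4, 5):
    log_probs = torch.randn(T, 1, vocab_size).log_softmax(-1)
    loss = ctc(log_probs, targets, torch.tensor([T]), target_lengths)
    print(f"T={T}: loss={loss.item()}")
```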
Sorry, I don't really know how to make pull requests (
The audio is not very short; we faced this when working with a grapheme tokenizer, where the number of graphemes is sometimes indeed comparable to T.
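As a hypothetical illustration of the grapheme case (the text and helper below are made up, not from the issue): with one token per character, a word like "hello" already contains a doubled "ll" and therefore needs one extra blank frame beyond its length.

```python
def min_ctc_frames(tokens):
    """Minimum number of frames needed to CTC-align `tokens`:
    one frame per token plus one blank per adjacent repeat."""
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return len(tokens) + repeats

print(min_ctc_frames(list("hello")))  # 6: 5 graphemes + 1 blank for "ll"
```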