
NANs in CTC loss: solution #1777

Open
kfmn opened this issue Oct 21, 2024 · 2 comments

kfmn commented Oct 21, 2024

Hi,

My colleagues and I ran into NaNs appearing in the CTC loss, with training aborting due to too many infinite gradients, when training a Zipformer on our data.
After some debugging we found the cause of this behaviour, and I would like to share the finding so it can be fixed in all related recipes.

In the training script there is a piece of code that filters out pronunciations (token sequences) which are too long to be aligned successfully with the feature sequence:

if T < len(tokens):

This check counts only the non-blank tokens produced by the SentencePiece tokenizer. However, when two or more identical tokens occur in a row, CTC must separate them with a blank token, so the true minimum alignment length is larger than len(tokens). As a result, computing the CTC loss can still fail even when the condition above is satisfied. We suggest fixing it like this:

        # Number of output frames after the frontend's subsampling
        T = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        # Each pair of adjacent identical tokens requires a blank between
        # them, so count one extra position per repeat.
        num_tokens = len(tokens)
        for i in range(1, len(tokens)):
            if tokens[i] == tokens[i - 1]:
                num_tokens += 1
        if T < num_tokens:

After this correction, no NaNs appear in the CTC loss anymore.
A similar bug seems to exist in most of the training scripts related to CTC-based training.
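For reference, the repeat-aware length check above can be factored into a small standalone helper. This is just a sketch; the function name `min_ctc_target_len` is ours for illustration and is not part of any recipe:

```python
def min_ctc_target_len(tokens):
    """Minimum number of frames CTC needs to emit `tokens`.

    CTC collapses repeated symbols, so every pair of adjacent identical
    tokens must be separated by a blank, adding one frame per repeat.
    """
    num_tokens = len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i] == tokens[i - 1]:
            num_tokens += 1
    return num_tokens
```

For example, the grapheme sequence `["h", "e", "l", "l", "o"]` needs at least 6 frames (5 graphemes plus one blank between the two "l"s), whereas the original `len(tokens)` check would accept a 5-frame utterance and let the CTC loss blow up.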

@csukuangfj (Collaborator) commented:

Thanks for sharing!

Could you create a pull-request to integrate your proposal?


By the way, is the audio causing NaNs very short?

@kfmn (Author) commented Oct 21, 2024:

Sorry, I don't know how to make pull requests.
The audio is not very short. We hit this while working with a grapheme tokenizer, where the number of graphemes is sometimes indeed comparable to T.
