-
I've noticed the blank ratio of the simple AM goes down (and drops to zero after the loss starts spiking). Has anyone else had this happen?
-
I'm seeing training diverge surprisingly often while using hyperparameters that would work for a CTC+attn model.
I'm not using the icefall setup (essentially just its compute_loss function), but I was hoping someone might have an idea of what could be responsible.
I'm using what I think is a relatively long warm-up period (10k steps), an even longer learning rate warmup, and I'm rescaling the losses continuously as in the latest zipformer recipe. I'm using an AdamW beta2 of 0.95 and my LayerNorms are in fp32. I see loss spikes around 11k or 12k steps. I'm using the modified model, and my k2 version is very recent.
I'm curious whether others have experienced this. I'm sure it would work if I kept lowering the LR, but I'm already using a lower-than-normal LR.
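For context, the continuous loss rescaling I mean is roughly the following (a sketch based on my reading of the zipformer recipe, not its exact code; `warm_step` and `simple_loss_scale` are placeholders for my own settings, and `simple_loss` / `pruned_loss` are the two terms coming out of the pruned RNN-T loss):

```python
# Rough sketch of the warm-up loss rescaling (modeled on the zipformer recipe);
# values and names here are placeholders for my setup, not the recipe's exact ones.
def scale_losses(simple_loss, pruned_loss, step, warm_step=10000,
                 simple_loss_scale=0.5):
    if step >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        frac = step / warm_step
        # simple-loss weight ramps down from 1.0 toward simple_loss_scale,
        # pruned-loss weight ramps up from 0.1 to 1.0 over the warm-up
        s = 1.0 - frac * (1.0 - simple_loss_scale)
        p = 0.1 + 0.9 * frac
    return s * simple_loss + p * pruned_loss
```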