Replies: 4 comments 1 reply
-
Yes, I think using a pretrained model for encoder initialization, or simply supervised fine-tuning of pre-trained SSL models, has been a common paradigm for low-resource scenarios.
-
I have no experience with speech translation. What model are you using for the experiment?
Best,
Jin
… On Mar 27, 2024, at 18:13, iggygeek ***@***.***> wrote:
Well, for speech translation, I naively tried to load the encoder from an ASR model, using the loading method I found in the Librispeech fine-tuning script. Then I ran a regular training schedule, but the training loss diverged within 1 or 2 steps. Do you have any insight?
-
I am using the Zipformer2 model. I just trained ASR and ST systems with ~100h of data. Now I want to better initialize the ST model.
-
I think ST tasks might require a much more powerful decoder. The current Zipformer2 recipes use a "context-less" (stateless) decoder for ASR tasks, which is quite simple and has very few parameters. I would suggest you still use Zipformer2 as the encoder of your model, but switch to an AED architecture (I believe @yaozengwei has a working PR) or adopt an RNN-based decoder. You can still use a pre-trained encoder for initialization; I believe that would be quite helpful.
Again, I have no experience with ST, so the suggestions above are purely empirical. Hope this helps.
Best Regards
Jin
…On Wed, 27 Mar 2024 at 22:33 iggygeek ***@***.***> wrote:
I am using the zipformer2 model. I just trained ASR and ST systems with
~100h of data. Now I want to initialize better the ST translation.
I loaded the encoder as:
```python
ckpt = load_model_params(
    ckpt=params.pre_path, model=model, init_modules=['encoder']
)
```
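For reference, the kind of partial loading that call performs can be sketched in plain PyTorch as below. This is a minimal illustrative sketch, not icefall's actual `load_model_params`: the `Seq2Seq` stand-in modules, the helper name `load_module_params`, and the key-matching rules are all assumptions.

```python
# Sketch of encoder-only initialization: copy only the parameters whose
# names fall under the requested modules, leaving the rest (e.g. the
# decoder) at their fresh initialization.
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    """Illustrative stand-in for an encoder-decoder model (not Zipformer2)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 16)  # stand-in for the pretrained encoder
        self.decoder = nn.Linear(16, 4)  # stand-in for the ST decoder


def load_module_params(ckpt_state, model, init_modules):
    """Copy into `model` only the checkpoint entries whose names start with
    one of `init_modules` and whose shapes match; return the copied keys."""
    own = model.state_dict()
    picked = {
        k: v
        for k, v in ckpt_state.items()
        if any(k.startswith(m + ".") for m in init_modules)
        and k in own
        and v.shape == own[k].shape
    }
    own.update(picked)
    model.load_state_dict(own)
    return sorted(picked)


asr_model = Seq2Seq()  # pretend this was trained on ASR
st_model = Seq2Seq()   # fresh ST model to be partially initialized
loaded = load_module_params(asr_model.state_dict(), st_model, ["encoder"])
# Only encoder.* tensors are copied; decoder.* keeps its random init.
```

Under this scheme the decoder weights stay random, which is one reason a naive full-rate training schedule can diverge early: the mismatched decoder immediately pushes large gradients back into the pretrained encoder.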
-
In a low-resource setting, I was wondering if you had tried to initialize the encoder part from a pretrained model?