Documentation for adapter fine-tuning #1545
Conversation
.. code-block::

  For dev, WER of different settings are:
  greedy_search 15.44 best for dev

  For test, WER of different settings are:
  greedy_search 15.42 best for test
Hi!
Do you have accuracy metrics for the case when you fine-tune the whole ASR model on the same data?
Hi, fine-tuning the whole model gives us 13.31/13.39. You may want to have a look at #1484 for more reference numbers.
Hi, I conducted experiments with my own data and found that the training times for adapter-finetune and full-finetune are almost the same. I printed the weight values at different epochs of adapter-finetune, and only the adapters' weights change. Is this normal? Thank you!
Yes, this is normal.
You need to look into the implementation and technical details of adapter-based fine-tuning to gain a deeper understanding.
Best regards,
Jin
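To expand on that: in adapter-based fine-tuning the base model parameters are typically frozen and only the small adapter modules receive gradient updates, which is why only the adapters' weights change between checkpoints. The forward pass (and most of the backward pass) still runs through the full model, so the per-step cost is not much lower than full fine-tuning. A minimal sketch, assuming a generic PyTorch model whose adapter parameters have "adapter" in their names (illustrative only, not the actual icefall implementation):

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """A small bottleneck adapter with a residual connection (illustrative)."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction.
        return x + self.up(torch.relu(self.down(x)))


def freeze_all_but_adapters(model: nn.Module) -> None:
    """Freeze the base model; leave only adapter parameters trainable."""
    for name, p in model.named_parameters():
        p.requires_grad = "adapter" in name
```

Even with all base parameters frozen, every training step still computes a full forward pass and backpropagates through the layers above each adapter, so wall-clock time per epoch stays close to that of full fine-tuning.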
The doc says adapter-finetune is much faster than full-finetune, so I'm confused.
Could you show me the average time to finish one epoch for both of your experiments? In my experiments, adapter-based training took ~25 min per epoch and full fine-tuning took ~30 min per epoch.
Adapter-finetune is 60 min/epoch and full-finetune is 46 min/epoch, so adapter fine-tuning costs more time. I changed base_lr=0.00005, lr_epochs=100, lr_batches=100000 for full-finetune and used the default values for adapter-finetune.
That's unusual; how big is your model? BTW, your base-lr seems very small (even for full fine-tuning), which might lead to poor performance.
For the model with adapters: number of model parameters: 169281897; a total of 1234624 trainable parameters (0.729% of the whole model).
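As a side note, parameter counts like the ones quoted above can be reproduced with a small helper such as the following (a generic PyTorch sketch, not taken from the icefall scripts):

```python
import torch.nn as nn


def report_parameters(model: nn.Module) -> None:
    """Print total and trainable parameter counts for a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Number of model parameters: {total}")
    print(
        f"A total of {trainable} trainable parameters "
        f"({100.0 * trainable / total:.3f}% of the whole model)"
    )
```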
Add documentation for #1512.