Add LoRA for Zipformer #1540

Merged: 17 commits merged into k2-fsa:master on Mar 15, 2024
Conversation

marcoyang1998 (Collaborator) commented on Mar 11, 2024

This PR adds LoRA (Low-Rank Adaptation) fine-tuning support for Zipformer. For more details, please refer to the original LoRA paper: https://arxiv.org/abs/2106.09685.

To do:

  • Add LoRA support for QKV in self-attention
  • Add LoRA for feedforward module
  • Support exporting as a normal Zipformer
  • Benchmark against adapter
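
For context, the core idea can be sketched as a LoRA-augmented linear layer in PyTorch. This is a minimal illustration under assumed names (`LoRALinear`, `r`, `lora_alpha`), not the implementation added in this PR; the `merge` method indicates how the low-rank update could be folded back into the base weight so the fine-tuned model exports as a normal Zipformer.

```python
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Minimal LoRA-augmented linear layer (illustration only, not this PR's code)."""

    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, lora_alpha: float = 16.0):
        super().__init__()
        # Frozen pre-trained projection (e.g. a QKV or feedforward in_proj/out_proj).
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Trainable low-rank factors; B starts at zero so the adapted layer
        # initially matches the pre-trained one exactly.
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.t() @ self.lora_B.t())

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold the low-rank update into the base weight so the fine-tuned
        # model can be exported as a plain (LoRA-free) linear layer.
        self.base.weight.add_(self.scaling * (self.lora_B @ self.lora_A))
        return self.base
```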

marcoyang1998 (Collaborator, Author) commented:
Add LoRA to the following layers:

  • QKV projection in self-attention
  • in_proj in Feedforward module
  • out_proj in Feedforward module

Experiment setup:

A Zipformer pre-trained on LibriSpeech is used as initialization and fine-tuned with LoRA on the GigaSpeech "small" subset for 20 epochs.
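
A rough sketch of how such a setup might freeze the pre-trained weights and count the trainable LoRA parameters (the `freeze_non_lora` helper and the `lora_` naming convention are assumptions for illustration, not the PR's actual fine-tuning script):

```python
# Sketch (assumption): freeze everything except the LoRA factors before
# fine-tuning, and report how many parameters remain trainable.
def freeze_non_lora(model):
    num_trainable = 0
    for name, param in model.named_parameters():
        if "lora_" in name:  # e.g. lora_A / lora_B from the sketch above
            param.requires_grad_(True)
            num_trainable += param.numel()
        else:
            param.requires_grad_(False)
    return num_trainable


# Hypothetical usage:
#   model = zipformer_with_lora(r=8)   # placeholder constructor
#   print(f"Trainable parameters: {freeze_non_lora(model):,}")
```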

Experiment results:

| Exp | LoRA layers | r | Trainable params | WER on GigaSpeech (dev/test) |
|---|---|---|---|---|
| baseline (no fine-tune) | - | - | - | 20.06/19.27 |
| v1 | QK | 8 | 89,600 | 18.06/17.99 |
| v2 | QKV | 4 | 98,560 | 18.06/17.99 |
| v2 | QKV | 8 | 197,120 | 18.06/17.99 |
| v3 | QKV + FFW in_proj | 4 | 364,288 | 15.99/16.17 |
| v3 | QKV + FFW in_proj | 8 | 728,576 | 15.63/15.74 |
| v4 | QKV + full FFW | 4 | 630,016 | 15.57/15.61 |
| v4 | QKV + full FFW | 8 | 1,260,032 | 15.27/15.33 |

Observations:

  • Increasing r improves performance.
  • Adding LoRA to more layers is more beneficial than increasing r.

Comparison with adapter

| Exp | Trainable params | WER on GigaSpeech (dev/test) |
|---|---|---|
| adapter | 1.49M | 15.05/15.18 |
| LoRA | 1.26M | 15.27/15.33 |

marcoyang1998 merged commit 2dfd5db into k2-fsa:master on Mar 15, 2024
142 of 143 checks passed