Multi_zh-Hans Recipe #1238

Merged (85 commits) on Sep 13, 2023
JinZr (Collaborator) commented Sep 1, 2023

This PR includes scripts for training a Zipformer model on multiple Chinese datasets.

Included Training Sets

  1. THCHS-30
  2. AISHELL-{1,2,4}
  3. ST-CMDS
  4. Primewords
  5. MagicData
  6. Aidatatang_200zh
  7. AliMeeting
  8. WeNetSpeech
  9. KeSpeech-ASR

Included Test Sets

  1. AISHELL-{1,2,4}
  2. Aidatatang_200zh
  3. AliMeeting
  4. MagicData
  5. KeSpeech-ASR
  6. WeNetSpeech

JinZr (Collaborator, Author) commented Sep 7, 2023

@csukuangfj all requested changes have been applied, thank you!

@JinZr JinZr merged commit 0f1bc6f into k2-fsa:master Sep 13, 2023
34 checks passed
xiaoxi91 commented

Hi, I was exploring this recipe and noticed that the Chinese modeling unit has been adjusted from char to high-freq-char + byte-symbol. I'm curious about the rationale behind this change and would love to understand the thought process. Could you please shed some light on why this decision was made?

Moreover, I'm wondering if this modification is expected to improve performance in any way? If so, could you elaborate on how and in which specific scenarios this might be beneficial?

Thank you very much for your time and for sharing this valuable recipe! Looking forward to your insights.
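
For readers unfamiliar with the scheme the question refers to, here is a minimal sketch of a "frequent characters plus byte symbols" tokenizer. It is an illustration under stated assumptions, not the recipe's actual implementation, which is not shown in this thread. The function names (`build_vocab`, `tokenize`, `detokenize`), the `<0xNN>` byte-symbol spelling, and the frequency cutoff `max_chars` are all hypothetical. The idea is that characters seen often enough keep their own token, while rarer characters are spelled out as their UTF-8 bytes, so the vocabulary stays small and no character is ever out of vocabulary.

```python
from collections import Counter


def build_vocab(texts, max_chars):
    """Return (frequent_chars, byte_symbols).

    frequent_chars: the max_chars most frequent characters in texts.
    byte_symbols: 256 fallback symbols, one per possible byte value.
    """
    counts = Counter(ch for text in texts for ch in text)
    # Sort by descending count, then by character, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    frequent_chars = [ch for ch, _ in ranked[:max_chars]]
    byte_symbols = [f"<0x{b:02X}>" for b in range(256)]
    return frequent_chars, byte_symbols


def tokenize(text, frequent_chars):
    """Emit a character token if it is frequent, else its UTF-8 byte symbols."""
    keep = set(frequent_chars)
    tokens = []
    for ch in text:
        if ch in keep:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def detokenize(tokens):
    """Invert tokenize: collect bytes, then decode them as UTF-8."""
    buf = bytearray()
    for tok in tokens:
        is_byte = len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">")
        if is_byte:
            buf.append(int(tok[3:5], 16))
        else:
            buf.extend(tok.encode("utf-8"))
    # A model may emit invalid byte sequences; replace them instead of raising.
    return buf.decode("utf-8", errors="replace")
```

With a vocabulary built from a small corpus, `tokenize` keeps the frequent characters as single tokens and expands any remaining character into three byte symbols (for a typical CJK character), and `detokenize` restores the original string. Whether this trade-off improves recognition accuracy is exactly what the question above asks, and the sketch makes no claim either way.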

JinZr (Collaborator, Author) commented Aug 13, 2024 via email
