Commit

🌏 add more language models
this adds some previously erroneously excluded language models
anuejn committed Oct 10, 2023
1 parent 5738a57 commit 7a76da3
Showing 2 changed files with 26 additions and 2 deletions.
24 changes: 23 additions & 1 deletion server/app/models.yml
@@ -54,6 +54,13 @@ English:
     size: 128M
     type: transcription
     compressed: true
+  - name: big-2
+    url: https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip
+    description: Accurate generic US English model trained by Kaldi on <a href="http://kaldi-asr.org/models/m14">Gigaspeech</a>.
+      Mostly for podcasts, not for telephony
+    size: 2.3G
+    type: transcription
+    compressed: true
 Indian English:
   - name: big
     url: https://alphacephei.com/vosk/models/vosk-model-en-in-0.5.zip
@@ -168,6 +175,14 @@ Portuguese/Brazilian Portuguese:
     size: 1.6G
     type: transcription
     compressed: true
+Greek:
+  - name: big
+    url: https://alphacephei.com/vosk/models/vosk-model-el-gr-0.7.zip
+    description: Big narrowband Greek model for server processing, not extremely accurate
+      though
+    size: 1.1G
+    type: transcription
+    compressed: true
 Turkish:
   - name: small
     url: https://alphacephei.com/vosk/models/vosk-model-small-tr-0.3.zip
@@ -249,6 +264,13 @@ Farsi:
     size: 47M
     type: transcription
     compressed: true
+  - name: big
+    url: https://alphacephei.com/vosk/models/vosk-model-fa-0.5.zip
+    description: Model with large vocabulary, not yet accurate but better than before
+      (Persian)
+    size: 1G
+    type: transcription
+    compressed: true
   - name: small-2
     url: https://alphacephei.com/vosk/models/vosk-model-small-fa-0.5.zip
     description: Bigger small model for desktop application (Persian)
@@ -375,7 +397,7 @@ Korean:
     compressed: true
 Breton:
   - name: big
-    url: https://alphacephei.com/vosk/models/vosk-model-br-0.7.zip
+    url: https://alphacephei.com/vosk/models/vosk-model-br-0.8.zip
     description: Breton model from <a href="https://github.com/gweltou/vosk-br">vosk-br</a>
       project
     size: 70M
4 changes: 3 additions & 1 deletion server/scripts/generate_models_list.py
@@ -51,15 +51,17 @@
 for row in rows:
     if strong := row.find("strong"):
         current_lang = strong.text
+        print(current_lang)
     else:
         assert (
             current_lang is not None
         ), "no previous language heading found, probably the format changed :("
         raw = {k: v for k, v in zip(columns, row.find_all("td"))}

-        if current_lang == "English Other" or "not" in raw["Notes"].text.lower():
+        if current_lang == "English Other" or "not recommended" in raw["Notes"].text.lower():
             continue

+
         if current_lang == "Speaker identification model":
             continue

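Why the filter change in generate_models_list.py brings these models back can be seen in a small standalone sketch. It assumes the Notes column text matches the descriptions that ended up in models.yml; the `NOTES` dict, the two helper functions and the `hypothetical-old-model` entry are made up for illustration and are not part of the script.

```python
# Notes strings copied from the descriptions added in this commit.
# The last entry is hypothetical and only shows what the new filter still skips.
NOTES = {
    "vosk-model-en-us-0.42-gigaspeech": (
        "Accurate generic US English model trained by Kaldi on Gigaspeech. "
        "Mostly for podcasts, not for telephony"
    ),
    "vosk-model-el-gr-0.7": (
        "Big narrowband Greek model for server processing, "
        "not extremely accurate though"
    ),
    "vosk-model-fa-0.5": (
        "Model with large vocabulary, not yet accurate but better than before"
    ),
    "hypothetical-old-model": "Outdated model, not recommended",
}


def excluded_old(notes: str) -> bool:
    """Old condition: any occurrence of the substring 'not' excludes the model."""
    return "not" in notes.lower()


def excluded_new(notes: str) -> bool:
    """New condition: only the phrase 'not recommended' excludes the model."""
    return "not recommended" in notes.lower()


old_skipped = sorted(name for name, notes in NOTES.items() if excluded_old(notes))
new_skipped = sorted(name for name, notes in NOTES.items() if excluded_new(notes))

print("skipped by old filter:", old_skipped)
print("skipped by new filter:", new_skipped)
```

Under the old condition every entry is skipped, because phrases such as "not for telephony" or "not yet accurate" all contain "not". The narrower phrase match keeps those models and drops only the entry explicitly marked as not recommended.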
