VOSK STT Engine #280

aaronchantrill · 2020-06-28T12:20:12Z

Detailed Description

VOSK (https://alphacephei.com/vosk/) is a new open-source STT toolkit/engine that is built on Kaldi and optimized to run on the Raspberry Pi. Building a language model is described at https://alphacephei.com/vosk/adaptation.html

Context

Learning to train and adapt the acoustic model, language model and dictionary is enormously helpful in speech recognition. The more you can narrow the range of words and phrases the recognizer has to consider, the better the recognition becomes. Naomi has an advantage here: we already have a list of phrases from which a language model can be built directly.

Possible Implementation

VOSK can be installed with a simple pip3 install vosk. The training tools are basically Kaldi, but it is not necessary to install Kaldi to use VOSK. The adaptation page shows a good start on developing a language model from the intent templates.
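As a rough illustration of how little code the pip package needs, here is a minimal sketch of transcribing a WAV file. It assumes a 16-bit mono PCM recording and an already-downloaded model directory; the function names transcribe_wav and extract_text are made up for this example and are not part of VOSK or Naomi.

```python
import json
import wave


def extract_text(result_json):
    """Return the recognized text from a VOSK JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file using a VOSK model directory."""
    # Imported here so extract_text() can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns true once an utterance has ended.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)
```

Calling transcribe_wav("test.wav", "vosk-model-en-us-0.22-lgraph") would return the recognized text as one string; both paths are placeholders.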


aaronchantrill · 2022-10-15T17:16:14Z

I am working on this, and the reliability of VOSK is pretty amazing. It is also pretty lightweight and easy to install. I am currently trying to adapt the language model using the instructions at https://alphacephei.com/vosk/lm. As I understand it right now, I need to convert all the speechhandler intent templates into a JSGF file, use that to generate an ARPA statistical model, and then interpolate that with the default VOSK language model. Phonetisaurus works for generating a custom dictionary, and the VOSK compile model (https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-compile.zip) comes with a pre-trained fst file to use with Phonetisaurus for generating new pronunciations.
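The first step (intent templates to JSGF) could look something like the sketch below. It assumes templates are plain strings with {Slot} placeholders and that each slot has a list of keyword values; that input shape, the function name templates_to_jsgf and the rule name <command> are assumptions made for illustration, not Naomi's actual API.

```python
import re

# Matches a {SlotName} placeholder inside a template string.
SLOT = re.compile(r"\{(\w+)\}")


def templates_to_jsgf(templates, keywords, grammar_name="naomi"):
    """Build a JSGF grammar from templates and their slot keywords.

    templates: e.g. ["what is the weather in {City}"]
    keywords:  e.g. {"City": ["london", "paris"]}
    """
    lines = ["#JSGF V1.0;", f"grammar {grammar_name};", ""]
    # Each {Slot} becomes a reference to a JSGF rule named <Slot>.
    alternatives = [SLOT.sub(r"<\1>", template) for template in templates]
    lines.append("public <command> = " + " | ".join(alternatives) + ";")
    # One rule per slot, listing its keyword values as alternatives.
    for slot in sorted(keywords):
        lines.append(f"<{slot}> = " + " | ".join(keywords[slot]) + ";")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(templates_to_jsgf(
        ["what time is it", "what is the weather in {City}"],
        {"City": ["london", "paris"]},
    ))
```

The later steps (generating the ARPA model and interpolating it with the default VOSK language model) are not covered by this sketch.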

Akul2010 · 2023-01-08T02:20:11Z

I found this github link for making a custom model: https://github.com/matteo-39/vosk-build-model

aaronchantrill · 2024-03-04T16:04:01Z

@Akul2010 Sorry, I meant to get back to you earlier. That is a very interesting set of instructions for building a VOSK model, but overkill for anything we'd be doing. Using those instructions, you could add a whole new language to VOSK, which is awesome.

We just need to customize the Gr.fst and HCLr.fst with custom words and phrases. The process is described at https://alphacephei.com/vosk/lm and supports English, French, German, and Russian. It is pretty straightforward, but it requires installing both Kaldi and SRILM. Kaldi is usually pretty easy to install, although the last time I installed it on a new Bookworm system I had to trick the installer into thinking that Python 2.7 was installed: the installer still thinks it needs it, but it is no longer available through my package manager. I see some discussion that Python 2.7 was really only required for Pocketsphinx, which has since moved to Python 3, so hopefully Kaldi will drop that requirement soon. SRILM's source is open and available for academic and government use, but it is not freely available: you have to register an account to download it. On my Raspberry Pi I had to trick it into compiling on aarch64 by modifying the makefiles as described here: https://github.com/G10DRAS/SRILM-on-RaspberryPi

There are other, freer libraries that can be used instead of SRILM, including KenLM, which is very lightweight and which we already use for building language models for Pocketsphinx (although with a much smaller vocabulary). I'm not sure about the process of converting a language model file to fst format, though.

Overall, getting the Raspberry Pi set up is not simple, but once it is set up, all you have to do is drop your vocabulary into the db/extra.txt file, run compile-graph.sh, and wait for it to finish so you can pick up your new vocabulary G.fst and HCL.fst files from exp/chain/tdnn/graph.
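A minimal sketch of the "drop your vocabulary into db/extra.txt" step follows. It assumes extra.txt takes one lowercase phrase per line, that overwriting the file is acceptable, and that the vocabulary is English (the normalization only keeps a-z, apostrophes and spaces); the helper names are made up for this example.

```python
import re
from pathlib import Path


def normalize_phrase(phrase):
    """Lowercase a phrase and keep only letters, apostrophes and spaces."""
    cleaned = re.sub(r"[^a-z' ]+", " ", phrase.lower())
    return " ".join(cleaned.split())


def write_extra_vocabulary(phrases, compile_dir):
    """Write normalized, de-duplicated phrases to <compile_dir>/db/extra.txt.

    Assumption: the file is overwritten, not appended to.
    """
    extra = Path(compile_dir) / "db" / "extra.txt"
    extra.parent.mkdir(parents=True, exist_ok=True)
    unique = []
    for phrase in phrases:
        normalized = normalize_phrase(phrase)
        # Skip empty results and repeats while keeping the original order.
        if normalized and normalized not in unique:
            unique.append(normalized)
    extra.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return extra
```

After writing the file, the graph build could be started from the same directory with something like subprocess.run(["bash", "compile-graph.sh"], cwd=compile_dir, check=True), assuming Kaldi and SRILM are already set up as described above.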

The last time I tried this, I tried it on a couple of computers and kept running into memory issues. I finally got it working under WSL on a Windows machine with 32 GiB of RAM. I'm getting ready to try again with my 8 GiB Raspberry Pi 5.

aaronchantrill · 2024-03-05T22:02:00Z

The Raspberry Pi 5 was able to do it! It did cut off all communication for a little while and I'm not sure how long it took, but it was able to build an HCLG.fst file, which I am using now and which does recognize my custom vocabulary.

Akul2010 · 2024-03-05T22:04:42Z

Great! Do you plan on making it available in one of the next few builds of Naomi?

aaronchantrill · 2024-03-06T01:43:46Z

@Akul2010 I think what makes sense for now is just to write up the steps required for generating a custom vocabulary, and to put a check in place that notifies you if there are any words in the current "languagemodel" file (i.e., ~/.config/naomi/vocabularies/en-US/VOSK STT/default/languagemodel) that do not also appear in the VOSK words.txt file (i.e., ~/.config/naomi/vosk/vosk-model-en-us-0.22-lgraph/graph/words.txt).
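A check like that could be sketched as follows. This assumes the "languagemodel" file is in ARPA format (so the vocabulary can be read from its \1-grams: section), that words.txt is a Kaldi symbol table with one "word id" pair per line, and that words should be compared case-insensitively; all three are assumptions, and the function names are hypothetical rather than taken from the plugin.

```python
# Sentence markers and the unknown-word token are not real vocabulary.
IGNORED = {"<s>", "</s>", "<unk>", "<UNK>"}


def read_vosk_words(words_path):
    """Read the first column of a Kaldi symbol table ('word id' per line)."""
    words = set()
    with open(words_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                words.add(fields[0])
    return words


def read_arpa_unigrams(arpa_path):
    """Collect the words listed in the \\1-grams: section of an ARPA file."""
    words = set()
    in_unigrams = False
    with open(arpa_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("\\"):
                # Section headers such as \data\, \1-grams:, \2-grams:, \end\
                in_unigrams = line == "\\1-grams:"
                continue
            fields = line.split()
            # Unigram lines look like: logprob word [backoff]
            if in_unigrams and len(fields) >= 2:
                words.add(fields[1])
    return words


def unknown_words(arpa_path, words_path):
    """Return vocabulary words that do not appear in the VOSK words.txt."""
    known = read_vosk_words(words_path)
    vocabulary = read_arpa_unigrams(arpa_path) - IGNORED
    return sorted(word for word in vocabulary if word.lower() not in known)
```

The plugin could then log a warning for each word returned by unknown_words(), along with a pointer to the instructions for building a custom vocabulary.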

aaronchantrill · 2024-03-06T01:49:18Z

It would be good to see if we can use KenLM to generate a language model and then convert that to an HCLG.fst file. I'm really not comfortable requiring people to go out and register with SRI so they can download a copy of SRILM.
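For the KenLM half of that idea, a minimal sketch of producing an ARPA file with KenLM's lmplz tool is shown below. It assumes lmplz is installed and on the PATH and that the corpus has one phrase per line; the function names are made up for this example. Converting the resulting ARPA file into an HCLG.fst is not covered here.

```python
import shutil
import subprocess


def lmplz_command(order=3):
    """Build the lmplz command line for an n-gram model of the given order."""
    # -o sets the n-gram order; --discount_fallback lets lmplz cope with
    # very small corpora, where discount estimation would otherwise fail.
    return ["lmplz", "-o", str(order), "--discount_fallback"]


def build_arpa(corpus_path, arpa_path, order=3):
    """Run lmplz, reading the corpus on stdin and writing ARPA to stdout."""
    if shutil.which("lmplz") is None:
        raise RuntimeError("lmplz (KenLM) was not found on PATH")
    with open(corpus_path, "rb") as source, open(arpa_path, "wb") as target:
        subprocess.run(lmplz_command(order), stdin=source, stdout=target, check=True)
```

Something like build_arpa("phrases.txt", "naomi.arpa") would then write the model; both file names are placeholders.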

aaronchantrill · 2024-03-06T02:01:23Z

It is currently available from https://github.com/aaronchantrill/Naomi_VOSK_STT but I haven't added it to NPE yet because it's so difficult to customize the vocabulary. I think it will be enough if I add a check that warns the user when their vocabulary uses any words that VOSK does not currently know, with a link to a detailed description of how to generate a custom VOSK vocabulary. Then I will feel good about adding it to the NPE.

aaronchantrill · 2024-03-16T20:12:21Z

@Akul2010 I have updated the Naomi_VOSK_STT plugin at https://github.com/aaronchantrill/Naomi_VOSK_STT. It still doesn't do the language model adaptation automatically, but it does give you warnings if there are any words in your vocabulary that it doesn't know. I added a credit at the bottom for you since we never managed to get your pull request merged. Thanks! I'll be submitting this plugin to NPE later today, and will be recording a new "How to install Naomi" video soon.

Akul2010 · 2024-03-17T14:48:49Z

Great! Thank you!

aaronchantrill added Good First Issue! Priority: Medium Status: Available Type: Enhancement labels Jul 4, 2020

aaronchantrill added the Hacktoberfest Small or non-core issues that could be worked on by Hacktoberfest participants label Sep 22, 2020

aaronchantrill added Status: In Progress and removed Status: Available labels Oct 15, 2022

aaronchantrill mentioned this issue Nov 6, 2022

Speaker Recognition #367

Draft

8 tasks

aaronchantrill linked a pull request Nov 6, 2022 that will close this issue

Speaker Recognition #367

Draft

8 tasks

Akul2010 mentioned this issue Jan 2, 2023

Added a very simple implementation of Vosk STT - Needs Work #369

Closed

aaronchantrill closed this as completed Mar 16, 2024
