Speech recognition

Installation and usage

This is a Python 3 application that requires the sounddevice package to grab live frames from a mic. Install it with:

pip install sounddevice

Depending on the selected backend, additional dependencies might be required (see below).

Launch the program with --help to see available options. You can display and select the preferred input device with --list-devices and --device, respectively (otherwise the system default will be chosen).

This application opens two YARP ports: an <prefix>/rpc:s port that allows to request a dictionary/model change and to mute/unmute the microphone, and a <prefix>/result:o port that broadcasts the transcribed text. The default prefix is /speechRecognition, but it can be changed with the --prefix option.

PocketSphinx backend

Install the pocketsphinx package with:

pip install pocketsphinx

Then, launch the program with the --backend pocketsphinx --dictionary xxx --language xxx options. The dictionary and language combo relies on the adequate dictionary and model files being installed (check share/speechRecognition). For example, to use the waiter Spanish orders dictionary, put:

speechRecognition --backend pocketsphinx --dictionary waiter --language es

Vosk (Kaldi) backend

Install the vosk package with:

pip install vosk

Then, launch the program with the --backend vosk --model xxx options. Model files are downloaded on demand from the Vosk website. For example, to use the ~50 MB Spanish model, put:

speechRecognition --backend vosk --model small-es-0.42

To list and download the desired models offline and test the Vosk engine, you can use the vosk-transcriber application.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Speech recognition

Installation and usage

PocketSphinx backend

Vosk (Kaldi) backend

Files

README.md

Latest commit

History

README.md

File metadata and controls

Speech recognition

Installation and usage

PocketSphinx backend

Vosk (Kaldi) backend