This is a Python 3 application that requires the sounddevice package to capture live audio frames from a microphone. Install it with:
pip install sounddevice
Depending on the selected backend, additional dependencies might be required (see below).
Launch the program with --help to see the available options. You can list the input devices with --list-devices and select one with --device (otherwise, the system default is used).
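As a rough illustration of what a device listing involves, the sketch below filters the device table reported by sounddevice down to entries that can capture audio. The helper names are hypothetical and are not taken from this application. The sounddevice library itself accepts either a numeric index or a substring of the device name when selecting a device; whether --device does the same is an assumption.

```python
def input_devices(devices):
    """Return (index, name) pairs for entries that can capture audio."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev["max_input_channels"] > 0
    ]


def list_system_input_devices():
    # Imported lazily so the filtering helper above can be used (and tested)
    # on machines without PortAudio installed.
    import sounddevice as sd

    # query_devices() with no arguments returns one dict per audio device;
    # the position in that list is the device index used by sounddevice.
    return input_devices(sd.query_devices())
```

Calling list_system_input_devices() on a machine with sounddevice installed returns the indices and names of the capture-capable devices.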
This application opens two YARP ports: <prefix>/rpc:s, which accepts requests to change the dictionary/model and to mute/unmute the microphone, and <prefix>/result:o, which broadcasts the transcribed text. The default prefix is /speechRecognition, but it can be changed with the --prefix option.
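A minimal sketch of a client that reads the transcriptions is shown below, assuming the YARP Python bindings are installed and a YARP name server is running. The local port name /example/reader:i is hypothetical, and treating the messages on the result port as bottles is an assumption, since the message format is not described here.

```python
DEFAULT_PREFIX = "/speechRecognition"


def port_names(prefix=DEFAULT_PREFIX):
    """Return the (rpc, result) port names derived from a prefix."""
    return prefix + "/rpc:s", prefix + "/result:o"


def print_transcriptions(prefix=DEFAULT_PREFIX, local="/example/reader:i"):
    # Requires the YARP Python bindings and a reachable yarpserver.
    import yarp

    yarp.Network.init()
    _, result_port = port_names(prefix)

    reader = yarp.BufferedPortBottle()
    reader.open(local)  # hypothetical local port name
    yarp.Network.connect(result_port, local)

    try:
        while True:
            bottle = reader.read()  # blocks until a message arrives
            if bottle is not None:
                print(bottle.toString())
    finally:
        reader.close()
        yarp.Network.fini()
```

If the program was started with a custom --prefix, pass the same value to print_transcriptions() so that the port names match.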
To use the PocketSphinx backend, install the pocketsphinx package with:
pip install pocketsphinx
Then, launch the program with the --backend pocketsphinx --dictionary xxx --language xxx options. The dictionary and language combination requires the corresponding dictionary and model files to be installed (check share/speechRecognition). For example, to use the Spanish waiter orders dictionary, run:
speechRecognition --backend pocketsphinx --dictionary waiter --language es
To use the Vosk backend, install the vosk package with:
pip install vosk
Then, launch the program with the --backend vosk --model xxx options. Model files are downloaded on demand from the Vosk website. For example, to use the ~50 MB Spanish model, run:
speechRecognition --backend vosk --model small-es-0.42
To list the available models, download the desired ones in advance, and test the Vosk engine, you can use the vosk-transcriber application.
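The options described above can be summarized with a small argparse sketch. This is not the program's actual parser: the option names come from the text, while the cross-option checks (PocketSphinx needing both --dictionary and --language, Vosk needing --model) and the restriction to these two backends are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="speechRecognition")
    parser.add_argument("--list-devices", action="store_true")
    parser.add_argument("--device")
    parser.add_argument("--prefix", default="/speechRecognition")
    parser.add_argument("--backend", choices=["pocketsphinx", "vosk"])
    parser.add_argument("--dictionary")
    parser.add_argument("--language")
    parser.add_argument("--model")
    return parser


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: each backend needs its own set of options.
    if args.backend == "pocketsphinx" and not (args.dictionary and args.language):
        parser.error("--backend pocketsphinx needs --dictionary and --language")
    if args.backend == "vosk" and not args.model:
        parser.error("--backend vosk needs --model")
    return args
```

For instance, parse_cli(["--backend", "vosk", "--model", "small-es-0.42"]) mirrors the Vosk example above, and omitting --model makes the parser exit with a usage error.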