Build a prototype for an automatic speech recognition (ASR) service using the open-source Whisper model.
Installation on macOS using Homebrew
brew install ffmpeg
brew install portaudio
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
To run the prototype, start the server first, then the client.
The server opens a WebSocket to receive an audio stream, caches the incoming data, and transcribes or translates it using Whisper.
python streaming_server.py
# Or run it with Docker
docker run -p 8765:8765 lingualogic/whisper-asr:0.2.1
# With GPU support
docker run --gpus=all -p 8765:8765 lingualogic/whisper-asr:0.2.1
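The server's caching step can be sketched roughly as follows. This is not the code in streaming_server.py; it is a minimal illustration that assumes the client sends raw 16-bit mono PCM chunks at 16 kHz (the sample rate Whisper models work with). The class name AudioCache and its methods are hypothetical.

```python
import array

SAMPLE_RATE = 16_000  # assumed: 16 kHz mono, the rate Whisper models expect


class AudioCache:
    """Hypothetical buffer that collects raw 16-bit PCM chunks from a socket."""

    def __init__(self):
        self._chunks = []

    def add(self, chunk: bytes) -> None:
        # Store each received chunk untouched; decoding happens once in flush().
        self._chunks.append(chunk)

    def duration_seconds(self) -> float:
        # Two bytes per sample for 16-bit PCM.
        n_bytes = sum(len(c) for c in self._chunks)
        return n_bytes / 2 / SAMPLE_RATE

    def flush(self) -> list:
        """Return the cached audio as floats in [-1, 1) and empty the cache."""
        pcm = array.array("h")  # signed 16-bit, native byte order
        pcm.frombytes(b"".join(self._chunks))
        self._chunks.clear()
        return [sample / 32768.0 for sample in pcm]
```

In a server along these lines, each incoming WebSocket message would go to add(), and the end-of-speech signal would trigger flush(). The result, converted to a float32 NumPy array, could then be passed to a Whisper model's transcribe() with task set to "transcribe" or "translate".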
The client opens the microphone and sends the audio stream over the WebSocket. It detects the end of speech and signals this to the server, which then returns the result.
python streaming_client.py
# Set the task to translate
python streaming_client.py --task translate
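How the client detects the end of speech is not described here. One common approach, shown below purely as an assumption, is an energy threshold: once speech has been heard, a run of quiet chunks is treated as the end of the utterance. The class name, the threshold, and the chunk counts are hypothetical and would need tuning for a real microphone.

```python
import array
import math


class EndOfSpeechDetector:
    """Hypothetical energy-based detector over 16-bit mono PCM chunks."""

    def __init__(self, threshold=500.0, max_silent_chunks=3):
        self.threshold = threshold              # RMS level treated as speech
        self.max_silent_chunks = max_silent_chunks
        self._speech_seen = False
        self._silent_run = 0

    @staticmethod
    def rms(chunk: bytes) -> float:
        # Root mean square amplitude of the chunk; 0.0 for an empty chunk.
        samples = array.array("h")
        samples.frombytes(chunk)
        if not samples:
            return 0.0
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def update(self, chunk: bytes) -> bool:
        """Feed one chunk; return True when the end of speech is detected."""
        if self.rms(chunk) >= self.threshold:
            self._speech_seen = True
            self._silent_run = 0
            return False
        if not self._speech_seen:
            return False  # leading silence, nothing to end yet
        self._silent_run += 1
        if self._silent_run >= self.max_silent_chunks:
            # Reset so the detector can be reused for the next utterance.
            self._speech_seen = False
            self._silent_run = 0
            return True
        return False
```

A client built this way would call update() on every microphone chunk it sends and, when it returns True, send the end-of-speech message to the server and wait for the transcription or translation.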