This is an implementation of OpenAI's Whisper for speech-to-text via your default microphone, with the transcribed text sent directly to your clipboard and/or CLI. The script starts recording and runs inference when you press a key combination of your choice.
Installation is as easy as:
pip install -U openai-whisper
pip install -r requirements.txt
I also strongly encourage the installation of PyTorch with CUDA:
pip3 install torch torchvision torchaudio --extra-index-url
If you choose to use PyTorch with your CPU instead, please run:
pip install torch
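If you are not sure which PyTorch build ended up installed, a quick check along these lines tells you which device to ask for. This is only a sketch: `pick_device` is a hypothetical helper and not part of this script, while `torch.cuda.is_available()` is the standard PyTorch call.

```python
def pick_device(prefer="cuda"):
    """Return 'cuda' only if it was requested and PyTorch can actually use it."""
    try:
        import torch  # imported lazily so the check also works before torch is installed
    except ImportError:
        return "cpu"
    if prefer == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Usable device:", pick_device())
```

If this prints `cpu` even though you installed the CUDA build, the GPU driver or the PyTorch/CUDA version pairing is the usual suspect.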
Most input arguments carry over from the base Whisper package.
Those of note are:
Model: default is 'small'; the other options are 'tiny', 'base', 'medium', and 'large'
Model directory: default is None, in which case 'Users/[username]/.cache/whisper' is used
Device: default is 'cuda'; set this to 'cpu' if you don't have CUDA installed with torch
Task: default is transcribe (duh), but translation is also possible
Language: default is None (enabling language detection), but I recommend setting this to English/French/Spanish/etc. to cut down on inference time
Temperature: default is 0; higher values flatten the output distribution, allowing for more 'creativity'
Threads: default is 0; this is the number of CPU threads used for inference
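To show roughly where the arguments above end up, here is a minimal sketch against the base Whisper Python API. The helper names `whisper_options` and `transcribe_file` are hypothetical and this is not necessarily how this script wires things together. `whisper.load_model`, `model.transcribe`, and `torch.set_num_threads` are the real calls. Treating a thread count of 0 as "leave PyTorch's default alone" is an assumption borrowed from the base Whisper CLI.

```python
def whisper_options(model="small", model_dir=None, device="cuda",
                    task="transcribe", language=None, temperature=0.0, threads=0):
    """Split the carried-over arguments into load-time and decode-time options."""
    load_kwargs = {"name": model, "device": device, "download_root": model_dir}
    decode_kwargs = {"task": task, "language": language, "temperature": temperature}
    return load_kwargs, decode_kwargs, threads


def transcribe_file(path, **options):
    """Transcribe an audio file with the given options and return the text."""
    import torch    # imported lazily so whisper_options can be used on its own
    import whisper

    load_kwargs, decode_kwargs, threads = whisper_options(**options)
    if threads > 0:
        # Assumption: 0 means "keep PyTorch's default thread count".
        torch.set_num_threads(threads)
    model = whisper.load_model(**load_kwargs)
    result = model.transcribe(path, **decode_kwargs)
    return result["text"]


if __name__ == "__main__":
    # 'sample.wav' is a placeholder path; point it at a real recording.
    print(transcribe_file("sample.wav", device="cpu", language="English"))
```

Setting the language up front skips the detection pass, which is where the inference-time saving mentioned above comes from.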
Novel input arguments:
Amplification: default is 1.0; multiplies the recorded audio by the given floating-point factor
Print: not set (False) by default; when set, prints the text to your CLI/terminal
Disable clipboard: not set (False) by default; when set, prevents output from being sent to your clipboard
Hotkey: default is 'ctrl+shift+r'; the key combination that starts audio recording and inference
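The novel arguments can be pictured with the sketch below. Every flag name here (`--amplify`, `--print`, `--no_clipboard`, `--hotkey`) is a placeholder I made up for illustration, as are the helpers `amplify` and `deliver`; check the script's own help output for the real names. Only the defaults (1.0, False, False, 'ctrl+shift+r') come from the list above. Clipping the amplified samples to [-1, 1] is also an assumption, on the basis that the audio is floating point.

```python
import argparse


def build_parser():
    """Argument parser with HYPOTHETICAL flag names mirroring the documented defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the novel arguments")
    parser.add_argument("--amplify", type=float, default=1.0,
                        help="multiply the recording by this factor")
    parser.add_argument("--print", dest="print_text", action="store_true",
                        help="print the text to the terminal")
    parser.add_argument("--no_clipboard", action="store_true",
                        help="do not copy the text to the clipboard")
    parser.add_argument("--hotkey", default="ctrl+shift+r",
                        help="key combination that starts recording")
    return parser


def amplify(samples, gain):
    """Scale float samples by gain, clipping to [-1.0, 1.0] to avoid overflow."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def deliver(text, print_text, no_clipboard, copy=None):
    """Send text to the terminal and/or to a clipboard function passed in as copy."""
    if print_text:
        print(text)
    if not no_clipboard and copy is not None:
        copy(text)


if __name__ == "__main__":
    args = build_parser().parse_args()
    louder = amplify([0.1, -0.6, 0.5], args.amplify)
    deliver(f"{len(louder)} samples amplified", args.print_text, args.no_clipboard)
```

Passing the clipboard function in as `copy` keeps the sketch free of third-party imports; in practice that slot would be filled by whatever clipboard library the script uses.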