
Whisper Advanced Parameters

jhj0517 edited this page Sep 25, 2023 · 6 revisions
Parameters Description
beam_size Number of candidate sequences kept at each step of the beam search decoding algorithm.
TL;DR: A larger beam size generally gives higher quality but slower transcription. A smaller beam size gives lower quality but faster transcription.
log_prob_threshold Parameter related to how Whisper handles the "silent" parts of the audio. If the average log probability over sampled tokens is below this value, the decoding is treated as failed.
TL;DR: Lower this value if you want Whisper to be more "sensitive" to small sounds. Adjust it together with no_speech_threshold and compare the results.
no_speech_threshold Parameter related to how Whisper handles the "silent" parts of the audio. If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below log_prob_threshold, the segment is considered silent.
TL;DR: Raise this value if you want Whisper to be more "sensitive" to small sounds, since fewer segments then exceed it and get discarded as silent. Adjust it together with log_prob_threshold and compare the results.
compute_type Numeric precision used for computation, such as float16 or float32. Defaults to float16 if CUDA is enabled, otherwise float32.
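
The way the two thresholds combine, and the compute_type default, can be easier to follow in code. The following is only a minimal sketch of the rules as described in the table above, not the actual Whisper implementation. The function names `is_silent_segment` and `default_compute_type` are hypothetical, and the default values 0.6 and -1.0 are assumed here purely for illustration.

```python
def is_silent_segment(
    no_speech_prob: float,
    avg_log_prob: float,
    no_speech_threshold: float = 0.6,   # assumed illustrative default
    log_prob_threshold: float = -1.0,   # assumed illustrative default
) -> bool:
    """Apply the rule from the table: a segment counts as silent only if
    BOTH conditions hold:
      1. the no_speech probability is higher than no_speech_threshold
      2. the average log probability is below log_prob_threshold
    """
    return (
        no_speech_prob > no_speech_threshold
        and avg_log_prob < log_prob_threshold
    )


def default_compute_type(cuda_enabled: bool) -> str:
    """Pick the default described in the table: float16 with CUDA, else float32."""
    return "float16" if cuda_enabled else "float32"


if __name__ == "__main__":
    # A quiet segment: fairly high no_speech probability, low confidence.
    quiet = {"no_speech_prob": 0.7, "avg_log_prob": -1.5}

    # With the assumed defaults the segment is dropped as silent.
    print(is_silent_segment(**quiet))

    # Raising no_speech_threshold keeps the segment.
    print(is_silent_segment(**quiet, no_speech_threshold=0.8))

    # Lowering log_prob_threshold also keeps the segment.
    print(is_silent_segment(**quiet, log_prob_threshold=-2.0))

    print(default_compute_type(cuda_enabled=False))
```

Because both conditions must hold, a segment survives if either threshold is relaxed: raising no_speech_threshold or lowering log_prob_threshold each makes it harder for a segment to be classified as silent.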