Whisper Advanced Parameters
jhj0517 edited this page May 17, 2024
·
6 revisions
Parameter | Description |
---|---|
`beam_size` | Beam width used by the beam search decoding algorithm. TL;DR: a larger beam size generally gives higher-quality but slower transcription; a smaller beam size gives faster but lower-quality transcription. |
`log_prob_threshold` | Related to how Whisper handles the "silent" parts of the audio. If the average log probability over sampled tokens is below this value, the decoding is treated as failed. TL;DR: lower this value if you want Whisper to be more "sensitive" to small sounds. Adjust it together with `no_speech_threshold` and compare the results. |
`no_speech_threshold` | Related to how Whisper handles the "silent" parts of the audio. If the no-speech probability is higher than this value AND the average log probability over sampled tokens is below `log_prob_threshold`, the segment is considered silent. TL;DR: because a segment counts as silent only when its no-speech probability exceeds this value, raise this value if you want Whisper to be more "sensitive" to small sounds. Adjust it together with `log_prob_threshold` and compare the results. |
`compute_type` | Numeric precision used for computation, such as `float16` or `float32`. Defaults to `float16` if CUDA is enabled, otherwise `float32`. |
`condition_on_previous_text` | If True, the previous output of the model is provided as a prompt for the next window. Disabling it may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync. TL;DR: if a failure loop (repetitive hallucination) occurs, consider setting this to False. |
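
The silence rule described for `no_speech_threshold` and `log_prob_threshold` can be sketched in plain Python. This is only an illustration of the rule as stated in the table, not Whisper's actual code: the function name `is_silent_segment` is hypothetical, and the threshold values (0.6 and -1.0) and per-segment scores are assumed example numbers.

```python
def is_silent_segment(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,   # assumed example value
    log_prob_threshold: float = -1.0,   # assumed example value
) -> bool:
    """Return True when a segment would be treated as silent.

    Both conditions from the table must hold:
      1. the no-speech probability is higher than no_speech_threshold, AND
      2. the average log probability is below log_prob_threshold.
    """
    return no_speech_prob > no_speech_threshold and avg_logprob < log_prob_threshold


# A borderline segment: fairly likely to be silence, decoded with low confidence.
borderline = {"no_speech_prob": 0.7, "avg_logprob": -1.2}

# With the example thresholds, both conditions hold, so it counts as silent.
print(is_silent_segment(**borderline))

# A higher no_speech_threshold breaks condition 1: the segment is kept.
print(is_silent_segment(**borderline, no_speech_threshold=0.8))

# A lower log_prob_threshold breaks condition 2: the segment is kept.
print(is_silent_segment(**borderline, log_prob_threshold=-1.5))
```

Because the two thresholds are combined with AND, changing either one can flip the result for the same segment, which is why the table suggests adjusting them together.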