I don't know if this is even technically possible (for at least two reasons), but I hope it is.
My biggest problem with voice typing is when the recognizer makes mistakes (or my tongue does). I can then go back and delete the mistaken text (using the very convenient "sliding backspace"), but then how do I add back the correct text?
In many, if not most, cases, uttering just a partial phrase will confuse Vosk, because it has no context to guide its guess at which words I'm actually saying. It normally relies on a lot of context, especially when you're not speaking your native language (but also when you are).
But that context isn't available, except... well, yes, it is available: it's typed right there on the screen! So, can we
1. get the text before the part the user is speaking back from the screen (first potential problem, though it should be doable, since many regular virtual keyboards can offer correction suggestions even for previously typed words if you select them), and then
2. feed it to Vosk as if it were what the user had just spoken, because it really is, so that Vosk has a much better clue about how to interpret the newly input audio? (Second potential problem: does Vosk have a feature for feeding in an initial text? Cf. OpenAI Whisper's prompts.)
Let me give an example in case anything is unclear:
1. I say "The quick brown fox jumps over the lazy dog."
2. Sayboard interprets this as "The quick brown fax lumps over the lazy dog."
3. I delete "fax lumps", and then I say "fox jumps!"
4. Vosk understands "pox mumps", because initial consonants are notoriously hard to get, and after "pox", "mumps" might be reasonable.
5. I go back to step 3, and get increasingly frustrated.
Sorry for the example being silly, but it's hopefully clear: if at step 3, when I say "fox jumps", Vosk knew that there is "The quick brown" before that, it almost certainly wouldn't go with "pox mumps".
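To make the idea a bit more concrete, here is a minimal Python sketch of one way the on-screen text could be used, assuming the recognizer can return several candidate transcriptions, each with a positive confidence score. It does not feed the context into Vosk itself (I don't know whether that is possible); instead it rescores the candidates afterwards against the words around the cursor. The bigram table, the function names, and the confidence values are all made up for illustration.

```python
import math

# Toy bigram counts standing in for a real language model (hypothetical data).
BIGRAM_COUNTS = {
    ("brown", "fox"): 50,
    ("fox", "jumps"): 40,
    ("jumps", "over"): 30,
    ("pox", "mumps"): 40,
}


def bigram_logprob(prev, word, smoothing=0.1):
    """Log of a smoothed, unnormalised bigram score."""
    return math.log(BIGRAM_COUNTS.get((prev, word), 0) + smoothing)


def context_score(left, candidate, right):
    """Sum bigram scores over the candidate plus its joins with the surrounding text."""
    words = left[-1:] + candidate + right[:1]
    return sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))


def pick_best(text_before, text_after, alternatives, lm_weight=1.0):
    """Pick the alternative that best fits both the audio and the on-screen context.

    alternatives is a list of (text, confidence) pairs with confidence > 0.
    """
    left = text_before.lower().split()
    right = text_after.lower().split()

    def total(alternative):
        text, confidence = alternative
        fit = context_score(left, text.lower().split(), right)
        return math.log(confidence) + lm_weight * fit

    return max(alternatives, key=total)[0]


# Hypothetical recognizer output for the re-dictated fragment.
alternatives = [("pox mumps", 0.55), ("fox jumps", 0.45)]

print(pick_best("", "", alternatives))
print(pick_best("The quick brown", "over the lazy dog.", alternatives))
```

With no surrounding text, the slightly more confident "pox mumps" wins; once "The quick brown" and "over the lazy dog." are taken into account, "fox jumps" does. A real implementation would need a proper language model rather than a four-entry table, and whether Sayboard can obtain such candidates from Vosk is an open question here.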
Too bad, but fair enough. I didn't find much either, mainly this issue, where the only implementation shown appears to be something other than context biasing, namely constraining recognition to a set of words.
That could come in handy for the "punctuation mode" and "spelling mode" discussed in #29, though, so maybe you can give it a look anyway.
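For reference, constraining to a set of words would look roughly like this with Vosk's Python bindings, where `KaldiRecognizer` accepts an optional third argument containing a JSON array of allowed phrases. This is only a sketch: `build_grammar` and the phrase list are hypothetical, the recognizer call is left as a comment so the snippet runs without a model, and as far as I understand the grammar option only works with models that support runtime graph construction (typically the small ones).

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Return a JSON array string of allowed phrases for a grammar-constrained recognizer."""
    items = [p.strip().lower() for p in phrases if p.strip()]
    if allow_unknown:
        # "[unk]" lets the recognizer report speech that matches none of the phrases.
        items.append("[unk]")
    return json.dumps(items)


# Hypothetical vocabulary for a "punctuation mode".
punctuation_grammar = build_grammar(["comma", "period", "question mark"])
print(punctuation_grammar)

# With the vosk package and a model available, the string would be passed like this:
#   from vosk import Model, KaldiRecognizer
#   recognizer = KaldiRecognizer(Model("path/to/model"), 16000, punctuation_grammar)
```

This narrows what can be recognized at all, which is different from biasing the recognizer toward what fits the preceding text.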