Request: consider context when filling in corrections #53

Open
LuccoJ opened this issue Nov 13, 2023 · 2 comments

@LuccoJ
Contributor

LuccoJ commented Nov 13, 2023

I don't know if this is even technically possible (for at least two reasons), but I hope it is.

My biggest problem with voice typing is when the recognizer makes mistakes (or my tongue does). I can then go back and delete the mistaken text (using the very convenient "sliding backspace"), but then how do I add back the correct text?

In many cases, if not most, uttering just a partial phrase will confuse Vosk, because it has no context to guide its guess at which words I'm actually saying. It normally relies on a lot of context, especially when you're not speaking your native language (but also when you are).

But that context isn't available, except... well, yes, it's available, it's typed there on the screen! So, can we

  • get the text that precedes the part the user is dictating back from the screen (first potential problem, though it should be doable, since many regular virtual keyboards can offer correction suggestions even for previously typed words if you select them), and then
  • feed it to Vosk as if it were what the user had just spoken (because it really is), so that Vosk has a much better idea of how to interpret the newly input audio? (Second potential problem: does Vosk have a way to be fed an initial text? Cf. OpenAI Whisper's prompts.)
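The first step could be sketched roughly as follows. This is only the text-handling half, written in Python for brevity; `left_context` and its parameters are hypothetical names, and in an Android keyboard the text itself would presumably come from something like `InputConnection.getTextBeforeCursor`.

```python
import re


def left_context(text: str, cursor: int, max_words: int = 5) -> str:
    """Return up to max_words words that sit right before the cursor."""
    before = text[:cursor]
    # Keep only the sentence being edited; earlier sentences are
    # assumed to be less useful as context.
    sentence = re.split(r"[.!?]\s+", before)[-1]
    words = sentence.split()
    return " ".join(words[-max_words:])


# The user has deleted the wrong words and the cursor sits before "over".
text = "Hello there. The quick brown over the lazy dog."
cursor = text.index("over")
print(left_context(text, cursor))
```

Limiting the context to a few words of the current sentence is an arbitrary choice here; a real implementation might want more, or might let the recognizer decide.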

Let me give an example in case anything is unclear:

  1. I say "The quick brown fox jumps over the lazy dog."
  2. Sayboard interprets "The quick brown fax lumps over the lazy dog."
  3. I delete "fax lumps", and then I say "fox jumps!"
  4. Vosk understands "pox mumps", because initial consonants are notoriously hard to get, and after "pox", "mumps" might be reasonable.
  5. I go back to step 3, and get increasingly frustrated.

Sorry for the silly example, but hopefully it's clear: if at step 3, when I say "fox jumps", Vosk knew that "The quick brown" comes right before it, it almost certainly wouldn't go with "pox mumps".
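To make the effect of left context concrete, here is a toy sketch. It is not a Vosk feature and not how Vosk works internally: it just picks among candidate transcriptions using a made-up table of word-pair counts. `pick_with_context`, `BIGRAMS` and the numbers in it are all invented for illustration, and it assumes the recognizer can hand back several candidates at all.

```python
def pick_with_context(context, alternatives, bigram_counts):
    """Choose the alternative whose words best follow the left context."""
    def score(phrase):
        # Prepend the last context word (if any) so the first pair
        # links the existing text to the new phrase.
        words = context.lower().split()[-1:] + phrase.lower().split()
        # Sum how often each adjacent word pair appears in the toy table.
        return sum(bigram_counts.get(pair, 0) for pair in zip(words, words[1:]))
    return max(alternatives, key=score)


# Toy counts, made up for this example only.
BIGRAMS = {("brown", "fox"): 9, ("fox", "jumps"): 7, ("pox", "mumps"): 8}

candidates = ["pox mumps", "fox jumps", "fax lumps"]
print(pick_with_context("", candidates, BIGRAMS))                 # pox mumps
print(pick_with_context("The quick brown", candidates, BIGRAMS))  # fox jumps
```

Without context the two-word phrase "pox mumps" scores highest in this toy table; once "brown" is known to come first, "fox jumps" wins.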

@ElishaAz
Owner

On the first point: that's actually what I do for auto-capitalization.

On the second point: Vosk does not have this feature, as far as I can see.

@LuccoJ
Contributor Author

LuccoJ commented Nov 29, 2023

Too bad, but fair enough. I also didn't find much, mainly this issue, where the only implementation shown appears to be something other than context biasing, namely constraining recognition to a set of words.

That might still come in useful for the "punctuation mode" and "spelling mode" discussed in #29, so maybe give it a look anyway.
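For reference, a minimal sketch of what that word-set constraint looks like, assuming the Vosk Python bindings, where the recognizer constructor accepts a JSON array of allowed phrases as an optional third argument. `build_grammar` and the word list are hypothetical, and as far as I know this only works with models that can build their graph at run time.

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Serialize a phrase list into a JSON array string for the recognizer."""
    # Normalize and drop duplicates and empty entries while keeping order.
    items = list(dict.fromkeys(p.strip().lower() for p in phrases if p.strip()))
    if allow_unknown:
        # "[unk]" lets the recognizer report out-of-list speech as unknown.
        items.append("[unk]")
    return json.dumps(items)


# Hypothetical vocabulary for a "punctuation mode".
PUNCTUATION_WORDS = ["comma", "period", "question mark", "exclamation mark"]
grammar = build_grammar(PUNCTUATION_WORDS)
print(grammar)

# With the vosk package and a model loaded, the string would be used as:
#   rec = KaldiRecognizer(model, 16000, grammar)
```

The block stops short of calling Vosk so that it runs without a model installed; how this maps onto the Android bindings Sayboard uses would need checking.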
