Pronunciation rules instead of g2p model #9
Your understanding is correct :) For some cases, I could see expanding gruut's abbreviation system, which does the usual regex match/expand. Additionally, the lexicon can be extended with custom words fairly easily. But these approaches obviously won't work for all cases. I'm not able to dig into eSpeak's code because of its license, so I would need to find a different resource for pronunciation rules. How would you detect something like a username, though? I know to pronounce yours as "control alt defeat", but that's based on quite a bit of knowledge beyond the letters themselves.
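As a rough sketch of the regex match/expand idea mentioned above: an ordered table of patterns with replacements, plus a direct word-to-phoneme lexicon for custom entries. The table contents and function names here are illustrative, not gruut's actual data structures.

```python
import re

# Illustrative abbreviation table: (pattern, expansion) pairs tried in order.
# Not gruut's real rule set -- just a sketch of the match/expand approach.
ABBREVIATIONS = [
    (re.compile(r"^[Dd]r\.$"), "doctor"),
    (re.compile(r"^[Ss]t\.$"), "street"),
]

# A custom lexicon entry maps a word straight to phonemes (IPA here),
# bypassing both the abbreviation rules and the g2p model.
CUSTOM_LEXICON = {
    "gruut": ["ɡ", "ɹ", "uː", "t"],
}

def expand_token(token: str) -> str:
    """Expand a single token if it matches an abbreviation pattern."""
    for pattern, replacement in ABBREVIATIONS:
        if pattern.match(token):
            return replacement
    return token

def expand_text(text: str) -> str:
    """Apply abbreviation expansion token by token."""
    return " ".join(expand_token(t) for t in text.split())
```

The limitation the comment points out is visible here: rules like these only cover forms you anticipated, so anything novel (usernames, ad-hoc acronyms) falls through to the g2p model.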
That specific example is definitely out of scope, but I've noticed, for example, that espeak tends to do better when it encounters clusters of consonants inside one "word", or parts of different words joined into one "word".
I was wondering - I think it's possible to integrate something like spaCy into this project to better suss out named entities such as times, locations, and organizations. With some training of the models, or by using the larger models, it could even detect usernames and other nouns. In this example using the smallest English model, it can detect person, organization, time, location, and money - Sorry for the short link. I only put it there because the long URL is like... this... I looked at spaCy's license and it's MIT, allowing for commercial use, modification, private use, etc. So it would be a good fit, as far as I can tell. Also, I don't quite understand how the CRFSuite models help in tagging, but perhaps spaCy could sit after them and just tag things that CRFSuite didn't catch?
I actually started the first version of gruut with spaCy, so this would be bringing it full circle 😆 spaCy's named entity recognition is certainly much better than mine. My issue with spaCy originally was that its tokenizer broke apart words like "don't" into sub-words, and I had to disable a lot of functionality to stop this. Maybe there are more options now in newer versions? The CRFSuite models are used for part of speech tagging right now in English and French. If I used spaCy, I wouldn't need these anymore for tagging (they're still needed for guessing word pronunciations, however).
Do you know if spaCy would be able to split non-delimited compound words apart? Like "ctrlaltdefeat" to "ctrl", "alt", and "defeat".
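One way to approach the splitting question above, independent of spaCy: a backtracking longest-match split against a vocabulary of known sub-words. This is a hypothetical sketch, not anything in gruut; the vocabulary below is just for illustration.

```python
def split_compound(word, vocab):
    """Split a non-delimited compound into known sub-words.

    Returns a list of sub-words if the whole word can be covered,
    otherwise None. Longer prefixes are tried first, with backtracking
    so an early greedy choice can be undone if the remainder fails.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):  # longest prefix first
        prefix = word[:end]
        if prefix in vocab:
            rest = split_compound(word[end:], vocab)
            if rest is not None:
                return [prefix] + rest
    return None

# Toy vocabulary for the example; a real one would come from the lexicon.
vocab = {"ctrl", "alt", "defeat", "de", "feat"}
```

The catch is the same one raised earlier in the thread: it only works when every piece is already in the vocabulary, so "ctrlaltdefeat" splits cleanly but a username built from unknown fragments would not.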
That's so neat! Well, spaCy is very powerful, but definitely not perfect. We might have to further train it to recognize usernames and also to spell them out. I'll look into whether someone has written a document about that. There is a whole document about how to train it for new NER labels here, so perhaps we can go along those lines. There could also be a rule-based approach that looks for the "@" which commonly precedes usernames. For now, I downloaded the spaCy English large and transformer models, hoping that one of them might have some knowledge of Internet usernames, but they both fell flat. However, there is a rule-based approach to prevent "don't" from being split during tokenization. Here's some code about the same.
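The rule-based "@" idea above can be sketched with a plain regex, before any NER is involved: find mentions and spell the username out letter by letter. The pattern and function name are assumptions for illustration, not part of gruut or spaCy.

```python
import re

# Hypothetical rule: anything matching "@name" is treated as a username
# and spelled out letter by letter instead of going to the g2p model.
MENTION_RE = re.compile(r"@([A-Za-z0-9_]+)")

def spell_out_mentions(text: str) -> str:
    """Replace each @mention with its username spelled letter by letter."""
    def spell(match):
        return " ".join(match.group(1))
    return MENTION_RE.sub(spell, text)
```

This obviously only catches explicit @-mentions; a bare username like "ctrlaltdefeat" with no sigil would still need NER or a compound-splitting step.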
I just looked up part-of-speech tagging in CRFSuite, and yeah, you're right that spaCy would be able to do that just fine if you go with it. About guessing pronunciations... Am I understanding it right that you train on a pre-created g2p corpus from Phonetisaurus to make it more customized, and the role of CRFSuite is just to load the data in? I ask because I filed this issue - rhasspy/gruut-ipa#6 - about some incorrect phonemes, and I'm wondering what's the best way to address it. Is it coming from Phonetisaurus or from your fine-tuning of their model?
If I understand correctly, one difference between espeak's g2p system and gruut's is that when a word is not present in the dictionary, espeak falls back on a set of rules based on letter groups, while gruut uses a prediction model.
For some purposes, the pronunciation rules used by espeak are better than the default trained g2p model (such as pronouncing usernames, abbreviations, acronyms, etc. - essentially anything where the "words" are dissimilar to those the g2p model was trained on).
Would it be possible to address this somehow, say by providing espeak-derived pronunciation rules as an alternative?
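To make the request concrete, a minimal illustration of the letter-group idea: an ordered rule table mapped onto a word with longest-match-first matching. The rules below are toy examples invented for this sketch - they are not espeak's actual rule set, which can't be copied here due to its GPL license.

```python
# Toy letter-group -> phoneme rules (IPA), longest match wins.
# Invented for illustration; NOT derived from espeak's rule files.
RULES = [
    ("ch", "tʃ"), ("sh", "ʃ"), ("th", "θ"), ("ee", "iː"),
    ("a", "æ"), ("e", "ɛ"), ("i", "ɪ"), ("o", "ɒ"), ("u", "ʌ"),
    ("b", "b"), ("c", "k"), ("d", "d"), ("f", "f"), ("g", "ɡ"),
    ("h", "h"), ("j", "dʒ"), ("k", "k"), ("l", "l"), ("m", "m"),
    ("n", "n"), ("p", "p"), ("r", "ɹ"), ("s", "s"), ("t", "t"),
    ("v", "v"), ("w", "w"), ("y", "j"), ("z", "z"),
]

def rule_g2p(word):
    """Greedy longest-match conversion of letter groups to phonemes.

    A fallback of this shape could run for words missing from the
    lexicon, instead of (or alongside) the trained g2p model.
    """
    word = word.lower()
    rules = sorted(RULES, key=lambda r: -len(r[0]))  # longest first
    phonemes, i = [], 0
    while i < len(word):
        for letters, phone in rules:
            if word.startswith(letters, i):
                phonemes.append(phone)
                i += len(letters)
                break
        else:
            i += 1  # skip characters no rule covers
    return phonemes
```

Because it operates purely on letter groups, a scheme like this degrades gracefully on out-of-distribution "words" (usernames, acronyms), which is exactly the case where a trained g2p model struggles.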