Pronunciation rules instead of g2p model #9
Your understanding is correct :) For some cases, I could see expanding gruut's abbreviation system, which does the usual regex match/expand. Additionally, the lexicon can be extended with custom words fairly easily. But these approaches obviously won't work for all cases. I'm not able to dig into eSpeak's code because of its license, so I would need to find a different resource for pronunciation rules. How would you detect something like a username, though? I know to pronounce yours as "control alt defeat", but that's based on quite a bit of knowledge beyond the letters themselves.
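As a rough sketch of the regex match/expand idea mentioned above: an ordered table of patterns with replacements, plus a direct word-to-phoneme lexicon for custom entries. The table contents and function names here are illustrative, not gruut's actual data structures.

```python
import re

# Illustrative abbreviation table: (pattern, expansion) pairs tried in order.
# Not gruut's real rule set -- just a sketch of the match/expand approach.
ABBREVIATIONS = [
    (re.compile(r"^[Dd]r\.$"), "doctor"),
    (re.compile(r"^[Ss]t\.$"), "street"),
]

# A custom lexicon entry maps a word straight to phonemes (IPA here),
# bypassing both the abbreviation rules and the g2p model.
CUSTOM_LEXICON = {
    "gruut": ["ɡ", "ɹ", "uː", "t"],
}

def expand_token(token: str) -> str:
    """Expand a single token if it matches an abbreviation pattern."""
    for pattern, replacement in ABBREVIATIONS:
        if pattern.match(token):
            return replacement
    return token

def expand_text(text: str) -> str:
    """Apply abbreviation expansion token by token."""
    return " ".join(expand_token(t) for t in text.split())
```

The limitation the comment points out is visible here: rules like these only cover forms you anticipated, so anything novel (usernames, ad-hoc acronyms) falls through to the g2p model.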
That specific example is definitely out of scope, but I've noticed, for example, that espeak tends to do better when it encounters clusters of consonants inside one "word", or parts of different words joined into one "word".
I was wondering - I think it's possible to integrate something like spaCy into this project to better suss out named entities such as times, locations, and organizations. With some training of the models, or by using the larger models, it could even detect usernames and other nouns. In this example using the smallest English model, it can detect person, organization, time, location, and money - Sorry for the short link. I only put it there because the long URL is like... this... I looked at spaCy's license and it's MIT, allowing for commercial use, modification, private use, etc. So it would be a good fit, as far as I can tell. Also, I don't quite understand how the CRFSuite models help in tagging, but perhaps spaCy could sit after them and just tag things that CRFSuite didn't catch?
I actually started the first version of gruut with spaCy, so this would be bringing it full circle 😆 spaCy's named entity recognition is certainly much better than mine. My issue with spaCy originally was that its tokenizer broke apart words like "don't" into sub-words, and I had to disable a lot of functionality to stop this. Maybe there are more options now in newer versions? The CRFSuite models are used for part of speech tagging right now in English and French. If I used spaCy, I wouldn't need these anymore for tagging (they're still needed for guessing word pronunciations, however).
Do you know if spaCy would be able to split non-delimited compound words apart? Like "ctrlaltdefeat" to "ctrl", "alt", and "defeat".
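One way to approach the splitting question above, independent of spaCy: a backtracking longest-match split against a vocabulary of known sub-words. This is a hypothetical sketch, not anything in gruut; the vocabulary below is just for illustration.

```python
def split_compound(word, vocab):
    """Split a non-delimited compound into known sub-words.

    Returns a list of sub-words if the whole word can be covered,
    otherwise None. Longer prefixes are tried first, with backtracking
    so an early greedy choice can be undone if the remainder fails.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):  # longest prefix first
        prefix = word[:end]
        if prefix in vocab:
            rest = split_compound(word[end:], vocab)
            if rest is not None:
                return [prefix] + rest
    return None

# Toy vocabulary for the example; a real one would come from the lexicon.
vocab = {"ctrl", "alt", "defeat", "de", "feat"}
```

The catch is the same one raised earlier in the thread: it only works when every piece is already in the vocabulary, so "ctrlaltdefeat" splits cleanly but a username built from unknown fragments would not.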
That's so neat! Well, spaCy is very powerful, but definitely not perfect. We might have to further train it to recognize usernames and also to spell them out. I'll look into whether someone has written a document about that. There is a whole document about how to train it for new NER labels here, so perhaps we can go along those lines. There could also be a rule-based approach that looks for the "@" which commonly precedes usernames. For now, I downloaded the spaCy English large and transformer models, hoping that one of them might have some knowledge of Internet usernames, but they both fell flat. However, there is a rule-based approach to prevent "don't" from being split during tokenization. Here's some code about the same.
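The rule-based "@" idea above can be sketched with a plain regex, before any NER is involved: find mentions and spell the username out letter by letter. The pattern and function name are assumptions for illustration, not part of gruut or spaCy.

```python
import re

# Hypothetical rule: anything matching "@name" is treated as a username
# and spelled out letter by letter instead of going to the g2p model.
MENTION_RE = re.compile(r"@([A-Za-z0-9_]+)")

def spell_out_mentions(text: str) -> str:
    """Replace each @mention with its username spelled letter by letter."""
    def spell(match):
        return " ".join(match.group(1))
    return MENTION_RE.sub(spell, text)
```

This obviously only catches explicit @-mentions; a bare username like "ctrlaltdefeat" with no sigil would still need NER or a compound-splitting step.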
I just looked up part-of-speech tagging in CRFSuite, and yeah, you're right that spaCy would be able to do that just fine if you go with it. About guessing pronunciations... Am I understanding it right that you train on a pre-created g2p corpus from Phonetisaurus to make it more customized, and the role of CRFSuite is just to load the data in? I ask because I filed this issue - rhasspy/gruut-ipa#6 - about some incorrect phonemes, and I'm wondering what's the best way to address it. Is it coming from Phonetisaurus or from your fine-tuning of their model?
If I understand correctly, one difference between espeak's g2p system and gruut's is that when a word is not present in the dictionary, espeak falls back on a set of rules based on letter groups, while gruut uses a prediction model.
For some purposes, the pronunciation rules used by espeak are better than the default trained g2p model (such as pronouncing usernames, abbreviations, acronyms, etc. - essentially anything where the "words" are dissimilar to those the g2p model was trained on).
Would it be possible to address this somehow, say by providing espeak-derived pronunciation rules as an alternative?
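To make the request concrete, a minimal illustration of the letter-group idea: an ordered rule table mapped onto a word with longest-match-first matching. The rules below are toy examples invented for this sketch - they are not espeak's actual rule set, which can't be copied here due to its GPL license.

```python
# Toy letter-group -> phoneme rules (IPA), longest match wins.
# Invented for illustration; NOT derived from espeak's rule files.
RULES = [
    ("ch", "tʃ"), ("sh", "ʃ"), ("th", "θ"), ("ee", "iː"),
    ("a", "æ"), ("e", "ɛ"), ("i", "ɪ"), ("o", "ɒ"), ("u", "ʌ"),
    ("b", "b"), ("c", "k"), ("d", "d"), ("f", "f"), ("g", "ɡ"),
    ("h", "h"), ("j", "dʒ"), ("k", "k"), ("l", "l"), ("m", "m"),
    ("n", "n"), ("p", "p"), ("r", "ɹ"), ("s", "s"), ("t", "t"),
    ("v", "v"), ("w", "w"), ("y", "j"), ("z", "z"),
]

def rule_g2p(word):
    """Greedy longest-match conversion of letter groups to phonemes.

    A fallback of this shape could run for words missing from the
    lexicon, instead of (or alongside) the trained g2p model.
    """
    word = word.lower()
    rules = sorted(RULES, key=lambda r: -len(r[0]))  # longest first
    phonemes, i = [], 0
    while i < len(word):
        for letters, phone in rules:
            if word.startswith(letters, i):
                phonemes.append(phone)
                i += len(letters)
                break
        else:
            i += 1  # skip characters no rule covers
    return phonemes
```

Because it operates purely on letter groups, a scheme like this degrades gracefully on out-of-distribution "words" (usernames, acronyms), which is exactly the case where a trained g2p model struggles.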