[G2pPack + Phonetic Assistant] Give same phonetic result for uppercase and lowercase graphemes #1209
base: master
It's not always correct to do this. Acronyms like CIA should be pronounced differently.
It works like this in the classic phonemizers as well, so I wanted it to work the same across the board. Perhaps I could ignore all-caps instances, though.
That should be a decision per phonemizer. If it's a Japanese one, where ka, KA, and Ka should all be treated the same, sure. For English, uppercase and lowercase shouldn't be treated the same.
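A minimal sketch of the per-phonemizer policy discussed above, written in Python purely for illustration (it is not the project's code). The function name `normalize_grapheme` and the flags `fold_case` and `keep_all_caps` are hypothetical; they only show how a phonemizer could opt in to case folding while leaving all-caps words such as acronyms untouched.

```python
def normalize_grapheme(word: str, fold_case: bool, keep_all_caps: bool = False) -> str:
    """Return the key used for dictionary lookup (hypothetical helper).

    fold_case:     the phonemizer treats upper- and lowercase as the same.
    keep_all_caps: multi-letter all-caps words (e.g. acronyms) keep their case.
    """
    if not fold_case:
        # Case-sensitive phonemizer: look the word up exactly as typed.
        return word
    if keep_all_caps and len(word) > 1 and word.isupper():
        # Likely an acronym such as "CIA": leave it alone.
        return word
    return word.lower()


# A Japanese-style phonemizer could fold everything:
print(normalize_grapheme("KA", fold_case=True))
# An English-style phonemizer could fold case but keep acronyms distinct:
print(normalize_grapheme("CIA", fold_case=True, keep_all_caps=True))
```

Whether an exception for all-caps words is desirable would still be a decision for each phonemizer, as noted in the comments above.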
Currently, in the Phonetic Assistant as well as the DiffSinger G2P phonemizers, uppercase graphemes get a different phonetic result than lowercase graphemes. This is inconvenient because the end user may sometimes capitalize words and sometimes not. If the end user wants to use a different pronunciation, they can use number suffixes, e.g. the(1).
In theory, this issue could affect any G2P-powered function (such as phonemizers), but in practice it currently only affects the Phonetic Assistant and the DiffSinger G2P phonemizers.
What this PR does NOT do
SP and AP (this has been tested). If they are defined in the dictionary, or the dictionary contains no graphemes, they will work normally. (Note that they have to be defined in their uppercase form in the dsdict if there are any conflicting graphemes (e.g. lowercase sp and/or ap); however, this is currently the case as well.)
KA vs. ka) will not be affected either, so you can still distinguish by capitalization manually if so desired.
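The lookup behaviour described in this PR can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy dictionary, its phoneme strings, and the `lookup` helper are hypothetical, and the "exact match first, then lowercase fallback" order is only one way to get the described result (uppercase SP/AP entries still win over conflicting lowercase graphemes); the PR's actual implementation may differ.

```python
from typing import Dict, List, Optional

# Toy dictionary; graphemes and phoneme strings are illustrative only.
TOY_DICT: Dict[str, List[str]] = {
    "the": ["dh", "ah"],
    "the(1)": ["dh", "iy"],   # alternate pronunciation via number suffix
    "SP": ["SP"],             # special symbol defined in uppercase
    "sp": ["s", "p"],         # conflicting lowercase grapheme
}


def lookup(grapheme: str, dictionary: Dict[str, List[str]]) -> Optional[List[str]]:
    """Case-insensitive lookup that still prefers an exact-case entry."""
    if grapheme in dictionary:
        # Exact match: keeps "SP" distinct from "sp".
        return dictionary[grapheme]
    # Fallback: fold to lowercase so "The" and "the" give the same result.
    return dictionary.get(grapheme.lower())


print(lookup("The", TOY_DICT))     # same result as "the"
print(lookup("The(1)", TOY_DICT))  # alternate pronunciation, any capitalization
print(lookup("SP", TOY_DICT))      # uppercase special symbol is unaffected
```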