Add Japanese monophone G2P (tailored to AI voicebanks/phonemizers) + add support to Diffsinger Japanese Phonemizer #1147

Merged
merged 5 commits into from
Jun 9, 2024

@lottev1991 lottev1991 commented May 18, 2024

There has been demand for a Japanese G2P implementation, so I decided to add one, made specifically for machine-learning phonemizers (it'll be basically useless for UTAU voicebanks/phonemizers, though I won't stop anyone from messing around with that). This PR also adds G2P support to the DiffSinger Japanese Phonemizer. Old dicts will still work just fine: their entries simply override the G2P. This will make writing new dictionaries much easier. I was also partially inspired to do this by the recently added Korean G2P.

Note that the dictionaries were made for standard Japanese (e.g. no distinction between じ/ぢ or ず/づ), so if you need such distinctions you'll still have to override them in your custom dict.
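
As a rough illustration of the override behaviour described above, here is a minimal Python sketch (not the phonemizer's actual C# code). The names `BUILTIN_G2P`, `CUSTOM_DICT` and `resolve`, as well as the phoneme symbols such as `dj` and `dz`, are hypothetical and only stand in for whatever a given voicebank uses.

```python
# Stand-in for the built-in G2P; the phoneme symbols are illustrative assumptions.
BUILTIN_G2P = {
    "じ": ["j", "i"],
    "ぢ": ["j", "i"],  # standard Japanese: same output as じ
    "ず": ["z", "u"],
    "づ": ["z", "u"],  # standard Japanese: same output as ず
}

# Hypothetical custom dict entries for a voicebank that distinguishes ぢ and づ.
CUSTOM_DICT = {
    "ぢ": ["dj", "i"],
    "づ": ["dz", "u"],
}


def resolve(lyric):
    """Custom dict entries win; anything else falls back to the built-in G2P."""
    if lyric in CUSTOM_DICT:
        return CUSTOM_DICT[lyric]
    return BUILTIN_G2P.get(lyric)  # None when the lyric is unknown


print(resolve("じ"))  # served by the built-in G2P
print(resolve("ぢ"))  # served by the custom dict
```

With this ordering, a custom dict only needs to list the entries that differ from the defaults.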

Functionality

  • 4 base dicts: hiragana, katakana, romaji, and "specials" (pauses and breaths, plus the accompanying kanji/lyrics). By default, the katakana dict is for museion (devoiced vowels), which are written as capitalized phonemes; N is the exception, since it is always voiced. If a voicebank does not support museion, this can easily be changed with phoneme replacements;
  • Faux-diphthong support (e.g. "あい", "あん" on one note). I can't guarantee that the dictionaries cover every possible combination, but I tried my best, and I'm willing to look them over again in the future based on feedback;
  • Built-in support for pauses and breaths, as mentioned above (e.g. "R" and "-" are SP, "息" and "吸" are AP). Exhales are currently not supported, though I'm willing to add them if there's demand;
  • Some extended phoneme support (e.g. kw, gw, ng, ngy).
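
The lookup behaviour listed above can be pictured with a small Python sketch. This is not the code from the PR (OpenUtau is written in C#); the table contents, the function name `lyric_to_phonemes`, and the kana phoneme symbols are assumptions for illustration. Only the SP/AP mappings for "R", "-", "息" and "吸" come from the description.

```python
# Hypothetical miniature dictionaries; the real ones are far larger, and the
# kana phoneme symbols below are illustrative assumptions.
SPECIALS = {"R": ["SP"], "-": ["SP"], "息": ["AP"], "吸": ["AP"]}
HIRAGANA = {
    "あ": ["a"],
    "き": ["k", "i"],
    "あい": ["a", "i"],  # faux-diphthong on one note
    "あん": ["a", "N"],  # faux-diphthong on one note
}
KATAKANA = {
    "キ": ["k", "I"],  # capital vowel = devoiced (assumed symbol)
    "ン": ["N"],       # N is capitalized but always voiced
}


def lyric_to_phonemes(lyric, replacements=None):
    """Look up the whole lyric in each dict; return None if nothing matches."""
    for table in (SPECIALS, HIRAGANA, KATAKANA):
        if lyric in table:
            phonemes = table[lyric]
            break
    else:
        return None  # no per-syllable splitting: the lyric is matched as a unit
    replacements = replacements or {}
    # Optional phoneme replacements, e.g. mapping devoiced vowels back to voiced.
    return [replacements.get(p, p) for p in phonemes]


print(lyric_to_phonemes("あい"))            # ['a', 'i']
print(lyric_to_phonemes("息"))             # ['AP']
print(lyric_to_phonemes("キ", {"I": "i"}))  # ['k', 'i']
```

Matching the lyric as a whole is what lets a faux-diphthong such as "あい" sit on a single note, and a replacement table like `{"I": "i"}` shows how a voicebank without devoiced vowels could fall back to voiced ones.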

What it does not do

  • AI-based word predictions (in other words, there's no g2p.onnx file). I did not consider this necessary due to the relative simplicity of the Japanese language;
  • Multi-syllable support (save for faux-diphthongs in theory, although that function was, ironically, created for use on single syllables). I don't consider this convenient for Japanese vocal synthesis, especially given how OpenUtau handles multi-syllabic words;
  • Kana-to-romaji conversion or vice versa; the only conversion is on a phonemic basis, hence the name "monophone G2P" to avoid confusion. This G2P is therefore not meant as a substitute for WanaKana.

@stakira stakira merged commit a265891 into stakira:master Jun 9, 2024
3 checks passed