Add Japanese monophone G2P (tailored to AI voicebanks/phonemizers) + add support to Diffsinger Japanese Phonemizer #1147

lottev1991 · 2024-05-18T21:54:09Z

There has been demand for a Japanese G2P implementation, so I decided to add one, made specifically for machine-learning phonemizers. (It will be basically useless for UTAU voicebanks/phonemizers, though I won't stop anyone from experimenting with that.) This PR also adds G2P support to the DiffSinger Japanese Phonemizer. Existing dictionaries will still work just fine; their entries simply override the G2P. This should make writing new dictionaries much easier. I was also partially inspired by the recently added Korean G2P.

Note that the dictionaries were made for the standard Japanese dialect (e.g. no distinction between じ/ぢ and ず/づ), so if you need such distinctions you'll still have to override them in your custom dict.
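To illustrate the override behaviour described above, here is a minimal Python sketch (not the actual C# code from this PR). The names `BUILTIN_G2P`, `custom_dict`, and `resolve`, as well as the phoneme symbols `j`, `z`, `dj`, and `dz`, are hypothetical and only chosen for the example; a real voicebank defines its own phoneme set.

```python
# Hypothetical excerpt of the built-in G2P: standard Japanese, so じ/ぢ and
# ず/づ map to the same phonemes. The symbols are assumptions for this sketch.
BUILTIN_G2P = {
    "じ": ["j", "i"],
    "ぢ": ["j", "i"],
    "ず": ["z", "u"],
    "づ": ["z", "u"],
}

# Hypothetical custom dictionary shipped with a voicebank that does
# distinguish ぢ and づ. "dj" and "dz" are made-up phoneme names.
custom_dict = {
    "ぢ": ["dj", "i"],
    "づ": ["dz", "u"],
}


def resolve(lyric, custom, builtin=BUILTIN_G2P):
    """Return the phonemes for a lyric, letting the custom dict win."""
    if lyric in custom:
        return custom[lyric]
    # Fall back to the built-in G2P; None means the lyric is unknown.
    return builtin.get(lyric)
```

With these tables, `resolve("ぢ", custom_dict)` uses the custom entry, while `resolve("じ", custom_dict)` falls through to the built-in one.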

Functionality

4 base dicts: hiragana, katakana, romaji, and "specials" (pauses and breaths, plus the accompanying kanji/lyrics). By default, the katakana dict is for museion (devoiced vowels), which are capitalized (except for N, which is always voiced), but if a voicebank does not support them this can easily be changed with phoneme replacements;
Faux-diphthong support (e.g. "あい" or "あん" on one note). I can't guarantee that the dictionaries cover every possible combination, but I tried my best, and I'm willing to revisit this based on feedback;
Built-in support for pauses and breaths, as mentioned above (e.g. "R" and "-" are SP; "息" and "吸" are AP). Exhales are currently not supported, though I'm willing to add them if there's demand;
Some extended phoneme support (e.g. kw, gw, ng, ngy).
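
The lookup behaviour listed above can be sketched roughly as follows. This is a simplified Python illustration, not the C# implementation in this PR: the tables are tiny, hypothetical excerpts, the function name `to_phonemes` is made up, and the exact phoneme symbols (including the capitalized devoiced vowel `A`) are assumptions based on the description.

```python
# Tiny, hypothetical excerpts of the base dicts; the real ones are far larger.
HIRAGANA = {
    "あ": ["a"],
    "か": ["k", "a"],
    "ん": ["N"],
    "あい": ["a", "i"],  # faux-diphthong on one note
    "あん": ["a", "N"],  # faux-diphthong on one note
}
KATAKANA = {
    "ア": ["A"],        # devoiced vowel, capitalized
    "カ": ["k", "A"],
    "ン": ["N"],        # N is always voiced
}
SPECIALS = {
    "R": ["SP"], "-": ["SP"],    # pauses
    "息": ["AP"], "吸": ["AP"],  # breaths
}


def to_phonemes(lyric, replacements=None):
    """Look up a whole lyric; optionally remap phonemes for a voicebank."""
    for table in (SPECIALS, HIRAGANA, KATAKANA):
        if lyric in table:
            phonemes = table[lyric]
            break
    else:
        return None  # unknown lyric: nothing is predicted
    replacements = replacements or {}
    return [replacements.get(p, p) for p in phonemes]
```

For a voicebank without devoiced vowels, passing a replacement map such as `{"A": "a"}` turns the phonemes for "カ" from `["k", "A"]` into `["k", "a"]`. Note that the lyric is matched as a whole, so a multi-syllable word that has no entry of its own returns nothing.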

What it does not do

AI-based word prediction (in other words, there's no g2p.onnx file). I did not consider this necessary given the relative simplicity of the Japanese language;
Multi-syllable support (save for faux-diphthongs in theory, although that feature was, ironically, created for use on single syllables). I don't consider this convenient for Japanese vocal synthesis, especially given how OpenUtau handles multi-syllabic words;
Kana-to-romaji conversion or vice versa; the only conversion is on a phonemic basis, hence the name "monophone G2P" to avoid confusion. This G2P is therefore not meant as a substitute for WanaKana.

lottev1991 added 5 commits May 18, 2024 23:26

Add Japanese monophone G2P + add support to Diffsinger Japanese Phonemizer

2cc6810

Add workaround for dash notes (the G2P doesn't recognize it for some reason)

6a7d0ad

Add some things I forgot

28ca7f3

Add some more stuff I forgot

1c30ebc

Add yet even more stuff I forgot

3c63ca7

stakira merged commit a265891 into stakira:master Jun 9, 2024
3 checks passed
