Phonemizer
The following applies to the phonemizers based on the DiffSinger variance phoneme duration model, including:
- DIFFS (located in the General category)
- DIFFS ZH Mandarin Chinese
- DIFFS ZH-YUE Cantonese
- DIFFS JA Japanese
- DIFFS EN English
- DIFFS ES Spanish
- DIFFS IT Italian
- DIFFS KO Korean
- DIFFS PT Portuguese
- DIFFS RU Russian
Users can input lyrics in the following ways:
- Input the lyrics directly. By default, the pronunciation given for the lyrics in the dictionary is used. If the lyrics are not in the dictionary, the phonemizer tries to interpret them as a sequence of phonemes. If that also fails, the phonemizer shows an error.
- Lyrics [phoneme sequence], where the phonemes in the sequence are separated by spaces. The phoneme sequence takes precedence over the dictionary pronunciation. For example, read[r iy d].
- [Phoneme sequence], with the phonemes separated by spaces. For example, [r iy d].
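The three input forms and the fallback order described above can be sketched as follows. This is only an illustration, not OpenUtau's actual implementation: the names `parse_lyric` and `resolve_phonemes`, the toy dictionary, and the toy phoneme set are all hypothetical.

```python
import re

# Matches "lyric[ph1 ph2 ...]" as well as "[ph1 ph2 ...]" (empty lyric part).
_BRACKETED = re.compile(r"^(?P<lyric>[^\[\]]*)\[(?P<phonemes>[^\[\]]*)\]$")

def parse_lyric(text):
    """Return (lyric, phonemes); phonemes is None when no bracket part is given."""
    text = text.strip()
    match = _BRACKETED.match(text)
    if match is None:
        return text, None
    return match.group("lyric").strip(), match.group("phonemes").split()

def resolve_phonemes(text, dictionary, known_phonemes):
    """Hypothetical resolution order: explicit sequence, dictionary, raw phonemes."""
    lyric, explicit = parse_lyric(text)
    if explicit is not None:
        return explicit                  # the bracketed sequence takes precedence
    if lyric in dictionary:
        return dictionary[lyric]         # default: dictionary pronunciation
    candidate = lyric.split()
    if candidate and all(p in known_phonemes for p in candidate):
        return candidate                 # lyric interpreted as a phoneme sequence
    raise ValueError(f"cannot phonemize lyric: {lyric!r}")

# Toy data for illustration only.
dictionary = {"read": ["r", "eh", "d"]}
known = {"r", "iy", "eh", "d"}
print(resolve_phonemes("read", dictionary, known))          # ['r', 'eh', 'd']
print(resolve_phonemes("read[r iy d]", dictionary, known))  # ['r', 'iy', 'd']
print(resolve_phonemes("r iy d", dictionary, known))        # ['r', 'iy', 'd']
```

An input that matches none of the three cases raises `ValueError` here, standing in for the error the phonemizer shows.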
DiffSinger Variance phonemizers and renderers are independent of each other. In principle, any UTAU, NNSVS/ENUNU or DiffSinger voicebank developer can make their voicebank support DiffSinger Variance phonemizers. In addition, DiffSinger voicebanks can use other phonemizers, which does not affect audio rendering, auto-pitch, or the other DiffSinger features.
To support DiffSinger Variance phonemizers, create a folder named dsdur inside your voicebank that contains the following files:
- dsconfig.yaml
- phonemes.txt (list of phonemes)
- linguistic.onnx (linguistic encoder model)
- dur.onnx (phoneme duration model)
- Dictionary files
The dsconfig.yaml file has the following format:
phonemes: phonemes.txt # phoneme list
linguistic: linguistic.onnx # linguistic encoder model
dur: dur.onnx # phoneme duration model
hop_size: 512
sample_rate: 44100
predict_dur: true # the predict_dur setting used during training
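As a quick sanity check before loading a voicebank, the folder layout listed above can be verified with a short script. This is a minimal sketch: the function name `missing_dsdur_files` is hypothetical, and it assumes the model and phoneme files use exactly the names shown in the example configuration (a different dsconfig.yaml may point to other file names).

```python
from pathlib import Path

# File names taken from the list above; they are assumptions if your
# dsconfig.yaml references differently named files.
REQUIRED_FILES = ["dsconfig.yaml", "phonemes.txt", "linguistic.onnx", "dur.onnx"]

def missing_dsdur_files(voicebank_dir):
    """Return the names of expected files that are absent from <voicebank>/dsdur."""
    dsdur = Path(voicebank_dir) / "dsdur"
    missing = [name for name in REQUIRED_FILES if not (dsdur / name).is_file()]
    # At least one dictionary file (dsdict.yaml, dsdict-en.yaml, ...) is expected.
    has_dictionary = dsdur.is_dir() and any(dsdur.glob("dsdict*.yaml"))
    if not has_dictionary:
        missing.append("dsdict*.yaml")
    return missing

# Hypothetical path; replace it with the root folder of your voicebank.
print(missing_dsdur_files("path/to/voicebank"))
```

An empty list means every expected file was found.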
An OpenUtau YAML dictionary contains a word-to-phoneme dictionary and the type of each phoneme. This format is used because OpenUtau needs to know whether each phoneme is a vowel or a consonant in order to split syllables, which is required to support multisyllabic languages.
Use dict-to-opu.py to convert your DiffSinger dictionary to an OpenUtau YAML dictionary. The converter guesses the type of each phoneme, but the result is only reliable for monosyllabic languages like Mandarin Chinese and Japanese. For other languages, the entries part is correct, but you should manually check the type of each phoneme in the symbols part.
The dictionary format is as follows:
# symbols part: the type of each phoneme. It must include all the phonemes supported by the voicebank.
# type can be vowel, stop, affricate, aspirate, liquid, nasal, fricative, semivowel.
# OpenUtau only cares whether a phoneme is a vowel or a semivowel. In all other cases, the phoneme is treated as a consonant.
symbols:
- symbol: SP
  type: vowel
- symbol: AP
  type: vowel
- symbol: a
  type: vowel
- symbol: h
  type: fricative
# entries part: dictionary from words to phonemes.
entries:
- grapheme: SP
  phonemes: [SP]
- grapheme: AP
  phonemes: [AP]
- grapheme: a
  phonemes: [a]
- grapheme: ha
  phonemes: [h, a]
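Because the symbols part must cover every phoneme the voicebank supports, it can help to check that each phoneme used in the entries part is declared. The sketch below works on the Python structure a YAML parser would produce for the example dictionary. The function names are hypothetical, and the three-way classification simply restates the rule from the comments above.

```python
# The example dictionary as plain Python data (what a YAML parser would return).
dictionary = {
    "symbols": [
        {"symbol": "SP", "type": "vowel"},
        {"symbol": "AP", "type": "vowel"},
        {"symbol": "a", "type": "vowel"},
        {"symbol": "h", "type": "fricative"},
    ],
    "entries": [
        {"grapheme": "SP", "phonemes": ["SP"]},
        {"grapheme": "AP", "phonemes": ["AP"]},
        {"grapheme": "a", "phonemes": ["a"]},
        {"grapheme": "ha", "phonemes": ["h", "a"]},
    ],
}

def undeclared_phonemes(dictionary):
    """Return phonemes used in entries but missing from the symbols part."""
    declared = {item["symbol"] for item in dictionary["symbols"]}
    used = {p for entry in dictionary["entries"] for p in entry["phonemes"]}
    return sorted(used - declared)

def phoneme_class(dictionary, phoneme):
    """Collapse the declared type to vowel, semivowel, or consonant."""
    types = {item["symbol"]: item["type"] for item in dictionary["symbols"]}
    declared_type = types[phoneme]
    if declared_type in ("vowel", "semivowel"):
        return declared_type
    return "consonant"  # stop, affricate, aspirate, liquid, nasal, fricative

print(undeclared_phonemes(dictionary))  # []
print(phoneme_class(dictionary, "h"))   # consonant
```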
Phonemizer | Dictionary file name |
---|---|
DIFFS | dsdict.yaml |
DIFFS ZH | dsdict-zh.yaml |
DIFFS ZH-YUE | dsdict-zh-yue.yaml |
DIFFS JA | dsdict-ja.yaml |
DIFFS EN | dsdict-en.yaml |
DIFFS ES | dsdict-es.yaml |
DIFFS IT | dsdict-it.yaml |
DIFFS KO | dsdict-ko.yaml |
DIFFS PT | dsdict-pt.yaml |
DIFFS RU | dsdict-ru.yaml |
Some phonemizers are based on OpenUtau's built-in G2P module. When writing dictionaries for these phonemizers, you can define in the replacements section how to convert the phoneme sequences output by OpenUtau G2P into phonemes supported by the voicebank. The format of this section is as follows:
# The symbols and entries parts are the same as above. The symbols part still uses the phoneme set of the voicebank.
# replacements part: maps the phonemes output by OpenUtau G2P to the phoneme set used by the voicebank.
replacements:
- {from: b, to: B2}
- {from: ch, to: CH2}
- {from: d, to: D2}
- {from: f, to: F2}
- {from: g, to: G2}
- {from: k, to: K2}
- {from: l, to: L2}
- {from: m, to: M2}
- {from: n, to: N2}
- {from: p, to: P2}
- {from: r, to: R2}
- {from: s, to: S2}
- {from: sh, to: SH2}
- {from: t, to: T2}
- {from: v, to: V2}
- {from: w, to: W2}
- {from: y, to: Y2}
- {from: z, to: Z2}
- {from: zh, to: ZH2}
The entries part of the dictionary takes precedence over G2P. If a word exists in the dictionary, the pronunciation defined by the dictionary is used instead of the pronunciation given by G2P.
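The lookup order and the replacements mapping can be sketched together as follows. This is an illustration under stated assumptions, not OpenUtau's code: `phonemize` and `toy_g2p` are hypothetical names, the toy G2P table stands in for the real G2P module, and the sketch applies replacements only to G2P output, since the text above does not say whether dictionary pronunciations are also remapped.

```python
def phonemize(word, entries, g2p, replacements):
    """Dictionary pronunciation first; otherwise G2P output mapped to voicebank phonemes."""
    if word in entries:
        return list(entries[word])
    # Phonemes without a replacement rule are passed through unchanged.
    return [replacements.get(p, p) for p in g2p(word)]

def toy_g2p(word):
    """Stand-in for OpenUtau's G2P; the table below is made up for the example."""
    table = {"read": ["r", "eh", "d"], "bed": ["b", "eh", "d"]}
    return table[word]

# Dictionary entries already use the voicebank's phoneme set.
entries = {"read": ["R2", "iy", "D2"]}
# A subset of the replacements shown above, as a from -> to mapping.
replacements = {"b": "B2", "d": "D2", "r": "R2"}

print(phonemize("read", entries, toy_g2p, replacements))  # ['R2', 'iy', 'D2']
print(phonemize("bed", entries, toy_g2p, replacements))   # ['B2', 'eh', 'D2']
```

The word "read" is in the dictionary, so the G2P result is ignored; "bed" is not, so its G2P phonemes are mapped through the replacements.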
The following are the phoneme sets used by each G2P:
DIFFS EN (English)
- vowels: aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw
- consonants: b, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, r, s, sh, t, th, v, w, y, z, zh
DIFFS ES (Spanish)
- vowels: a, e, i, o, u
- consonants: b, B, ch, d, D, f, g, G, gn, I, k, l, ll, m, n, p, r, rr, s, t, U, w, x, y, Y, z
DIFFS IT (Italian)
- vowels: a, a1, e, e1, EE, i, i1, o, o1, OO, u, u1
- consonants: b, d, dz, dZZ, f, g, JJ, k, l, LL, m, n, nf, ng, p, r, rr, s, SS, t, ts, tSS, v, w, y, z
DIFFS PT (Portuguese)
- vowels: a, a~, e, e~, E, i, i~, o, o~, O, u, u~
- consonants: b, d, dZ, f, g, j, j~, J, k, l, L, m, n, p, r, R, s, S, t, tS, v, w, w~, X, z, Z
DIFFS RU (Russian)
- vowels: a, aa, ay, ee, i, ii, ja, je, jo, ju, oo, u, uj, uu, y, yy
- consonants: b, bb, c, ch, d, dd, f, ff, g, gg, h, hh, j, k, kk, l, ll, m, mm, n, nn, p, pp, r, rr, s, sch, sh, ss, t, tt, v, vv, z, zh, zz
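When writing a replacements section, one easy mistake is a from value that the G2P never outputs. The sketch below checks a list of replacement rules against the English phoneme set listed above. The function name `unknown_replacement_sources` is hypothetical, and the same check applies to the other languages by swapping in their sets.

```python
# English G2P phoneme set, copied from the list above.
EN_G2P_PHONEMES = set(
    "aa ae ah ao aw ay eh er ey ih iy ow oy uh uw "
    "b ch d dh f g hh jh k l m n ng p r s sh t th v w y z zh".split()
)

def unknown_replacement_sources(replacements, g2p_phonemes):
    """Return the 'from' values that are not phonemes of the given G2P."""
    return [rule["from"] for rule in replacements if rule["from"] not in g2p_phonemes]

replacements = [
    {"from": "b", "to": "B2"},
    {"from": "zh", "to": "ZH2"},
    {"from": "x", "to": "X2"},  # "x" is not an English G2P phoneme
]
print(unknown_replacement_sources(replacements, EN_G2P_PHONEMES))  # ['x']
```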