Developing new phonemizers

oxygen-dioxide edited this page May 19, 2023 · 10 revisions

Phonemizer API

The main method to implement is:

public abstract Phoneme[] Process(Note[] notes, Note? prevNeighbour, Note? nextNeighbour);
  • notes: A group of notes. The first note contains the lyric. The rest are extender notes whose lyric starts with +.
  • prevNeighbour and nextNeighbour: The notes before and after the group, if any. They are useful for creating diphones where applicable, e.g., building the proper leading diphone in VCV.
  • returns: An array of phonemes, positioned relative to the first note.
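
To illustrate the shape of the notes argument, here is a minimal Python sketch of how a flat sequence of lyrics maps onto groups. It is not OpenUTAU code: the function name group_notes is hypothetical, and treating a leading extender with no preceding lyric as its own group is an assumption. A phonemizer receives one such group per call to Process and does not need to do this grouping itself.

```python
def group_notes(lyrics):
    """Split a flat list of lyrics into groups: one lyric note plus its extenders."""
    groups = []
    for lyric in lyrics:
        if lyric.startswith("+") and groups:
            # Extender note: attach it to the group opened by the last lyric note.
            groups[-1].append(lyric)
        else:
            # A note with a real lyric opens a new group.
            # Assumption: an extender with nothing before it also opens one.
            groups.append([lyric])
    return groups


example_groups = group_notes(["hello", "+", "world", "+~", "+"])
```

Here example_groups holds two groups, and each inner list corresponds to what a single Process call would see as notes.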

Tips:

  • To load singer-specific resources, implement resource loading in SetSinger() and use singer.Location to look for files.
  • If the phonemizer uses an expensive resource, load it lazily the first time the phonemizer is created. Use your best judgement to decide its lifetime.
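
The lazy-loading tip can be sketched as follows. This is a language-neutral illustration in Python, not OpenUTAU's API: LazyResource and load_dictionary are hypothetical names, and the loader stands in for whatever expensive file or model the phonemizer needs. The resource is created on first access and shared afterwards.

```python
import threading


class LazyResource:
    """Load an expensive resource once, on first access, then reuse it."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable doing the expensive work
        self._lock = threading.Lock()
        self._value = None
        self._loaded = False

    def get(self):
        # Double-checked locking: only take the lock until the value exists.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._loader()
                    self._loaded = True
        return self._value


load_calls = []  # records each real load so the behaviour can be observed


def load_dictionary():
    # Hypothetical stand-in for reading a large dictionary or model from disk.
    load_calls.append("loaded")
    return {"read": ["r", "iy", "d"]}


shared_dictionary = LazyResource(load_dictionary)
```

Nothing is loaded when shared_dictionary is defined. The first get() call triggers the load, and later calls return the same object. How long the shared object should live is still a judgement call, as the tip says.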

Phonemizer Development Guidelines

Required features

A complete Phonemizer should:

  • Produce phonemes from the lyric, taking the previous and next notes into account if they exist.
  • Distribute phonemes to positions relative to the first note of each group of notes.
  • (For Classic phonemizers) support multi-pitch and multi-color voicebanks.

Optional features

Because languages differ in their characteristics, a phonemizer doesn't necessarily have to implement all of the following features. However, implementing them helps maintain a consistent user experience across phonemizers.

These features can be quickly implemented by inheriting a phonemizer template, such as SyllableBasedPhonemizer.

Polysyllabic word support

For polysyllabic languages such as English, users should be able to enter the lyric on the note of the first syllable, use +~ or +* on the following notes to extend the current syllable, and use + to move on to the next syllable.
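
The rule above can be sketched as a small Python function that decides which syllable each note in a group carries. This is an illustration only, not the logic of SyllableBasedPhonemizer: the name assign_syllables is hypothetical, and staying on the last syllable when there are more + notes than syllables is an assumption.

```python
def assign_syllables(note_lyrics, syllable_count):
    """Return, for each note in a group, the index of the syllable it carries."""
    assignment = []
    current = 0
    for index, lyric in enumerate(note_lyrics):
        if index == 0:
            current = 0  # the first note carries the word and its first syllable
        elif lyric in ("+~", "+*"):
            pass  # extend the current syllable over this note
        elif lyric == "+":
            # Move to the next syllable.
            # Assumption: with no syllables left, stay on the last one.
            current = min(current + 1, syllable_count - 1)
        else:
            raise ValueError(f"not an extender lyric: {lyric!r}")
        assignment.append(current)
    return assignment


# A four-syllable word spread over five notes.
example_assignment = assign_syllables(["openutau", "+", "+~", "+", "+"], 4)
```

In this example the third note extends the second syllable, so the five notes carry syllables 0, 1, 1, 2 and 3.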

Phonetic hint

Users can manually enter a space-separated phoneme sequence (a.k.a. a phonetic hint) in square brackets. Valid lyrics include read, read[r iy d] and [r iy d]. When both a phonetic hint and a word are present, the phonetic hint takes precedence.
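
A minimal sketch of parsing these lyric forms, written in Python for illustration. It is not OpenUTAU's parser: parse_lyric and HINT_PATTERN are hypothetical names, and treating an empty hint or malformed brackets as "no hint" is an assumption.

```python
import re

# An optional word followed by an optional [hint] in square brackets.
HINT_PATTERN = re.compile(r"^(?P<word>[^\[\]]*)(?:\[(?P<hint>[^\[\]]*)\])?$")


def parse_lyric(lyric):
    """Split a lyric into (word, hint_phonemes); the hint is None when absent."""
    text = lyric.strip()
    match = HINT_PATTERN.match(text)
    if match is None:
        # Assumption: malformed brackets mean the whole lyric is a plain word.
        return text, None
    word = match.group("word").strip()
    hint = match.group("hint")
    # The hint is a space-separated phoneme sequence; an empty hint counts as none.
    phonemes = hint.split() if hint else None
    return word, phonemes or None
```

A caller would use the returned phonemes directly when they are not None, since the hint takes precedence, and otherwise look the word up through a dictionary or G2p.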

G2p

G2p (grapheme-to-phoneme) converts lyrics in natural languages to phoneme sequences. OpenUTAU has built-in G2ps for multiple languages. They are implemented using a machine learning model, so they cover most of the words in a language and can also predict the pronunciation of words that have not been seen before. Using a unified G2p keeps the pronunciation of the same lyrics consistent across phonemizers.

For languages that have a large vocabulary and whose words cannot be converted into phoneme sequences through simple logic, such as English, French, and Russian, please use OpenUTAU's built-in G2p.

The following G2ps are included in OpenUTAU:

Custom pronunciation dictionary

Because some voicebanks have custom phonemes, a phonemizer should support custom dictionaries on top of G2p. Custom dictionaries can be loaded using G2pDictionary.

Here is an example of a custom dictionary:

%YAML 1.2
---
symbols:
  - {symbol: aa, type: vowel}
  - {symbol: ae, type: vowel}
  - {symbol: ah, type: vowel}
  - {symbol: ao, type: vowel}
  - {symbol: aw, type: vowel}
  - {symbol: ay, type: vowel}
  - {symbol: b, type: stop}
  - {symbol: ch, type: affricate}
  - {symbol: d, type: stop}
  - {symbol: dh, type: fricative}
  - {symbol: eh, type: vowel}
  - {symbol: er, type: vowel}
  - {symbol: ey, type: vowel}
  - {symbol: f, type: fricative}
  - {symbol: g, type: stop}
  - {symbol: hh, type: aspirate}
  - {symbol: ih, type: vowel}
  - {symbol: iy, type: vowel}
  - {symbol: jh, type: affricate}
  - {symbol: k, type: stop}
  - {symbol: l, type: liquid}
  - {symbol: m, type: nasal}
  - {symbol: n, type: nasal}
  - {symbol: ng, type: nasal}
  - {symbol: ow, type: vowel}
  - {symbol: oy, type: vowel}
  - {symbol: p, type: stop}
  - {symbol: r, type: liquid}
  - {symbol: s, type: fricative}
  - {symbol: sh, type: fricative}
  - {symbol: t, type: stop}
  - {symbol: th, type: fricative}
  - {symbol: uh, type: vowel}
  - {symbol: uw, type: vowel}
  - {symbol: v, type: fricative}
  - {symbol: w, type: semivowel}
  - {symbol: y, type: semivowel}
  - {symbol: z, type: fricative}
  - {symbol: zh, type: fricative}
entries:
  - grapheme: openutau
    phonemes: [ow, p, eh, n, w, uw, t, ah, w, uw]
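
The idea of layering a custom dictionary on top of G2p can be sketched as below. This Python snippet is an illustration under stated assumptions, not the behaviour of G2pDictionary: make_lookup and toy_g2p are hypothetical names, the entries are copied by hand from the YAML above rather than parsed from it, and case-insensitive matching is an assumption.

```python
# Entries mirrored by hand from the "entries" section of the example above.
CUSTOM_ENTRIES = {
    "openutau": ["ow", "p", "eh", "n", "w", "uw", "t", "ah", "w", "uw"],
}


def make_lookup(custom_entries, fallback_g2p):
    """Build a lookup that prefers custom entries and falls back to a G2p."""

    def lookup(word):
        key = word.lower()  # assumption: graphemes are matched case-insensitively
        if key in custom_entries:
            return list(custom_entries[key])  # copy so callers cannot mutate the entry
        return fallback_g2p(key)

    return lookup


def toy_g2p(word):
    # Hypothetical stand-in for a real G2p model: spells the word letter by letter.
    return list(word)


lookup = make_lookup(CUSTOM_ENTRIES, toy_g2p)
```

With this arrangement, words listed in the custom dictionary get the voicebank's own phonemes, and every other word still receives a prediction from the fallback.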