Multi‐Speaker Function Description

Gardanana edited this page Mar 11, 2024 · 1 revision

Usage

After installing the voicebank and selecting it, you can use the voice color (CLR) parameter to quickly assign a speaker to each note:

image

To fine-tune the proportion of each speaker, click the gear icon in the lower left corner and choose "Add all expressions suggested by renderers". This adds the speaker control curve parameters (CL01, CL02, etc.) to the project. Each parameter ranges from 0 to 100.

image

How it works: the control curve values of all speakers are summed. If the sum is less than 100, the remainder is assigned to the speaker selected by CLR. If the sum is greater than 100, all values are scaled down proportionally so that the sum equals 100.
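
The rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described here, not the renderer's actual code; the function name `speaker_mix` and the use of speaker names as dictionary keys are assumptions made for the example.

```python
def speaker_mix(curves, clr_speaker):
    """Return the effective weight (in percent) of each speaker at one point in time.

    curves: mapping of speaker name -> control curve value (0 to 100)
    clr_speaker: name of the speaker selected by the CLR parameter
    (Both argument shapes are hypothetical, chosen for illustration.)
    """
    weights = {name: float(value) for name, value in curves.items()}
    total = sum(weights.values())
    if total > 100:
        # Scale every value down proportionally so the weights sum to 100.
        weights = {name: value * 100 / total for name, value in weights.items()}
    elif total < 100:
        # Assign the remainder to the speaker chosen via CLR.
        weights[clr_speaker] = weights.get(clr_speaker, 0.0) + (100 - total)
    return weights
```

For example, with curves of 30 for `opencpop` and 20 for `qixuan`, and `xiayezi` selected by CLR, the sum is 50, so `xiayezi` receives the remaining 50. With two curves both at 100, each is scaled down to 50.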

Voicebank Packaging Methods

The following packaging steps use the FemaleTriplet voicebank as an example:

Place all exported .emb files in the voicebank folder, alongside dsconfig.yaml. The .emb files can be renamed, but the names must be in English.

image

Then add the following to dsconfig.yaml. Note that all text must be UTF-8 encoded and that indentation is significant in YAML. Editing the file in VSCode with YAML syntax checking is recommended:

# hidden_size used for training, default is 256
hidden_size: 256
# English names of the speakers; each corresponds to the filename of an emb file
speakers:
  - opencpop
  - qixuan
  - xiayezi

Add the following to character.yaml:

# Each speaker is a subbank
# color is the speaker's name; it must start with a two-digit number that follows the order of entries in this file: 01, 02, 03, 04, ...
# suffix is the English name of the speaker, corresponding to the .emb file name
# prefix and tone_ranges are optional
subbanks:
- color: "01: Opencpop"
  prefix: ''
  suffix: opencpop
  tone_ranges:
  - C1-B7
- color: "02: 绮萱"
  prefix: ''
  suffix: qixuan
  tone_ranges:
  - C1-B7
- color: "03: 夏叶子"
  prefix: ''
  suffix: xiayezi
  tone_ranges:
  - C1-B7
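
Because the two files must agree with each other and with the .emb files, a quick consistency check can catch mistakes before loading the voicebank. The sketch below is a hypothetical helper, not part of OpenUtau. It assumes the YAML files have already been parsed into plain dictionaries (for example with PyYAML's `yaml.safe_load`), that each speaker's file is named `<speaker>.emb`, and that "in English" means ASCII characters only.

```python
def check_voicebank(dsconfig, character, emb_files):
    """Return a list of problems; an empty list means the files are consistent.

    dsconfig: parsed contents of dsconfig.yaml (dict)
    character: parsed contents of character.yaml (dict)
    emb_files: collection of .emb filenames found in the voicebank folder
    """
    problems = []
    speakers = dsconfig.get("speakers", [])

    for name in speakers:
        # Assumption: "in English" is interpreted as ASCII-only.
        if not name.isascii():
            problems.append(f"speaker name is not ASCII: {name}")
        # Assumption: the file for a speaker is named <speaker>.emb.
        if f"{name}.emb" not in emb_files:
            problems.append(f"missing emb file: {name}.emb")

    for index, bank in enumerate(character.get("subbanks", []), start=1):
        color = str(bank.get("color", ""))
        # color must start with a two-digit number in file order: 01, 02, ...
        if not color.startswith(f"{index:02d}"):
            problems.append(f"subbank {index}: color should start with {index:02d}")
        # suffix must name one of the speakers listed in dsconfig.yaml.
        if bank.get("suffix") not in speakers:
            problems.append(f"subbank {index}: unknown suffix {bank.get('suffix')}")

    return problems
```

Calling it with the example configuration above and the three matching .emb filenames returns an empty list. Swapping two color numbers or removing an .emb file produces one message per problem.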