
Fine-tuning an English checkpoint (ckpt) using a Chinese speech dataset #613

Open
JeffTSaoO opened this issue Sep 25, 2024 · 2 comments

@JeffTSaoO

Hi,
I have 85 hours of Chinese speech audio at 44,100 Hz that I am using to fine-tune the en-us/lessac/medium checkpoint (.ckpt), but the results are not good.
Also, my loss_gen_all looks very high, while loss_disc_all looks normal.

Questions:

Sample Rate Conversion: Is it advisable to convert the sample rate from 44,100 Hz to 22,050 Hz before fine-tuning? Could this conversion be contributing to the high loss_gen_all? (See the sketch after this list.)
Language Adaptation: Since I am fine-tuning an English model with Chinese data, are there specific configurations or adjustments you recommend to improve performance?
Model Compatibility: Are there any known issues or limitations when fine-tuning the en-us/lessac/medium.ckpt model with a non-English dataset?
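For reference, this is roughly how I would do the conversion before preprocessing, since the medium Piper voices are trained at 22,050 Hz. It is only a sketch: the directory names are placeholders and it assumes ffmpeg is installed.

```sh
# Resample every clip to 22,050 Hz mono before preprocessing.
# "wavs_44100" and "wavs_22050" are placeholder directory names.
mkdir -p wavs_22050
for f in wavs_44100/*.wav; do
  ffmpeg -y -i "$f" -ar 22050 -ac 1 "wavs_22050/$(basename "$f")"
done
```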

Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

[Screenshot: piper training loss curves (loss_gen_all, loss_disc_all)]

@Kracozebr

What config are you using? Maybe you are using English phonemes instead of Chinese ones?
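If the dataset was preprocessed with the English phonemizer, I would re-run preprocessing with a Chinese espeak-ng voice and then resume training from the English checkpoint. A rough sketch, following the commands in the repo's TRAINING.md: all paths are placeholders, the hyperparameters are illustrative, and the --language value should be checked against an existing zh_CN Piper voice config (I believe espeak-ng uses cmn for Mandarin).

```sh
# 1) Preprocess the Chinese dataset so espeak-ng emits Chinese phonemes, not English ones.
#    "cmn" is espeak-ng's Mandarin voice; verify against a zh_CN Piper voice's JSON config.
python3 -m piper_train.preprocess \
  --language cmn \
  --input-dir /path/to/chinese_dataset/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050

# 2) Fine-tune by resuming from the English medium checkpoint.
#    Batch size, epochs, etc. are illustrative, not tuned values.
python3 -m piper_train \
  --dataset-dir /path/to/training_dir/ \
  --accelerator gpu \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 10000 \
  --resume_from_checkpoint /path/to/en_US-lessac-medium.ckpt \
  --checkpoint-epochs 1 \
  --precision 32
```

If the phonemes were English, the text-to-phoneme mapping the model sees would be essentially meaningless for Chinese, which could explain a loss_gen_all that never comes down.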

@xiasi0

xiasi0 commented Nov 24, 2024

What was the result? How does the trained voice sound?
On my end I can't get the dependencies to build correctly no matter what I do: either the dependency versions are too high, or some are too high and others too low. All kinds of errors.
@JeffTSaoO
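For the dependency errors, a clean environment built the way the repo's TRAINING.md describes may be enough. A minimal sketch, assuming the standard rhasspy/piper layout with the training code under src/python:

```sh
# Fresh virtual environment so the pinned training requirements
# don't clash with whatever is installed system-wide.
git clone https://github.com/rhasspy/piper.git
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip wheel setuptools
pip install -e .   # installs piper_train with its pinned dependencies
# Follow any remaining build steps in TRAINING.md before training.
```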
