
Poor results with Korean language dataset - Is Enhance limited to English? #44

Open
jumoney-git opened this issue Aug 21, 2024 · 5 comments


@jumoney-git

Hello,

I'm a beginner with AI, and this is my first attempt at training the Resemble-AI Enhance model.

I've recently completed training using a dataset of 1 million Korean voice samples. However, when I ran my trained model, the results were extremely disappointing.

This leads me to ask:

  1. Is it possible that Enhance doesn't work well with Korean, despite being trained on Korean data?
  2. Is Enhance designed to work only with English?

I would greatly appreciate any insights or guidance on this matter. If Enhance is indeed language-specific, it would be helpful to know if there are plans to support other languages in the future.

Thank you for your time and assistance.

@jaehyun-ko

  1. Which database did you use for fg/bg/rir respectively?
  2. How does it sound when you only listen to the denoised result?

@jumoney-git

Hello. It's so nice to meet a fellow Korean!
First of all, thank you so much for taking an interest in my question.

  1. Which database did you use for fg/bg/rir respectively?
    FG: I used https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=208, taking 100,000 audio files of voice data with lengths ranging from 1 to 3 seconds.
    RIR: I used https://github.com/RoyJames/room-impulse-responses from the example in the README.md, converting all wav files to npy files.
    BG: I used noise files from MUSAN.
  2. How does it sound when you only listen to the denoised result?
    At 1,000,000 steps, the noise was removed well in about 8 out of 10 samples.
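For reference, the wav-to-npy RIR conversion mentioned above could be sketched roughly as below. This is a minimal sketch, not code from the Enhance repo: it assumes 16-bit PCM mono WAV files, and the normalization and output layout are my own choices.

```python
# Hedged sketch: convert a directory of RIR .wav files to .npy arrays.
# Assumes 16-bit PCM mono WAVs; paths and scaling are illustrative.
import wave
from pathlib import Path

import numpy as np


def wav_to_npy(wav_path: Path, out_dir: Path) -> Path:
    """Read one 16-bit PCM mono WAV and save it as float32 in [-1, 1]."""
    with wave.open(str(wav_path), "rb") as w:
        frames = w.readframes(w.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
    out_path = out_dir / (wav_path.stem + ".npy")
    np.save(out_path, audio)
    return out_path


def convert_dir(wav_dir: Path, out_dir: Path) -> int:
    """Convert every .wav in wav_dir; returns the number of files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return sum(1 for p in sorted(wav_dir.glob("*.wav")) if wav_to_npy(p, out_dir))
```

A real RIR set may contain multi-channel or 24-bit files, so it is worth validating a few converted arrays by ear or by plotting before training.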

If you have any good tips for creating a Korean-specific model, I would greatly appreciate your help.

[The denoise stage log]

Reading hparams from config/denoiser.yaml
Found 100000 audio files in data/fg
Found 99990 foreground files and 930 background files
Found 10 foreground files and 930 background files
Train set: 99990 samples - Val set: 10 samples
Unable to find latest file at runs/denoiser/ds/G/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.

Training from step 0 to step 1000000

{
"step": 999998,
"G/losses/l1": 0.02123,
"G/loss": 0.02123,
"G/lr": 1e-05,
"G/grad_norm": 0.02765241637825966,
"elapsed_time": 0.3925
}
{
"step": 999999,
"G/losses/l1": 0.03506,
"G/loss": 0.03506,
"G/lr": 1e-05,
"G/grad_norm": 0.014176322147250175,
"elapsed_time": 0.3913
}
{
"step": 1000000,
"G/losses/l1": 0.04795,
"G/loss": 0.04795,
"G/lr": 1e-05,
"G/grad_norm": 0.020992033183574677,
"elapsed_time": 0.3915
}
Saved checkpoint to runs/denoiser/ds/G
Training finished
Saved checkpoint to runs/denoiser/ds/G

@jaehyun-ko

I am also conducting experiments with a similar dataset (48 kHz input). It seems necessary to monitor the convergence of Stage 1/2 during training; I have added wandb progress logging in my fork of the repository. When training with more challenging LPF/BPF conditions than the original, it seems to learn well.

It may also be necessary to check whether mean(Z) in the LCFM converges to 0 properly.
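A mean(Z) check like the one above could be tracked with a small monitor during training. This is a hedged sketch, not code from either repo: the class name, the EMA smoothing, and the decay value are all my own illustrative choices, and `z` stands in for whatever latent tensor LCFM produces.

```python
# Hedged sketch: track whether mean(Z) drifts toward 0 during training.
# The EMA smoothing and decay value are illustrative, not from the repo.
import numpy as np


class LatentMeanMonitor:
    """Exponential moving average of mean(Z), updated once per step."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.ema = None  # no estimate until the first update

    def update(self, z: np.ndarray) -> float:
        """Fold the batch's mean(Z) into the running estimate and return it."""
        m = float(z.mean())
        self.ema = m if self.ema is None else self.decay * self.ema + (1 - self.decay) * m
        return self.ema
```

Logging the returned value each step (e.g., alongside the existing wandb metrics) makes it easy to see whether the latent mean settles near 0 or drifts.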

Adding a Korean sample:

step_00030000_010

@jaehyun-ko

I would also like to ask whether you have experimented only with the denoiser, or whether you have results from the other stages as well.

The datasets I am using are OLKAVS(Clean Audio Only), MUSAN(Noise), and BUT(RIR).

@jaehyun-ko

jaehyun-ko commented Sep 13, 2024

The Resemble-Enhance model itself seems to produce good results, so some configs may be mismatched in your reproduction experiment.
All discriminators should converge to Real == Fake (a value of 1, 1).
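That convergence criterion could be checked with a quick pass over logged discriminator scores. A minimal sketch, with the function name and tolerance being my own illustrative choices: "converged" here means the discriminator scores real and fake samples near the same value (~1), i.e., it can no longer tell them apart.

```python
# Hedged sketch: check whether logged discriminator outputs have converged
# to Real == Fake == ~1. The tolerance is illustrative, not from the repo.
def discriminators_converged(real_scores, fake_scores, tol=0.1):
    """True if both average scores sit within tol of 1.0."""
    avg_real = sum(real_scores) / len(real_scores)
    avg_fake = sum(fake_scores) / len(fake_scores)
    return abs(avg_real - 1.0) < tol and abs(avg_fake - 1.0) < tol
```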
