[Bug] XTTS v2.0 finetuning - wrong checkpoint links #3148

Closed
rlenain opened this issue Nov 6, 2023 · 4 comments
Labels
bug Something isn't working

Comments

rlenain commented Nov 6, 2023

### Describe the bug

Hi there,

I believe the following lines in the new XTTS v2.0 fine-tuning recipe need to be changed:

TOKENIZER_FILE_LINK = "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v1/v2.0/vocab.json"
XTTS_CHECKPOINT_LINK = "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v1/v2.0/model.pth"

It's impossible to reach these URLs.

Thanks.
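
One way to confirm that a link is unreachable before starting a training run is a quick request from Python. The following is a minimal sketch using only the standard library; `url_is_reachable` is a hypothetical helper name and not part of TTS, and some servers reject `HEAD` requests, so a `False` result is a hint rather than proof.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10.0):
    """Return True if a HEAD request to `url` completes without an HTTP or network error."""
    try:
        # Building the Request inside the try block also catches malformed URLs,
        # which raise ValueError before any network traffic happens.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it is covered here.
        return False
```

Calling `url_is_reachable(TOKENIZER_FILE_LINK)` with the constants quoted above would show whether the recipe can download the file at all.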

### To Reproduce

python recipes/ljspeech/xtts_v2/train_gpt_xtts.py

### Expected behavior

Training

### Logs

/home/raph/repos/TTS/TTS/tts/layers/xtts/trainer/dataset.py:10: UserWarning: Torchaudio's I/O functions now support per-call backend dispatch. Importing backend implementation directly is no longer guaranteed to work. Please use `backend` keyword with load/save/info function, instead of calling the underlying implementation directly.
  from torchaudio.backend.soundfile_backend import load as torchaudio_soundfile_load
/home/raph/repos/TTS/TTS/tts/layers/xtts/trainer/dataset.py:11: UserWarning: Torchaudio's I/O functions now support per-call backend dispatch. Importing backend implementation directly is no longer guaranteed to work. Please use `backend` keyword with load/save/info function, instead of calling the underlying implementation directly.
  from torchaudio.backend.sox_io_backend import load as torchaudio_sox_load
/home/raph/miniconda3/envs/TTS/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "/home/raph/repos/TTS/recipes/ljspeech/xtts_v2/train_gpt_xtts.py", line 232, in <module>
    main()
  File "/home/raph/repos/TTS/recipes/ljspeech/xtts_v2/train_gpt_xtts.py", line 204, in main
    model = GPTTrainer.init_from_config(config)
  File "/home/raph/repos/TTS/TTS/tts/layers/xtts/trainer/gpt_trainer.py", line 500, in init_from_config
    return GPTTrainer(config)
  File "/home/raph/repos/TTS/TTS/tts/layers/xtts/trainer/gpt_trainer.py", line 79, in __init__
    self.xtts.tokenizer = VoiceBpeTokenizer(self.args.tokenizer_file)
  File "/home/raph/repos/TTS/TTS/tts/layers/xtts/tokenizer.py", line 540, in __init__
    self.tokenizer = Tokenizer.from_file(vocab_file)
Exception: expected value at line 1 column 1
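
The final exception, `expected value at line 1 column 1`, is the kind of message a JSON parser emits when a file does not start with a JSON value. That would be consistent with the downloaded `vocab.json` being empty or containing an error page instead of the tokenizer, although this is an inference from the traceback and is not confirmed in this thread. A minimal sketch for checking a downloaded file before training, using only the Python standard library (`looks_like_json` is a hypothetical helper name, not part of TTS):

```python
import json


def looks_like_json(path):
    """Return True if the file at `path` parses as JSON, False otherwise."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
        return True
    except (OSError, ValueError):
        # OSError: the file is missing or unreadable.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        return False
```

Running this on the tokenizer file the recipe downloaded would show whether the failure comes from the file contents rather than from the tokenizer code.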


### Environment

```shell
{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-PCIE-40GB",
            "NVIDIA A100-PCIE-40GB",
            "NVIDIA A100-PCIE-40GB",
            "NVIDIA A100-PCIE-40GB"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.0+cu121",
        "TTS": "0.20.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.13",
        "version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
    }
}
```

### Additional context

No response

rlenain added the `bug` label Nov 6, 2023
Contributor

AWAS666 commented Nov 6, 2023

The most up-to-date links seem to be in models.json:

"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/model.pth",
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/config.json",
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/vocab.json",
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/hash.md5"

As a quick fix, I'll create a PR for it.
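
Applied to the two constants quoted in the issue description, the quick fix could look roughly like the sketch below. The constant names `TOKENIZER_FILE_LINK` and `XTTS_CHECKPOINT_LINK` come from the recipe excerpt above and the URLs from the `models.json` entries listed in this comment; `XTTS_V2_BASE_URL` is a hypothetical name introduced only for illustration, and the change actually merged may differ.

```python
# Base URL shared by the models.json entries quoted in this comment.
# XTTS_V2_BASE_URL is a hypothetical name, not taken from the recipe.
XTTS_V2_BASE_URL = "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main"

# Constant names as they appear in the recipe excerpt in the issue description.
TOKENIZER_FILE_LINK = f"{XTTS_V2_BASE_URL}/vocab.json"
XTTS_CHECKPOINT_LINK = f"{XTTS_V2_BASE_URL}/model.pth"
```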

Contributor

Edresson commented Nov 6, 2023

It was fixed in #3149. However, XTTS v2.0 fine-tuning is not currently supported: it uses a new DVAE that is not yet implemented. We are working to fix this as soon as possible.

Contributor

Edresson commented Nov 7, 2023

PR #3154 fixed this issue.

Edresson closed this as completed Nov 7, 2023
Yaodada12 commented

> It was fixed in #3149. However, XTTS v2.0 fine-tuning is not currently supported: it uses a new DVAE that is not yet implemented. We are working to fix this as soon as possible.

Can XTTS v2.0 be fine-tuned on a character's audio, like RVC, to achieve better performance?
