[Bug] Voice Conversion to File - Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0 #101

Open
andrea-mucci opened this issue Oct 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

andrea-mucci commented Oct 9, 2024

Describe the bug

I have the following code:

# Generate the target audio file with voice cloning
path = self.model.tts_to_file(text=text, speaker_wav=speaker_wav, language=language,
                              file_path=f"/tmp/output_{output_random}.wav")
# To improve the cloning quality, I force the generated audio to be converted against the
# original speaker audio. I don't know whether this makes sense, but the bug exists :-)
self.conversion.voice_conversion_to_file(path, speaker_wav, file_path=new_output_path)
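For completeness, here is a minimal, self-contained sketch of the setup. The model names, file paths, and the .to("cuda") placement are placeholders/assumptions, not the exact code from my project:

from TTS.api import TTS

device = "cuda"

# Cloning-capable TTS model (XTTS v2 here is an assumption; self.model above wraps something equivalent)
model = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# FreeVC voice-conversion model, matching the traceback below (self.conversion above)
conversion = TTS("voice_conversion_models/multilingual/vctk/freevc24").to(device)

speaker_wav = "speaker.wav"  # reference speaker audio (placeholder path)

# Step 1: generate the cloned audio
path = model.tts_to_file(
    text="Hello world",
    speaker_wav=speaker_wav,
    language="en",
    file_path="/tmp/output.wav",
)

# Step 2: this call raises the device-mismatch RuntimeError shown in the logs
conversion.voice_conversion_to_file(path, speaker_wav, file_path="/tmp/converted.wav")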

To Reproduce

  1. Generate an audio file with voice cloning
  2. Take the generated audio and use the voice conversion method with the original speaker audio

Expected behavior

No response

Logs

Traceback (most recent call last):
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/cog/server/worker.py", line 349, in _predict
result = predict(**payload)
^^^^^^^^^^^^^^^^^^
File "/src/predict.py", line 55, in predict
self.conversion.voice_conversion_to_file(path, speaker_wav, file_path=new_output_path)
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/api.py", line 377, in voice_conversion_to_file
wav = self.voice_conversion(source_wav=source_wav, target_wav=target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/api.py", line 358, in voice_conversion
wav = self.voice_converter.voice_conversion(source_wav=source_wav, target_wav=target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/utils/synthesizer.py", line 257, in voice_conversion
output_wav = self.vc_model.voice_conversion(source_wav, target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/models/freevc.py", line 527, in voice_conversion
g_tgt = self.enc_spk_ex.embed_utterance(wav_tgt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 163, in embed_utterance
partial_embeds = self(mels).cpu().numpy()
^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 68, in forward
_, (hidden, _) = self.lstm(mels)
^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 917, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0
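
For context, this looks like the standard PyTorch device mismatch: the mel spectrogram handed to the speaker encoder's LSTM is still on the CPU while the LSTM weights live on cuda:0. Below is a stand-alone illustration (generic PyTorch, not the TTS code itself) of the same failure and the usual fix of moving the input to the module's device:

import torch
import torch.nn as nn

# LSTM parameters moved to the GPU, loosely mirroring the speaker encoder in the traceback
lstm = nn.LSTM(input_size=40, hidden_size=256, num_layers=3, batch_first=True).to("cuda")

mels = torch.randn(1, 160, 40)  # input tensor left on the CPU
# lstm(mels)  # -> RuntimeError: Input and parameter tensors are not at the same device ...

# The usual fix: move the input to wherever the module's parameters live
mels = mels.to(next(lstm.parameters()).device)
_, (hidden, _) = lstm(mels)  # works once both are on cuda:0

A possible interim workaround on my side, assuming conversion speed isn't critical, would be to keep the FreeVC model on the CPU so that inputs and weights end up on the same device.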

Environment

- coqui-ai-TTS latest version
- Linux
- CUDA 12.4.1
- Python 3.12

Additional context

No response

andrea-mucci added the bug label Oct 9, 2024

eginhard (Member) commented Oct 9, 2024

Please share the full code so that it's possible to reproduce. But what you're trying to do is probably not very useful; the FreeVC model isn't very good for a lot of applications.
