[Bug] Voice Conversion to File - Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0 #101

Open
andrea-mucci opened this issue Oct 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

andrea-mucci commented Oct 9, 2024

Describe the bug

I have the following code:

# Generate the target audio file with voice cloning
path = self.model.tts_to_file(text=text, speaker_wav=speaker_wav, language=language,
                              file_path=f"/tmp/output_{output_random}.wav")
# To improve the cloning quality, I force the generated audio to be converted against the
# original speaker audio. I don't know whether this makes sense, but the bug exists :-)
self.conversion.voice_conversion_to_file(path, speaker_wav, file_path=new_output_path)
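For completeness, here is a minimal, self-contained sketch of the setup. The model names, file paths, and the .to("cuda") placement are placeholders/assumptions, not the exact code from my project:

from TTS.api import TTS

device = "cuda"

# Cloning-capable TTS model (XTTS v2 here is an assumption; self.model above wraps something equivalent)
model = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# FreeVC voice-conversion model, matching the traceback below (self.conversion above)
conversion = TTS("voice_conversion_models/multilingual/vctk/freevc24").to(device)

speaker_wav = "speaker.wav"  # reference speaker audio (placeholder path)

# Step 1: generate the cloned audio
path = model.tts_to_file(
    text="Hello world",
    speaker_wav=speaker_wav,
    language="en",
    file_path="/tmp/output.wav",
)

# Step 2: this call raises the device-mismatch RuntimeError shown in the logs
conversion.voice_conversion_to_file(path, speaker_wav, file_path="/tmp/converted.wav")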

To Reproduce

  1. Generate an audio file with voice cloning
  2. Take the generated audio and use the voice conversion method with the original speaker audio

Expected behavior

No response

Logs

Traceback (most recent call last):
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/cog/server/worker.py", line 349, in _predict
result = predict(**payload)
^^^^^^^^^^^^^^^^^^
File "/src/predict.py", line 55, in predict
self.conversion.voice_conversion_to_file(path, speaker_wav, file_path=new_output_path)
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/api.py", line 377, in voice_conversion_to_file
wav = self.voice_conversion(source_wav=source_wav, target_wav=target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/api.py", line 358, in voice_conversion
wav = self.voice_converter.voice_conversion(source_wav=source_wav, target_wav=target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/utils/synthesizer.py", line 257, in voice_conversion
output_wav = self.vc_model.voice_conversion(source_wav, target_wav)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/models/freevc.py", line 527, in voice_conversion
g_tgt = self.enc_spk_ex.embed_utterance(wav_tgt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 163, in embed_utterance
partial_embeds = self(mels).cpu().numpy()
^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 68, in forward
_, (hidden, _) = self.lstm(mels)
^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.12.6/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 917, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0
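
For context, this looks like the standard PyTorch device mismatch: the mel spectrogram handed to the speaker encoder's LSTM is still on the CPU while the LSTM weights live on cuda:0. Below is a stand-alone illustration (generic PyTorch, not the TTS code itself) of the same failure and the usual fix of moving the input to the module's device:

import torch
import torch.nn as nn

# LSTM parameters moved to the GPU, loosely mirroring the speaker encoder in the traceback
lstm = nn.LSTM(input_size=40, hidden_size=256, num_layers=3, batch_first=True).to("cuda")

mels = torch.randn(1, 160, 40)  # input tensor left on the CPU
# lstm(mels)  # -> RuntimeError: Input and parameter tensors are not at the same device ...

# The usual fix: move the input to wherever the module's parameters live
mels = mels.to(next(lstm.parameters()).device)
_, (hidden, _) = lstm(mels)  # works once both are on cuda:0

A possible interim workaround on my side, assuming conversion speed isn't critical, would be to keep the FreeVC model on the CPU so that inputs and weights end up on the same device.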

Environment

- coqui-ai-TTS latest version
- Linux
- CUDA 12.4.1
- Python 3.12

Additional context

No response

andrea-mucci added the bug label Oct 9, 2024

eginhard (Member) commented Oct 9, 2024

Please share the full code so that it's possible to reproduce. But what you're trying to do is probably not very useful; the FreeVC model isn't very good for a lot of applications.
