Hey. I trained the Tacotron 2 synthesizer from Rayhane-mamah, and the synthesized spectrograms sound good when inverted with the Griffin-Lim algorithm. Unfortunately, the vocoder in his repository trains with an error, so I decided to use the r9y9 vocoder. I have a few questions:
If Tacotron 2 was trained on spectrograms extracted from audio preprocessed with preemphasis, is it possible to train the r9y9 WaveNet on spectrograms synthesized by Tacotron 2? Or does Tacotron 2 need to be retrained without preemphasis?
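For context on the preemphasis question: preemphasis is just a first-order filter on the waveform and is exactly invertible, so the de-emphasis step belongs after the vocoder's output, not in the spectrogram. A minimal sketch, assuming the common coefficient 0.97 (the actual hparam value in either repo may differ):

```python
import numpy as np
from scipy.signal import lfilter

def preemphasis(wav, k=0.97):
    # y[n] = x[n] - k * x[n-1]: boosts high frequencies before feature extraction
    return lfilter([1.0, -k], [1.0], wav)

def inv_preemphasis(wav, k=0.97):
    # exact inverse IIR filter, applied to the generated waveform
    return lfilter([1.0], [1.0, -k], wav)

# The round trip is lossless, so a vocoder trained on preemphasized targets
# only needs inv_preemphasis applied to its generated audio afterwards.
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(inv_preemphasis(preemphasis(x)), x)
```

The catch is consistency: the mel spectrograms WaveNet conditions on at training time must come from the same preprocessing pipeline as the ones Tacotron 2 will produce at synthesis time.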
If Tacotron 2 was trained on spectrograms extracted from mu-law-companded audio, can they somehow be converted back to a raw representation so that an r9y9 WaveNet checkpoint trained on raw audio can be used for fine-tuning? Or does Tacotron 2 need to be completely retrained with different parameters?
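For reference on the mu-law question, the continuous companding curve itself is exactly invertible (the lossy step is only the subsequent 8-bit quantization). A sketch with the standard mu = 255 formula:

```python
import numpy as np

def mulaw(x, mu=255.0):
    # compress amplitudes in [-1, 1] with the mu-law curve
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def inv_mulaw(y, mu=255.0):
    # exact inverse of the continuous companding curve
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.linspace(-1.0, 1.0, 101)
assert np.allclose(inv_mulaw(mulaw(x)), x)
```

Note, though, that inverting the waveform does not invert the features: spectrograms computed from companded audio live in a different feature space than spectrograms of raw audio, and there is no simple transform between the two spectrogram sets.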
How can a vocoder learn to generate audio similar to the ground truth if it is trained on GTA (ground-truth-aligned) spectrograms?
Also, I had the idea of synthesizing a spectrogram with the trained Tacotron 2, inverting it to audio with the Griffin-Lim algorithm, and then using that audio to prepare training data for r9y9. How effective would this approach be?
Thanks!