Hi, great repo!
I found that the audio quality improves considerably with a slightly larger ResNet, as suggested in https://arxiv.org/pdf/2005.05106.pdf. The shaky, metallic artefacts are reduced a lot.
Here is a comparison of your pretrained LJSpeech model with a model I am still training (for TTS I used https://github.com/as-ideas/ForwardTacotron):
Original (6400 epochs):
https://drive.google.com/file/d/1LOIB9B7LDX9g-kVu_p1anGJgJ5vjE27s/view?usp=sharing
Larger ResNet (2000 epochs):
https://drive.google.com/file/d/19_d2SQU1xZi-o90MJ8NcKhIS6AFwliH-/view?usp=sharing
If you are interested, I could open a PR that makes the layers more flexible.
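To give a rough idea of what a larger residual stack changes, here is a small sketch of the receptive field arithmetic. It assumes kernel size 3 and dilations of 3^i per layer; those values and the function names are my own illustration, not taken from this repo's code.

```python
def stack_dilations(num_layers, base=3):
    """Dilation of each layer in a residual stack: 1, 3, 9, ... (assumed scheme)."""
    return [base ** i for i in range(num_layers)]


def receptive_field(kernel_size, dilations):
    """Receptive field, in input frames, of stacked stride-1 dilated convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation frames.
        field += (kernel_size - 1) * dilation
    return field


small = receptive_field(3, stack_dilations(3))  # 3-layer stack
large = receptive_field(3, stack_dilations(4))  # 4-layer stack
print(small, large)
```

Under these assumptions, one extra layer per stack triples the receptive field, so exposing the number of layers as a parameter would be enough to experiment with this.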