Speed Up Numbers #9

Hi! Thanks for exporting the models to ONNX.
Could you please list the speedups you get when using the ONNX and ONNX-fp16 models on GPU? It would be helpful to compare our speedups against the original implementation.
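For concreteness, something like the sketch below is the kind of comparison I have in mind. The model file names and the CUDA provider are assumptions, and the symbolic input dimensions default to 1, so realistic shapes should be substituted for meaningful numbers:

```python
import time
import numpy as np
import onnxruntime

# Hypothetical file names; substitute the actual exported F5-TTS models.
MODEL_FP32 = "f5_transformer_fp32.onnx"
MODEL_FP16 = "f5_transformer_fp16.onnx"

def benchmark(model_path, provider, runs=20):
    sess = onnxruntime.InferenceSession(model_path, providers=[provider])
    feeds = {}
    for inp in sess.get_inputs():
        # Symbolic/free dimensions default to 1; replace with realistic
        # sizes (batch, sequence length) for meaningful timings.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        if "float16" in inp.type:
            feeds[inp.name] = np.random.rand(*shape).astype(np.float16)
        elif "int64" in inp.type:
            feeds[inp.name] = np.zeros(shape, dtype=np.int64)
        elif "int32" in inp.type:
            feeds[inp.name] = np.zeros(shape, dtype=np.int32)
        else:
            feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    sess.run(None, feeds)  # warm-up run to exclude initialization cost
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feeds)
    return (time.perf_counter() - start) / runs

t32 = benchmark(MODEL_FP32, "CUDAExecutionProvider")
t16 = benchmark(MODEL_FP16, "CUDAExecutionProvider")
print(f"fp32: {t32 * 1000:.1f} ms, fp16: {t16 * 1000:.1f} ms, speedup: {t32 / t16:.2f}x")
```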
Thank you for your suggestion. Actually, I don't have a desktop GPU, so the GPU performance information is based on discussions in the issue tracker. Our team focuses on Android devices, and this repository aims to deploy F5 on Android using ONNX Runtime. However, we found that F5's computation is too heavy to achieve real-time responses, even with Qualcomm NPU acceleration. As a result, we have only released the model export method.
@DakeQQ I have a 4090 and am very keen to see if we can get a speedup of F5 for real-time chat on GPU. I've had some trouble getting inference working correctly on my end, which I suspect was my own error; I will try again over the next few days and update you. I would be more than happy to run proper speed tests for you. The current issue with F5 is the roughly 1.5-second lag before the first reply with the standard implementation. If we could reduce that using ONNX, and perhaps DeepSpeed as in the Coqui XTTS implementation (I know it's an entirely different model), the open-source real-time TTS community would be very happy indeed. https://github.com/daswer123/xtts-api-server/tree/main/xtts_api_server/RealtimeTTS
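For context, the core trick behind RealtimeTTS-style streaming is to overlap synthesis with playback, so the perceived latency is the cost of the first chunk rather than the whole utterance. A minimal sketch of that pattern, with a hypothetical `run_tts_model` placeholder standing in for the actual per-chunk inference call:

```python
import queue
import threading

def run_tts_model(sentence):
    """Placeholder for the real per-chunk ONNX inference call."""
    return b"\x00" * 16000  # pretend result: one chunk of silence

def synthesize_chunks(text):
    """Yield audio chunk by chunk instead of one waveform at the end."""
    for sentence in text.split(". "):
        yield run_tts_model(sentence)

def stream_playback(text, play_chunk):
    """Overlap synthesis and playback: time-to-first-audio becomes the
    cost of one chunk rather than the whole utterance."""
    chunks = queue.Queue()

    def producer():
        for chunk in synthesize_chunks(text):
            chunks.put(chunk)
        chunks.put(None)  # end-of-stream sentinel

    threading.Thread(target=producer, daemon=True).start()
    while (chunk := chunks.get()) is not None:
        play_chunk(chunk)  # e.g. write to a sounddevice/PyAudio stream
```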
@OrphBean Regarding the current 1.5-second lag in the first reply, your suggestion to leverage ONNX and possibly DeepSpeed is very insightful. While Coqui XTTS is indeed a different model, the inspiration from its approach could provide valuable ideas. We’ll definitely keep exploring ways to optimize this, and having someone like you to help with speed tests is a huge asset. Thanks again for your support and collaboration with the open-source community!
I am unable to get this running; it keeps reverting to the CPU. I have tried a number of approaches without success. I am available to run tests if you have any suggestions.
If your system is running Windows, you might want to consider using DirectML, as it provides the most convenient setup for utilizing GPU resources. To do so, first ensure that you have the latest version of ONNX Runtime with DirectML support installed (the `onnxruntime-directml` package).
Then, modify your code to use the DirectML execution provider:

```python
ort_session_B = onnxruntime.InferenceSession(onnx_model_B, sess_options=session_opts, providers=['DmlExecutionProvider'])
```

This should help streamline the process of setting up GPU acceleration on a Windows system.
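For anyone hitting the silent CPU fallback described above, a quick check is to confirm the provider is actually available before creating the session. A minimal sketch, assuming the `onnxruntime-directml` wheel is installed and with a placeholder model path:

```python
import onnxruntime

# DirectML only ships in the onnxruntime-directml wheel; if plain
# onnxruntime is also installed, pip may resolve to the wrong one.
print(onnxruntime.get_available_providers())

session_opts = onnxruntime.SessionOptions()
session_opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

# Prefer DirectML, fall back to CPU explicitly so the choice is visible.
providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
sess = onnxruntime.InferenceSession("model.onnx",  # placeholder path
                                    sess_options=session_opts,
                                    providers=providers)

# Confirm which provider actually got picked.
print(sess.get_providers())
```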
Hello, I am also trying to deploy a real-time TTS for voice cloning on a Qualcomm device (8295). Approaches that generate audio codecs autoregressively, like MaskGCT and GPT-SoVITS, cannot meet the first-reply latency requirement. Can you suggest any other possible approaches?
I recommend checking out FireRedTTS. It offers a pipeline and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice-cloning TTS. However, Qualcomm NPU support is essential for optimal performance.
Thank you! I will give it a try. Have you deployed FireRedTTS successfully on a Qualcomm chip, using QNN?
I'm still waiting for the repository to update with "human-like speech generation." Otherwise, it would take a lot of time to go through the double export process again. |
I found no training code there yet; the team doesn't seem eager to release it ...