
Speed Up Numbers #9

Open
nityanandmathur opened this issue Nov 27, 2024 · 10 comments

@nityanandmathur

Hi! Thanks for exporting models to ONNX.

Could you please list the speedups you get when using the ONNX and ONNX-fp16 models on GPU? It would be helpful to compare our speedups against the original ones.
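For reference, here is a rough sketch of how I am timing such a comparison with ONNX Runtime; the model path, the all-float32 dummy inputs, and the run count are placeholders rather than the actual F5 export:

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder path; substitute the actual exported F5 ONNX file.
session = ort.InferenceSession("f5_fp32.onnx",
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# Build dummy inputs from the model signature; dynamic dims become 1 and
# every input is assumed float32, which is a simplification.
inputs = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    inputs[inp.name] = np.zeros(shape, dtype=np.float32)

session.run(None, inputs)  # warm-up so kernels are compiled and cached

runs = 10
start = time.perf_counter()
for _ in range(runs):
    session.run(None, inputs)
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```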

@DakeQQ (Owner) commented Nov 27, 2024

Thank you for your suggestion. Actually, I don't have a desktop GPU, so the GPU performance information is based on discussions in the issue tracker. Our team focuses on Android devices, and this repository aims to deploy F5 on Android using ONNX Runtime. However, we found that F5's computation is too heavy to achieve real-time responses, even with Qualcomm NPU acceleration. As a result, we have only released the model export method.
It would be great if you could share some speed test information. We would be happy to include it in the README.md and, of course, reference your name : )

@OrphBean commented Nov 27, 2024

@DakeQQ I have a 4090 and am very keen to see if we can get a speedup of F5 for realtime chat on GPU. I've had some trouble getting inference working correctly on my end, I think due to my own error. I will try again over the next few days and update you. I'd be more than happy to run some decent speed tests for you.

The current issue with F5 is the 1.5-second lag before the first reply when using the standard F5 implementation. If we could find a way to reduce that using ONNX, and perhaps DeepSpeed similar to the Coqui XTTS implementation (I know it's an entirely different model), then the open-source realtime TTS community would be very happy indeed.

https://github.com/daswer123/xtts-api-server/tree/main/xtts_api_server/RealtimeTTS

@DakeQQ (Owner) commented Nov 28, 2024

@OrphBean
Thank you for sharing your thoughts and testing the potential of F5 with your 4090 GPU. It's great to see your enthusiasm for improving the speed and performance of real-time chat. Don't worry too much about the initial issues you faced—it happens to all of us when exploring new implementations. Your willingness to retry and share updates is truly appreciated.

Regarding the current 1.5-second lag in the first reply, your suggestion to leverage ONNX and possibly DeepSpeed is very insightful. While Coqui XTTS is indeed a different model, the inspiration from its approach could provide valuable ideas. We’ll definitely keep exploring ways to optimize this, and having someone like you to help with speed tests is a huge asset. Thanks again for your support and collaboration with the open-source community!

@OrphBean

I am unable to get this running; it keeps reverting to the CPU. I have tried a number of approaches without success. I am available to run tests if you have any.
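For what it is worth, this is how I have been checking which provider ONNX Runtime actually picks; the model path is just a placeholder:

```python
import onnxruntime as ort

# Providers compiled into the installed onnxruntime build.
print(ort.get_available_providers())
# If this only lists CPUExecutionProvider, the plain CPU package is installed
# and no GPU provider can be selected.

# Placeholder model path; the session silently falls back down this list.
session = ort.InferenceSession("model.onnx",
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# Providers the session actually ended up with, in priority order.
print(session.get_providers())
```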

@DakeQQ (Owner) commented Nov 29, 2024

If your system is running Windows, you might want to consider using DirectML, as it provides the most convenient setup for utilizing GPU resources. First, ensure that you have the latest version of onnxruntime-directml installed:

```
pip install onnxruntime-directml --upgrade
```

Then modify your code to use the DirectML execution provider:

```python
ort_session_B = onnxruntime.InferenceSession(onnx_model_B, sess_options=session_opts, providers=['DmlExecutionProvider'])
```

This should streamline the process of setting up GPU acceleration on a Windows system.
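For completeness, here is a fuller sketch of that setup with a CPU fallback and a check of which provider was actually selected; the model path and session options are illustrative, not the repository's exact code:

```python
import onnxruntime

# Illustrative path to the exported ONNX model.
onnx_model_B = "model_B.onnx"

session_opts = onnxruntime.SessionOptions()
session_opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

# Prefer DirectML; fall back to CPU if it is unavailable.
ort_session_B = onnxruntime.InferenceSession(
    onnx_model_B,
    sess_options=session_opts,
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
)

# If this prints only CPUExecutionProvider, the plain onnxruntime package is
# probably shadowing onnxruntime-directml in the environment; uninstall both
# and reinstall onnxruntime-directml alone.
print(ort_session_B.get_providers())
```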

@sheepHavingPurpleLeaf

> Thank you for your suggestion. Actually, I don't have a desktop GPU, so the GPU performance information is based on discussions in the issue tracker. Our team focuses on Android devices, and this repository aims to deploy F5 on Android using ONNX Runtime. However, we found that F5's computation is too heavy to achieve real-time responses, even with Qualcomm NPU acceleration. As a result, we have only released the model export method. It would be great if you could share some speed test information. We would be happy to include it in the README.md and, of course, reference your name : )

Hello, I am also trying to deploy real-time TTS for voice cloning on a Qualcomm device (8295). Approaches that generate audio codecs autoregressively, like MaskGCT and GPT-SoVITS, cannot meet the first-reply latency requirement. Can you suggest any other possible approaches?

@DakeQQ (Owner) commented Nov 29, 2024

I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

@sheepHavingPurpleLeaf

> I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

Thank you! I will give it a try. Have you deployed FireRedTTS successfully on a Qualcomm chip, using QNN?

@DakeQQ (Owner) commented Nov 29, 2024

I'm still waiting for the repository to update with "human-like speech generation." Otherwise, it would take a lot of time to go through the double export process again.

@lumpidu commented Dec 21, 2024

> I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

I found no training code there yet. It doesn't seem the team is eager to release it...
