
Speed Up Numbers #9

Open
nityanandmathur opened this issue Nov 27, 2024 · 10 comments

@nityanandmathur

Hi! Thanks for exporting models to ONNX.

Could you please list the speedups you get when using the ONNX and ONNX-fp16 models on GPU? It would be helpful to compare our speedups against the original ones.
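For reference, here is a rough sketch of how I am timing such a comparison with ONNX Runtime; the model path, the all-float32 dummy inputs, and the run count are placeholders rather than the actual F5 export:

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder path; substitute the actual exported F5 ONNX file.
session = ort.InferenceSession("f5_fp32.onnx",
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# Build dummy inputs from the model signature; dynamic dims become 1 and
# every input is assumed float32, which is a simplification.
inputs = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    inputs[inp.name] = np.zeros(shape, dtype=np.float32)

session.run(None, inputs)  # warm-up so kernels are compiled and cached

runs = 10
start = time.perf_counter()
for _ in range(runs):
    session.run(None, inputs)
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```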

@DakeQQ (Owner) commented Nov 27, 2024

Thank you for your suggestion. Actually, I don't have a desktop GPU, so the GPU performance information is based on discussions in the issue tracker. Our team focuses on Android devices, and this repository aims to deploy F5 on Android using ONNX Runtime. However, we found that F5's computation is too heavy to achieve real-time responses, even with Qualcomm NPU acceleration. As a result, we have only released the model export method.
It would be great if you could share some speed test information. We would be happy to include it in the README.md and, of course, reference your name : )

@OrphBean commented Nov 27, 2024

@DakeQQ I have a 4090 and am very keen to see if we can get a speedup of F5 for realtime chat on GPU. I've had some trouble getting inference working correctly on my end, I think due to my own error. I will try again over the next few days and update you. I'd be more than happy to run some decent speed tests for you.

The current issue with F5 is the 1.5-second lag before the first reply when using the standard F5 implementation. If we could find a way to reduce that using ONNX, and perhaps DeepSpeed similar to the Coqui XTTS implementation (I know it's an entirely different model), then the open-source realtime TTS community would be very happy indeed.

https://github.com/daswer123/xtts-api-server/tree/main/xtts_api_server/RealtimeTTS

@DakeQQ (Owner) commented Nov 28, 2024

@OrphBean
Thank you for sharing your thoughts and testing the potential of F5 with your 4090 GPU. It's great to see your enthusiasm for improving the speed and performance of real-time chat. Don't worry too much about the initial issues you faced—it happens to all of us when exploring new implementations. Your willingness to retry and share updates is truly appreciated.

Regarding the current 1.5-second lag in the first reply, your suggestion to leverage ONNX and possibly DeepSpeed is very insightful. While Coqui XTTS is indeed a different model, the inspiration from its approach could provide valuable ideas. We’ll definitely keep exploring ways to optimize this, and having someone like you to help with speed tests is a huge asset. Thanks again for your support and collaboration with the open-source community!

@OrphBean

I am unable to get this running; it keeps reverting to the CPU. I have tried a number of approaches without success. I am available to run tests if you have any.
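For what it is worth, this is how I have been checking which provider ONNX Runtime actually picks; the model path is just a placeholder:

```python
import onnxruntime as ort

# Providers compiled into the installed onnxruntime build.
print(ort.get_available_providers())
# If this only lists CPUExecutionProvider, the plain CPU package is installed
# and no GPU provider can be selected.

# Placeholder model path; the session silently falls back down this list.
session = ort.InferenceSession("model.onnx",
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

# Providers the session actually ended up with, in priority order.
print(session.get_providers())
```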

@DakeQQ (Owner) commented Nov 29, 2024

If your system is running Windows, you might want to consider using DirectML, as it provides the most convenient setup for utilizing GPU resources. First, ensure that you have the latest version of onnxruntime-directml installed:

```
pip install onnxruntime-directml --upgrade
```

Then modify your code to use the DirectML execution provider:

```python
ort_session_B = onnxruntime.InferenceSession(onnx_model_B, sess_options=session_opts, providers=['DmlExecutionProvider'])
```

This should streamline the process of setting up GPU acceleration on a Windows system.
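For completeness, here is a fuller sketch of that setup with a CPU fallback and a check of which provider was actually selected; the model path and session options are illustrative, not the repository's exact code:

```python
import onnxruntime

# Illustrative path to the exported ONNX model.
onnx_model_B = "model_B.onnx"

session_opts = onnxruntime.SessionOptions()
session_opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

# Prefer DirectML; fall back to CPU if it is unavailable.
ort_session_B = onnxruntime.InferenceSession(
    onnx_model_B,
    sess_options=session_opts,
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
)

# If this prints only CPUExecutionProvider, the plain onnxruntime package is
# probably shadowing onnxruntime-directml in the environment; uninstall both
# and reinstall onnxruntime-directml alone.
print(ort_session_B.get_providers())
```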

@sheepHavingPurpleLeaf

> Thank you for your suggestion. Actually, I don't have a desktop GPU, so the GPU performance information is based on discussions in the issue tracker. Our team focuses on Android devices, and this repository aims to deploy F5 on Android using ONNX Runtime. However, we found that F5's computation is too heavy to achieve real-time responses, even with Qualcomm NPU acceleration. As a result, we have only released the model export method. It would be great if you could share some speed test information. We would be happy to include it in the README.md and, of course, reference your name : )

Hello, I am also trying to deploy real-time TTS for voice cloning on a Qualcomm device (8295). Approaches that generate audio codecs autoregressively, like MaskGCT and GPT-SoVITS, cannot meet the first-reply latency requirement. Can you suggest any other possible approaches?

@DakeQQ (Owner) commented Nov 29, 2024

I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

@sheepHavingPurpleLeaf

> I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

Thank you! I will give it a try. Have you deployed FireRedTTS successfully on a Qualcomm chip, using QNN?

@DakeQQ (Owner) commented Nov 29, 2024

I'm still waiting for the repository to update with "human-like speech generation." Otherwise, it would take a lot of time to go through the double export process again.

@lumpidu commented Dec 21, 2024

> I recommend checking out FireRedTTS. It delivers a process and performance comparable to F5-TTS while requiring only one-third of the computational resources. In my opinion, it stands out as one of the most promising repositories for achieving real-time, commercially viable voice cloning TTS. However, Qualcomm NPU support is essential for optimal performance.

I found no training code there yet. It doesn't seem the team is eager to release it...
