May I ask approximately when you will upload the training code and pre-trained model? #2

Open
Janne-byti opened this issue Dec 28, 2023 · 2 comments

Comments

@Janne-byti

Your work is excellent.
I have read your paper, and it mentions that the algorithm's performance exceeds all baseline models while the number of model parameters is very small.
I would like to ask whether you have tested the algorithm's performance on a speech dataset, rather than only on music.
Have you also tested the real-time running efficiency of the algorithms on a computer? I have measured an RTF of 3 for the LPCNet algorithm, so it cannot run in real time. Additionally, the local RTF of the TFGAN algorithm is around 0.6 (consistent with the data provided in the authors' paper).
According to the paper, should your algorithm have a real-time rate that is half that of the TFGAN algorithm?

@EMMALIKECODE123

Hello Janne,
I am working on a PLC for voice too, and I want to run other people's pre-trained models first. I saw you have run the LPCNet and TFGAN models, so could you give me links to LPCNet and TFGAN? Thank you so much.
Finally, I have found the PARCnet pre-trained model, which can be run after checking out the right commit:
git checkout a454733
Hope this helps you.

@ilic-mezza
Collaborator

ilic-mezza commented Jan 3, 2024

Hi @Janne-byti,

Thank you very much for your interest!

First, I have fixed the issue with the corrupted pretrained model (which was supposed to be available all along; it is about 5 MB).

Second, our training code is heavily under-optimized at the moment. It preprocesses and saves packets offline rather than loading and chunking audio files at training time. This is mainly because PARCnet was developed with an early implementation of the linear predictor that was so slow it constituted a bottleneck during training.
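
For context, here is a minimal sketch of what on-the-fly chunking could look like with a PyTorch-style Dataset. The class name, file layout, `packet_size`, and `context` are illustrative placeholders, not the actual PARCnet training code:

```python
# Illustrative sketch only: on-the-fly chunking instead of preprocessing
# and saving packets offline. Names and defaults are assumptions.
import glob
import random

import soundfile as sf
import torch
from torch.utils.data import Dataset


class OnTheFlyPacketDataset(Dataset):
    def __init__(self, root: str, packet_size: int = 320, context: int = 7):
        # Assumes every file is at least (context + 1) packets long.
        self.files = sorted(glob.glob(f"{root}/**/*.wav", recursive=True))
        self.packet_size = packet_size
        self.context = context  # number of past packets fed to the model

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int):
        audio, _ = sf.read(self.files[idx], dtype="float32")
        window = self.packet_size * (self.context + 1)
        start = random.randint(0, len(audio) - window)
        chunk = torch.from_numpy(audio[start:start + window])
        # Model input (valid past) and target (the "lost" packet).
        return chunk[:-self.packet_size], chunk[-self.packet_size:]
```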

We are thinking of refactoring the training code before releasing it. At the same time, we are also considering releasing it as-is and updating it at a later date. Either way, some version of the training code will be available soon.

As of now, we have not tried PARCnet on a speech dataset. PARCnet's linear branch performs LPC, so it might prove to be a good fit for concealing speech packets too.
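
For illustration, here is a textbook sketch of LPC extrapolation with numpy (autocorrelation method plus the Levinson-Durbin recursion); this is the standard formulation, not necessarily our exact implementation:

```python
# Textbook LPC extrapolation; not necessarily PARCnet's exact code.
import numpy as np


def lpc_coefficients(x: np.ndarray, order: int) -> np.ndarray:
    """Return the prediction filter [1, a_1, ..., a_p] via Levinson-Durbin."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0] + 1e-9  # small bias guards against silence
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err
        a[:i + 1] += k * a[:i + 1][::-1]
        err *= 1.0 - k * k
    return a


def lpc_extrapolate(past: np.ndarray, order: int, n: int) -> np.ndarray:
    """Predict the next n samples from the valid past samples."""
    a = lpc_coefficients(past, order)
    buf = list(past[-order:])
    out = []
    for _ in range(n):
        pred = -np.dot(a[1:], buf[::-1])  # x[t] = -sum_j a_j * x[t-j]
        out.append(pred)
        buf = buf[1:] + [pred]
    return np.asarray(out)
```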

In our paper, we tested the real-time ratio of all the models on a laptop CPU, including LPCNet and TFGAN. See the last column of Table 2.
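
For anyone wanting to reproduce such measurements, the real-time ratio is simply wall-clock processing time divided by audio duration. A hypothetical timing sketch, where the sample rate, packet size, and `predict` callable are placeholders rather than values from the paper:

```python
# Hypothetical sketch: RTR = processing time / audio duration,
# so RTR < 1 means real-time capable.
import time

import numpy as np

SAMPLE_RATE = 32_000  # placeholder; use the dataset's actual rate
PACKET_SIZE = 320     # placeholder packet length in samples


def real_time_ratio(predict, past: np.ndarray, n_runs: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(n_runs):
        predict(past)  # conceal one packet
    mean_time = (time.perf_counter() - start) / n_runs
    return mean_time / (PACKET_SIZE / SAMPLE_RATE)
```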

Although we re-trained the PLC version of LPCNet using the Python/Keras implementation, we tested it using the efficient C implementation found on the plc_challenge branch of the LPCNet repo (https://github.com/xiph/LPCNet/tree/plc_challenge). This allows LPCNet to run in real time on our machine (AMD Ryzen 5900HS).

Moreover, we report the CPU time of PARCnet considering the neural network forward pass only. We have not yet implemented the parallelization of the two branches: in our code, the linear and nonlinear predictions take place sequentially. The actual RTR is thus expected to be higher than what is reported. However, with a little work, the two predictors could run in parallel, so that the total execution time is dominated by the slowest model, i.e., the neural network.
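
As a sketch of how that parallelization could look, one option is a two-worker thread pool. The branch functions and the additive combination of their outputs are stand-ins here, not the repo's actual code:

```python
# Sketch of one possible parallelization; all names are stand-ins.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

PACKET_SIZE = 320  # placeholder packet length in samples
executor = ThreadPoolExecutor(max_workers=2)


def linear_branch(past: np.ndarray) -> np.ndarray:
    # Stand-in for the AR/LPC extrapolation (see the sketch above).
    return np.zeros(PACKET_SIZE, dtype=np.float32)


def neural_branch(past: np.ndarray) -> np.ndarray:
    # Stand-in for the neural network forward pass (the slower branch).
    return np.zeros(PACKET_SIZE, dtype=np.float32)


def conceal_packet(past: np.ndarray) -> np.ndarray:
    lin = executor.submit(linear_branch, past)
    net = executor.submit(neural_branch, past)
    # Latency ~ max(branch times), i.e., dominated by the neural network,
    # provided the heavy ops release the GIL (as numpy/PyTorch ops do).
    return lin.result() + net.result()
```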

Our neural network will be faster than TFGAN (not sure about 2x faster, but still...). After all, our neural network is a stripped-down version of TFGAN with nearly a quarter of the parameters.
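
For what it's worth, the parameter comparison is easy to verify for any PyTorch model (the model names in the comment are hypothetical):

```python
# Quick way to count trainable parameters in a PyTorch model.
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g., compare count_parameters(parcnet_nn) against count_parameters(tfgan)
```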
