May I ask when you will upload the training code and pre-trained model, approximately? #2
Comments
Hi @Janne-byti, thank you very much for your interest!

First, I have fixed the issue with the corrupted pretrained model (which was supposed to be always available and is about 5 MB).

Second, our training code is heavily under-optimized at the moment. It preprocesses and saves packets in an offline mode rather than loading and chunking audio files at training time. This is mainly because PARCnet was developed with an early implementation of the linear predictor that was so slow it constituted a bottleneck during training. We are thinking of refactoring the training code before releasing it. At the same time, we are also considering releasing it as-is and updating it at a later date. Either way, some version of the training code will be available soon.

As of now, we have not tried PARCnet on a speech dataset. PARCnet's linear branch performs LPC, so it might prove to be a good fit for concealing speech packets too.

In our paper, we tested the real-time ratio of all the models on a laptop CPU, including LPCNet and TFGAN; see the last column of Table 2. Although we re-trained the PLC version of LPCNet using the Python/Keras implementation, we tested it using the efficient implementation written in C.

Moreover, we report the CPU time of PARCnet considering the neural network forward pass only. We have not implemented the parallelization of the two branches yet: in our code, the linear and nonlinear predictions take place sequentially. The actual RTR is thus expected to be higher than reported. However, with a little work, the two predictors could be run in parallel, so that the total execution time is dominated by the slowest model, i.e., the neural network. Our neural network will be faster than TFGAN (not sure about 2x faster, but still...). After all, our neural network is a stripped-down version of TFGAN with nearly a quarter of the parameters.
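To illustrate the parallelization point above, here is a minimal sketch of running the two branches concurrently so that wall-clock time is dominated by the slower one. The two predictor functions are hypothetical stand-ins (not PARCnet's actual implementation), and note that in pure Python a thread pool only helps when the heavy work releases the GIL (as NumPy and most deep-learning frameworks do):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

PACKET_SIZE = 80  # assumed packet length in samples, for illustration only

def linear_prediction(context: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the AR/LPC branch.
    return context[-PACKET_SIZE:] * 0.5

def neural_prediction(context: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the neural-network branch.
    return np.tanh(context[-PACKET_SIZE:])

def conceal_packet(context: np.ndarray) -> np.ndarray:
    # Submit both branches at once; total time ~ max(linear, neural)
    # instead of their sum, as in the sequential implementation.
    with ThreadPoolExecutor(max_workers=2) as pool:
        lin = pool.submit(linear_prediction, context)
        nn = pool.submit(neural_prediction, context)
        return lin.result() + nn.result()

context = np.random.randn(1600)  # past samples available to the predictors
packet = conceal_packet(context)
print(packet.shape)  # (80,)
```

In a latency-critical C or C++ deployment, the same structure would map onto two worker threads joined before summing the predictions.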
Your work is excellent.
I have read your paper; it mentions that the algorithm's performance exceeds all baseline models while the parameter count stays very small.
I would like to ask whether you have tested the algorithm's performance on a speech dataset, rather than music?
Have you also tested the real-time efficiency of the algorithms on a computer? I measured an RTF of 3 for the LPCNet algorithm, so it cannot run in real time. Additionally, the local RTF of the TFGAN algorithm is around 0.6 (consistent with the data provided in the authors' paper).
According to the paper, should your algorithm have a real-time rate that is half that of the TFGAN algorithm?
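For reference, the real-time factor discussed above is simply processing time divided by audio duration (RTF < 1 means the model keeps up with real time). A minimal measurement sketch, with a dummy processing function standing in for an actual PLC model:

```python
import time
import numpy as np

def real_time_factor(process_fn, audio: np.ndarray, sample_rate: int) -> float:
    """RTF = processing time / audio duration; values below 1 are real-time capable."""
    start = time.perf_counter()
    process_fn(audio)
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sample_rate)

def dummy_plc(audio: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder: a cheap moving-average filter,
    # NOT a real packet loss concealment model.
    return np.convolve(audio, np.ones(8) / 8, mode="same")

audio = np.zeros(32000)  # 1 second of audio at an assumed 32 kHz rate
rtf = real_time_factor(dummy_plc, audio, sample_rate=32000)
print(f"RTF: {rtf:.4f}")
```

Averaging over several runs (and discarding a warm-up call) gives more stable numbers than a single measurement.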