Fail to reproduce on MEAD dataset #46

Open
HarryXD2018 opened this issue Jul 25, 2023 · 3 comments

@HarryXD2018

Hi, nice work!

I am currently attempting to reproduce this work on the MEAD dataset. Stage 1 of the process went smoothly; however, I am encountering an issue in Stage 2: after 20 epochs of training, the output remains static, with no observable movement.

Do you have any idea what might be causing this?

Many thanks!

@Doubiiu (Owner) commented Jul 25, 2023

Hi, did you use a 3D face reconstruction method to convert MEAD to 3D data? And do you mean you have visually checked the results of Stage 1 (reconstruction) and they looked good? Based on my early attempts on VOCASET and BIWI, Stage 2 is harder to train, since it depends on the results of Stage 1 and on the hyperparameters/network architecture of the Stage 2 model (e.g. number of transformer layers, number of heads, etc.), so you may need to modify them if possible (it is not easy to make VQ-based models work as expected). Hope you can make it work soon~
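
For reference, here is a minimal PyTorch sketch of the kind of Stage 2 knobs mentioned above (transformer layers, number of heads). The dimensions and values are illustrative assumptions, not CodeTalker's actual configuration:

```python
import torch
import torch.nn as nn

# Illustrative Stage 2 hyperparameters -- assumed values,
# not the actual CodeTalker configuration.
d_model = 512   # feature dimension of audio/motion features
n_heads = 4     # number of attention heads to experiment with
n_layers = 6    # number of transformer decoder layers to experiment with

decoder_layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=n_heads, batch_first=True
)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)

# Dummy audio features (memory) and motion embeddings (tgt):
audio_feats = torch.randn(1, 100, d_model)   # (batch, frames, dim)
motion_embed = torch.randn(1, 100, d_model)
out = decoder(tgt=motion_embed, memory=audio_feats)
print(out.shape)  # torch.Size([1, 100, 512])
```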

@HarryXD2018 (Author)

Thanks for the quick reply! 😄

Yes, I reconstructed the MEAD dataset with EMOCA, and I use FLAME parameters instead of vertices to represent the face shapes, which reduces the data volume. Concretely, I visualized the Stage 1 results and observed slight jitters; is that okay? I stopped the training early because it was really time-consuming.
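
As a quick aside, one crude way to check whether the jitter is significant is to look at frame-to-frame differences of the Stage 1 output. A minimal numpy sketch, where the file name and array layout are assumptions:

```python
import numpy as np

# Assumed layout: Stage 1 reconstructions saved as a
# (num_frames, num_params) array of FLAME parameters
# (the file name is hypothetical).
params = np.load("stage1_recon.npy")

# First difference: overall motion magnitude per frame.
velocity = np.linalg.norm(np.diff(params, axis=0), axis=1)
# Second difference: a better jitter proxy, since real motion
# also has non-zero velocity but should accelerate smoothly.
accel = np.linalg.norm(np.diff(params, n=2, axis=0), axis=1)

print(f"mean |velocity| = {velocity.mean():.6f}")
print(f"mean |accel|    = {accel.mean():.6f}, max = {accel.max():.6f}")
```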

If so, I will mainly focus on architecture engineering to make it work.

@Doubiiu (Owner) commented Jul 25, 2023

I see. I am not sure about the performance of mapping audio to FLAME parameters with CodeTalker (or any VQ-based method). Since Stage 1 is the expected upper bound of the Stage 2 results, I think you had better make sure it works properly (smoothly and without artifacts).
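
One simple sanity check along those lines is to measure the per-frame reconstruction error of Stage 1 against the ground truth. A minimal numpy sketch (file names and shapes are assumptions):

```python
import numpy as np

# Hypothetical file names and shapes: ground-truth and Stage 1
# sequences as (num_frames, num_params) arrays.
gt = np.load("gt_params.npy")
recon = np.load("stage1_recon.npy")

# Per-frame L2 reconstruction error; Stage 1 upper-bounds Stage 2,
# so this error should be small and free of spikes (artifacts).
err = np.linalg.norm(gt - recon, axis=1)
print(f"mean per-frame error: {err.mean():.6f}")
print(f"worst frame: {err.argmax()} (error {err.max():.6f})")
```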
