
Inquiry regarding Training Method Details #60

Open
run-youngjoo opened this issue Sep 25, 2024 · 1 comment

@run-youngjoo

Dear Authors,

I hope this message finds you well.

First of all, I would like to express my appreciation for your excellent research. It is truly remarkable to see the quality you achieved by fine-tuning only the self-attention layers.

Since the training code was not provided, I have been experimenting with my own implementation.

Unlike your setup, I am training on 8 A100 GPUs, but I followed the configuration in your paper: a batch size of 16 × 8, a learning rate of 1e-5, and 16,000 steps, which took about 10 hours.
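
For reference, this is roughly how my optimizer and step schedule are set up (a minimal sketch of my own reimplementation; the model and dataset here are stand-ins, not your actual architecture or data loader):

```python
# Rough sketch of my training setup (per-GPU view); launched across 8 GPUs,
# so the effective batch size is 16 x 8. The model and dataset below are
# placeholders, not the real CatVTON UNet or the VITON-HD loader.
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).to(device)  # stand-in denoiser
dataset = TensorDataset(torch.randn(256, 4, 64, 48))                # dummy latent-sized tensors
loader = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
max_steps = 16_000  # total optimizer steps

step = 0
while step < max_steps:
    for (latents,) in loader:
        latents = latents.to(device)
        noise = torch.randn_like(latents)
        pred = model(latents + noise)                     # placeholder forward pass
        loss = torch.nn.functional.mse_loss(pred, noise)  # simple denoising-style loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        step += 1
        if step >= max_steps:
            break
```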

While the qualitative results are promising, the quantitative performance does not quite reach the numbers reported in your paper (I obtained FID 8.268, KID 2.20, SSIM 0.844, and LPIPS 0.07 on the VITON-HD dataset).

I have also applied data augmentation, but it has not significantly improved the results.
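
The augmentation I tried was along these lines (the specific transforms are illustrative choices of mine, not anything taken from your paper):

```python
# Illustrative augmentation pipeline (my own guesses, not the paper's recipe).
# For paired person/garment images, the same random parameters should be
# applied to both images so the pair stays consistent.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=0, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    T.ColorJitter(brightness=0.1, contrast=0.1),
])
```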

I was wondering if there are any additional details or nuances in your training process, not explicitly mentioned in the paper, that could help close this gap.

Thank you for your time, and I greatly appreciate any insights you can provide.

@Zheng-Chong
Owner

Zheng-Chong commented Sep 25, 2024

Regrettably, since CatVTON is a collaborative project, we will not directly release our training code. Because training diffusion models is a relatively complex task, we described the training details in the paper as thoroughly as possible, but some minor details that were not elaborated might cause discrepancies in training. Regarding training time, I have tested on 8xA100, and the total training time (approximately 7 hours) is shorter than on 8xA800. Perhaps you can check whether there are any redundant parts in your code that can be optimized.
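
(For example, a generic way to look for redundant work is to profile a few training steps with torch.profiler; the snippet below is only a generic sketch with a stand-in model, not our training code.)

```python
# Generic profiling sketch (stand-in model, not CatVTON code): profile a few
# training steps and look at which ops dominate GPU time.
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device)   # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step():
    x = torch.randn(16, 512, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        train_step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```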
