First of all, I would like to express my appreciation for your excellent research. It is truly remarkable to see the quality you achieved using only self-attention fine-tuning.
As the training code was not provided, I have been experimenting by creating my own version.
However, unlike your setup, I am using 8 × A100 GPUs and have followed the configuration in your paper: a batch size of 16 per GPU (128 total), a learning rate of 1e-5, and 16,000 steps, which take about 10 hours.
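For reference, here is a minimal sketch of how my script selects the trainable parameters and sets up the optimizer with the hyperparameters above. This is my own reconstruction, not your code; the base-model path and the `attn1` module naming are assumptions based on a diffusers-style inpainting UNet.

```python
# Minimal sketch of my setup (my own reconstruction, not the authors' code).
# Assumes a diffusers-style inpainting UNet whose self-attention modules are
# registered as "attn1" in each transformer block; the model path is a placeholder.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "path/to/sd-inpainting-base", subfolder="unet"  # placeholder base model
)

# Freeze everything, then unfreeze only the self-attention parameters.
unet.requires_grad_(False)
trainable_params = []
for name, module in unet.named_modules():
    if name.endswith("attn1"):  # attn1 = self-attention in diffusers transformer blocks
        for p in module.parameters():
            p.requires_grad_(True)
            trainable_params.append(p)

# Hyperparameters from above: lr 1e-5, batch size 16 per GPU on 8 GPUs, 16,000 steps.
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
```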
While the qualitative results are promising, the quantitative performance does not quite reach the numbers reported in your paper (in my case, FID: 8.268, KID: 2.20, SSIM: 0.844, LPIPS: 0.07 on the VITON-HD dataset).
I have also applied data augmentation, but it has not significantly improved the results.
I was wondering if there are any additional details or nuances in your training process, not explicitly mentioned in the paper, that could help improve the performance.
Thank you for your time, and I greatly appreciate any insights you can provide.
Unfortunately, since CatVTON is a collaborative project, we will not directly release our training code. As training diffusion models is a relatively complex task, we have described the training details in the paper as thoroughly as possible, but some minor details that were not elaborated on might cause discrepancies in training. Regarding training time, I have tested on 8 × A100, and the total training time (approximately 7 hours) is shorter than on 8 × A800. Perhaps you can check whether there are any redundant parts in your code that could be optimized.
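As a general illustration (standard PyTorch/diffusers options, not the specifics of our training code), a few common throughput settings worth checking on A100/A800 are sketched below; `training_step` is a hypothetical placeholder for your own loss computation.

```python
# Generic speed checks for an Ampere-GPU training loop (standard PyTorch options,
# not the authors' exact configuration).
import torch

# Allow TF32 matmuls/convolutions on Ampere GPUs (A100/A800).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# If xformers is installed, diffusers models support memory-efficient attention:
# unet.enable_xformers_memory_efficient_attention()

# Run the forward/backward pass under bf16 autocast:
# with torch.autocast("cuda", dtype=torch.bfloat16):
#     loss = training_step(batch)  # hypothetical training-step function
```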