Questions about FID. #5

Open
DRJYYDS opened this issue Apr 10, 2023 · 10 comments

Comments


DRJYYDS commented Apr 10, 2023

Hi, this is an excellent repo. May I ask what FID you obtain?

junhsss (Owner) commented Apr 14, 2023

Hi @DRJYYDS. I haven't computed FID scores yet, but I just wrote a script for that.

@yuanzhi-zhu

FYI, I got an FID of 20 using @junhsss 's code.

I also calculated it with https://github.com/mseitzer/pytorch-fid using the same 10k generated images (against the whole CIFAR-10 dataset) and got an FID of 56.
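
For reference, a minimal sketch of how this kind of comparison can be run with pytorch-fid's Python entry point (the folder names are placeholders, and the exact keyword signature may vary slightly across pytorch-fid versions):

```python
# Minimal sketch: compute FID between two image folders with pytorch-fid.
# Assumes `pip install pytorch-fid`; folder paths below are placeholders.
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"

fid = calculate_fid_given_paths(
    ["generated_samples/", "cifar10_reference/"],  # two folders of PNG/JPEG images
    batch_size=50,
    device=device,
    dims=2048,  # default InceptionV3 pool3 features
)
print(f"FID: {fid:.2f}")

# Equivalent CLI usage:
#   python -m pytorch_fid generated_samples/ cifar10_reference/
```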


DRJYYDS commented Apr 25, 2023

> FYI, I got an FID of 20 using @junhsss 's code.
>
> I also calculated it with https://github.com/mseitzer/pytorch-fid using the same 10k generated images (against the whole CIFAR-10 dataset) and got an FID of 56.

Got it! Thanks. It's interesting to see the FID gap. When you calculated the FID with pytorch-fid, did you first save the pictures to disk and then read them back? It's known that the FID score on CIFAR-10 is sensitive to the image format.
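
To illustrate the format point: saving the generated samples as lossless PNGs avoids the JPEG compression artifacts that can noticeably shift FID on 32x32 CIFAR-10 images. A minimal sketch, assuming `samples` is a float tensor in [0, 1]:

```python
# Sketch: write generated samples to disk as lossless PNGs before computing FID.
import os
import torch
from torchvision.utils import save_image

# Placeholder batch; replace with your model's (N, 3, 32, 32) samples in [0, 1].
samples = torch.rand(16, 3, 32, 32)

os.makedirs("generated_samples", exist_ok=True)
for i, img in enumerate(samples):
    # PNG is lossless; saving as JPEG would add compression artifacts
    # that measurably change the FID on 32x32 CIFAR-10 images.
    save_image(img, f"generated_samples/{i:05d}.png")
```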


yuanzhi-zhu commented Apr 25, 2023

Hi @DRJYYDS,

Sorry for the confusion.

In @junhsss 's implementation, the ground-truth folder has 10k images, but the FID of 56 was calculated against the whole CIFAR-10 dataset (60k images in total).

I just reran pytorch-fid with the same 10k reference images and got an FID of 21, which agrees well :)

PS: I have my own implementation of consistency models (slightly different from this one in the UNet architecture, LPIPS model, etc.). The FID I get is 41 with pytorch-fid, using 10k generated samples and 60k ground-truth images (the model is trained with batch size 160 for 70k total steps).

Both are still worse than the FID reported in the original paper (8.7 for one step and 5.8 for two steps).


DRJYYDS commented Apr 26, 2023

> Hi @DRJYYDS,
>
> Sorry for the confusion.
>
> In @junhsss 's implementation, the ground-truth folder has 10k images, but the FID of 56 was calculated against the whole CIFAR-10 dataset (60k images in total).
>
> I just reran pytorch-fid with the same 10k reference images and got an FID of 21, which agrees well :)
>
> PS: I have my own implementation of consistency models (slightly different from this one in the UNet architecture, LPIPS model, etc.). The FID I get is 41 with pytorch-fid, using 10k generated samples and 60k ground-truth images (the model is trained with batch size 160 for 70k total steps).
>
> Both are still worse than the FID reported in the original paper (8.7 for one step and 5.8 for two steps).

Thanks for your reply!

I believe you now have the correct FID. By the way, in most papers' reported results FID is calculated between 50k generated images and 50k real images.

It's interesting to see the performance gap between your implementation, @junhsss 's implementation, and the FID reported in the original paper. It may indicate that the consistency model is somewhat sensitive to some settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)
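
For reference, a rough sketch of that 50k-vs-50k protocol; `sample_batch` below is a hypothetical placeholder for the model's one-step sampler, not part of this repo:

```python
# Sketch of the 50k-sample evaluation protocol most papers report.
import os
import torch
from torchvision.utils import save_image

def sample_batch(n: int) -> torch.Tensor:
    # Placeholder: replace with consistency-model sampling, e.g. one
    # denoising step from pure noise at the maximum sigma.
    return torch.rand(n, 3, 32, 32)

os.makedirs("fid_samples", exist_ok=True)
total, batch = 50_000, 500
for start in range(0, total, batch):
    imgs = sample_batch(batch)
    for j, img in enumerate(imgs):
        save_image(img, f"fid_samples/{start + j:05d}.png")

# Then compare against the 50k CIFAR-10 training images, e.g.:
#   python -m pytorch_fid fid_samples/ cifar10_train_pngs/
```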

@yuanzhi-zhu

> Thanks for your reply!
>
> I believe you now have the correct FID. By the way, in most papers' reported results FID is calculated between 50k generated images and 50k real images.
>
> It's interesting to see the performance gap between your implementation, @junhsss 's implementation, and the FID reported in the original paper. It may indicate that the consistency model is somewhat sensitive to some settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)

Thank you for the information, I will try it with more samples :)

I will look more into the differences from the original implementation when I have spare time...


DRJYYDS commented Apr 26, 2023

> > Thanks for your reply!
> >
> > I believe you now have the correct FID. By the way, in most papers' reported results FID is calculated between 50k generated images and 50k real images.
> >
> > It's interesting to see the performance gap between your implementation, @junhsss 's implementation, and the FID reported in the original paper. It may indicate that the consistency model is somewhat sensitive to some settings (e.g., batch size, schedule, ...). My own implementation also lands around 15 to 20. I believe that if you calculate the FID using 10k/50k generated images against 10k/50k real images with your implementation, you can expect an FID under 20 :)
>
> Thank you for the information, I will try it with more samples :)
>
> I will look more into the differences from the original implementation when I have spare time...

Good luck! If you have any questions, we can discuss them together; I have also been working on consistency models recently.

@mo666666

Hi guys, I would like to ask how many training iterations you used when calculating FID. Could the high FID be due to insufficient training?


DRJYYDS commented Nov 21, 2023

> Hi guys, I would like to ask how many training iterations you used when calculating FID. Could the high FID be due to insufficient training?

Sure. According to the follow-up work "Improved Techniques for Training Consistency Models" by Yang Song et al., training on CIFAR-10 needs about 8000 epochs.
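
(As a rough conversion, assuming the original paper's CIFAR-10 batch size of 512: 8000 epochs × 50,000 training images ÷ 512 ≈ 780k optimizer steps, i.e. roughly an order of magnitude more iterations than the ~70k steps mentioned above.)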


mo666666 commented Dec 8, 2023

I re-read the original consistency models paper and there is one point that confuses me. In Table 3, they set an EMA decay rate of 0.9999 for CT on the CIFAR-10 dataset. Does this mean that we need another EMA model (besides $\theta^{-}$) to evaluate the FID?
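
For what it's worth, one common reading is that this 0.9999 EMA is a separate, slowly-updated copy of the online weights kept purely for evaluation, in addition to the target network $\theta^{-}$ used in the training loss. A minimal sketch of such an evaluation EMA, with hypothetical `online_model` / `eval_ema_model` names:

```python
# Sketch: a separate evaluation EMA of the online weights, distinct from
# the target network theta^- used inside the consistency training loss.
# `online_model` and `eval_ema_model` are hypothetical names; the EMA model
# starts out as a deep copy of the online model.
import torch

decay = 0.9999  # EMA rate from Table 3 of the consistency models paper

@torch.no_grad()
def update_eval_ema(online_model, eval_ema_model, decay=decay):
    for p_ema, p in zip(eval_ema_model.parameters(), online_model.parameters()):
        # Exponential moving average: ema <- decay * ema + (1 - decay) * online
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# After each optimizer step on online_model:
#   update_eval_ema(online_model, eval_ema_model)
# FID would then be evaluated with eval_ema_model rather than theta^-.
```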
