FID computed during training differs greatly from a separate evaluation of the same checkpoint #107

Open
vra opened this issue Aug 30, 2023 · 0 comments

vra commented Aug 30, 2023

Hi @ryanrussell @ericryanchan, thanks for open-sourcing this awesome method!

When I retrain the EG3D model using python train.py --outdir=~/training-runs --cfg=ffhq --data ./FFHQ_512.zip --gpus=4 --batch=16 --gamma=1 --gen_pose_cond=True, I noticed that the FID reported by the evaluation during training is quite high (~95), even after 13000 kimg:

tick 3250  kimg 13000.0  time 5d 01h 48m   sec/tick 128.5   sec/kimg 32.11   maintenance 0.1    cpumem 3.94   gpumem 8.73   reserved 10.29  augment 0.000
~/training-runs/00014-ffhq-FFHQ_512-gpus4-batch16-gamma1
Evaluating metrics...
{"results": {"fid50k_full": 95.52802728177237}, "metric": "fid50k_full", "total_time": 289.52356004714966, "total_time_str": "4m 50s", "num_gpus": 4, "snapshot_pkl": "network-snapshot-013000.pkl", "timestamp": 1693301505.8255823}
tick 3251  kimg 13004.0  time 5d 01h 55m   sec/tick 128.3   sec/kimg 32.08   maintenance 310.1  cpumem 3.86   gpumem 8.73   reserved 10.29  augment 0.000
tick 3252  kimg 13008.0  time 5d 01h 57m   sec/tick 128.4   sec/kimg 32.09   maintenance 0.1    cpumem 3.86   gpumem 8.73   reserved 10.29  augment 0.000

However, when I evaluate the same checkpoint with the standalone evaluation script (python calc_metrics.py --metrics=fid50k_full --data ./FFHQ_512.zip --network ./network-snapshot-013000.pkl), the FID is much lower (~8):

generator features  items 50000   time 32m 32s      ms/item 41.20
{"results": {"fid50k_full": 8.188923527848317}, "metric": "fid50k_full", "total_time": 2188.9980747699738, "total_time_str": "36m 29s", "num_gpus": 1, "snapshot_pkl": ".\\network-snapshot-013000.pkl", "timestamp": 1693321722.8993306}
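To make the comparison explicit, here is a small Python sketch that parses the two JSON result records quoted above and checks that they refer to the same metric and the same snapshot file. It is only a convenience for reading the logs, not part of the EG3D code; the helper name `summarize` is hypothetical, and the record strings are copied verbatim from the output above.

```python
import json
from pathlib import PureWindowsPath

# The two metric records quoted above: one from the training log,
# one from the standalone calc_metrics.py run.
TRAINING_RECORD = '{"results": {"fid50k_full": 95.52802728177237}, "metric": "fid50k_full", "total_time": 289.52356004714966, "total_time_str": "4m 50s", "num_gpus": 4, "snapshot_pkl": "network-snapshot-013000.pkl", "timestamp": 1693301505.8255823}'
STANDALONE_RECORD = r'{"results": {"fid50k_full": 8.188923527848317}, "metric": "fid50k_full", "total_time": 2188.9980747699738, "total_time_str": "36m 29s", "num_gpus": 1, "snapshot_pkl": ".\\network-snapshot-013000.pkl", "timestamp": 1693321722.8993306}'


def summarize(record_line: str) -> dict:
    """Hypothetical helper: pull out the fields needed to compare two runs."""
    rec = json.loads(record_line)
    metric = rec["metric"]
    return {
        "metric": metric,
        "value": rec["results"][metric],
        "num_gpus": rec["num_gpus"],
        # PureWindowsPath treats both "\" and "/" as separators, so
        # ".\network-snapshot-013000.pkl" and "network-snapshot-013000.pkl"
        # reduce to the same file name.
        "snapshot": PureWindowsPath(rec["snapshot_pkl"]).name,
    }


train = summarize(TRAINING_RECORD)
solo = summarize(STANDALONE_RECORD)

# Same metric, same checkpoint file: only the evaluation setup differs.
assert train["metric"] == solo["metric"]
assert train["snapshot"] == solo["snapshot"]

print(f"{train['metric']} on {train['snapshot']}:")
print(f"  during training ({train['num_gpus']} GPUs): {train['value']:.2f}")
print(f"  calc_metrics.py ({solo['num_gpus']} GPU):   {solo['value']:.2f}")
print(f"  ratio: {train['value'] / solo['value']:.1f}x")
```

Besides the FID value itself, the parsed records differ in `num_gpus` (4 during training vs. 1 for calc_metrics.py), so the GPU count is one visible difference between the two evaluation setups.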

When I checked the generated (fake) images from this checkpoint, they looked fine, and I could not see anything wrong with them.

Could you give any comments or hints about the abnormal FID during training? Thanks in advance!
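For reference, fid50k_full is the Fréchet distance between Gaussians fitted to Inception features of 50k generated images and of the full real dataset: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). Below is a minimal numpy sketch of that formula only, assuming the feature arrays are already extracted (feature extraction is not shown). The function names are hypothetical; the StyleGAN-family metrics code uses scipy.linalg.sqrtm for the matrix square root, whereas this sketch swaps in an eigenvalue-based trace that is mathematically equivalent for covariance matrices.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.atleast_1d(mu1), np.atleast_1d(mu2)
    sigma1, sigma2 = np.atleast_2d(sigma1), np.atleast_2d(sigma2)
    mean_term = np.sum((mu1 - mu2) ** 2)
    # sigma1 @ sigma2 has the same eigenvalues as the PSD matrix
    # sigma1^{1/2} sigma2 sigma1^{1/2}, so they are real and >= 0, and
    # Tr((sigma1 sigma2)^{1/2}) is the sum of their square roots.
    # Clipping removes tiny negative or imaginary numerical noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(mean_term + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(feats_a, feats_b):
    """Fit a Gaussian to each (num_items, feature_dim) array and compare them."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, sigma_a, mu_b, sigma_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5000, 8))
    print(fid_from_features(feats, feats))        # identical sets: ~0
    print(fid_from_features(feats, feats + 1.0))  # mean shifted by 1 in 8 dims: ~8
```

Because both the training-time and standalone evaluations apply the same formula, comparing their intermediate statistics (feature means and covariances) may help narrow down where the two runs diverge.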
