
Does the repo pick the weights that perform best on the val dataset to evaluate on the test dataset? #7

Open
pierowu opened this issue Oct 18, 2023 · 2 comments


pierowu commented Oct 18, 2023

Thank you for your solid work.
Does the repo pick the model weights that perform best on the val dataset and then evaluate them on the test dataset?
From the code below, it seems that the repo directly chooses the best result on the test set as the final result.

acc1, logits = validate(test_dataloader, model, criterion, epoch, config, return_logits=True)
# remember best acc@1 and save checkpoint
if acc1 > best_acc1:
    model_info['best_logits'] = logits
best_acc1 = max(acc1, best_acc1)
logging.info(f'=> Learning rate {config.TRAIN.LR}, L2 lambda {config.TRAIN.WD}: Best score: Acc@1 {best_acc1:.3f}')
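
For reference, the val-based protocol being asked about would look roughly like the sketch below: select the checkpoint on a validation split, then evaluate that single checkpoint on the test set. This is only an illustration; `val_dataloader`, `train_dataloader`, `optimizer`, `train_one_epoch`, and `config.TRAIN.END_EPOCH` are hypothetical names, not the repo's actual API. Only `validate` and its call signature are taken from the snippet above.

import copy

best_val_acc1 = 0.0
best_state = None
for epoch in range(config.TRAIN.END_EPOCH):  # hypothetical epoch-count field
    train_one_epoch(train_dataloader, model, criterion, optimizer, epoch, config)  # hypothetical trainer
    # model selection happens on the val split, never on the test split
    val_acc1, _ = validate(val_dataloader, model, criterion, epoch, config, return_logits=True)
    if val_acc1 > best_val_acc1:
        best_val_acc1 = val_acc1
        best_state = copy.deepcopy(model.state_dict())

# a single test evaluation with the val-selected weights
model.load_state_dict(best_state)
test_acc1, test_logits = validate(test_dataloader, model, criterion, epoch, config, return_logits=True)
logging.info(f'=> Val-selected weights: test Acc@1 {test_acc1:.3f}')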

jkooy (Collaborator) commented Oct 18, 2023

Hi, thanks for the interest!
I was just notified that you raised the same question in the ELEVATER toolkit: https://github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC. So basically, the best results on the test set are reported, and the best weights are selected on the test set. This is the same setting used in the ELEVATER toolkit, for fair comparison.

pierowu (Author) commented Oct 18, 2023

> Hi, thanks for the interest! I was just notified that you raised the same question in the ELEVATER toolkit: https://github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC. So basically, the best results on the test set are reported, and the best weights are selected on the test set. This is the same setting used in the ELEVATER toolkit, for fair comparison.

This seems to introduce a risk of overfitting to the test set. However, since the ELEVATER benchmark uses this setting, perhaps there is no better option than to follow it.
