
Feature suggestion: model and dataset versioning with HuggingFace #121

Open
narekvslife opened this issue Jun 17, 2024 · 1 comment
@narekvslife

Hi dear Brain-Score Team!

While thinking about my own experiments, reading papers, and talking to people, I can't shake the feeling that we (as a field) assume we understand ML methods better than we actually do.

I am mostly referring to our not controlling for many of the lower-level ML specifics that can contribute substantially to differences in final performance: interactions between optimizers, batch sizes, different subsets of the same dataset, operation precision, architecture-specific hyper-parameters, and so on. These are not well understood by the ML community itself, and many of these choices are left to heuristics. Reproducibility is a known issue in modern ML.

I think Brain-Score is in a position to bring neuroscientists the best practices from the CS community and thus accelerate progress through reproducibility and unified tooling.

One step towards this would be to manage model and dataset versioning with something like HuggingFace, for example.

  • We would be able to extend the model cards we provide now (which currently answer very few questions about what a model actually is) with a link to an HF repo containing the exact weights and hyper-parameters.
  • The same is possible for datasets and their subsets.
  • Since interactions with HF are standardized and well integrated with many libraries, this would also make life easier for those who want to take the best models from Brain-Score and use them in further applications.

This way we would not only (1) improve the reproducibility and clarity of results, but also (2) collect a lot of additional (meta)data that could later be analyzed to find/build the best alignment models, and (3) let users search for models with their preferred settings.
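A minimal sketch of what such an extended model-card entry could look like. Everything here is hypothetical (the `ModelCard` class, the repo id, and the commit hash are illustrative, not part of Brain-Score's actual code); the point is simply that pinning an exact Hub revision, rather than a moving branch or tag, is what makes the entry reproducible:

```python
# Hypothetical sketch: an extended model-card entry that pins a model to an
# exact HuggingFace Hub repo revision (commit hash) so results are reproducible.
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    name: str
    hf_repo: str                 # e.g. "org/model-name" on the HuggingFace Hub
    revision: str                # an exact commit hash, not a branch or tag
    hyperparameters: dict = field(default_factory=dict)

    def hub_url(self) -> str:
        # Permalink to the exact file tree (weights, config) used for scoring.
        return f"https://huggingface.co/{self.hf_repo}/tree/{self.revision}"

# Illustrative entry; "e1f2a3b" is a placeholder commit hash.
card = ModelCard(
    name="resnet-50",
    hf_repo="microsoft/resnet-50",
    revision="e1f2a3b",
    hyperparameters={"batch_size": 256, "optimizer": "sgd"},
)
```

The searchable metadata for point (3) falls out for free: `asdict(card)` yields a plain dict that could be indexed and filtered by hyper-parameter settings.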

@narekvslife narekvslife changed the title Feature suggestion: Feature suggestion: model and dataset versioning with HuggingFace Jun 17, 2024
@mike-ferguson
Member

Hi @narekvslife - thanks for opening an issue; this is a great suggestion. We have previously talked about linking models directly to our GitHub (for our "Brain Models", like CORnet-S) and to HF for "Base Models", i.e. standard models like ResNet-50 or MobileNets. We hope to add this soon as part of a refactor of the way we store models.
