
Setup GitHub Actions and Travis for automated submissions #428

Merged · 10 commits merged into integrate_core from kvf/automated_submissions on Jan 4, 2024

Conversation

@kvfairchild (Contributor) commented on Jan 3, 2024

Adds two GitHub Actions workflows and supporting infrastructure to

  1. Approve and merge any PRs that meet all of the following requirements (automerge_plugin-only_prs.yml):
  • Are labeled with automerge or automerge-web (all submissions originating from brain-score.org are tagged with automerge-web)
  • Don't contain any changes outside of child directories of plugin directories (/benchmarks, /models, /metrics, /data)
  • Have all Travis tests passing

This workflow is triggered on a PR by the completion of Travis checks OR by the addition of a qualifying label by a collaborator (will not work from a fork). It first runs an action to confirm that the PR is labeled with either automerge or automerge-web; if so, it then checks Travis status and sets the travisok flag to one of True (Travis completed successfully), Wait (Travis has not finished running), or Fail (Travis completed with errors).

If travisok is

Wait: The workflow exits and will be triggered again on CI check completion. (Rather than only triggering the workflow on check completion, this status accommodates the case where a user has waited for CI checks to complete before adding the automerge label).
True: An action is run to identify all files changed by the PR. If no files have been touched outside of child directories of plugin directories (/benchmarks, /models, /metrics, /data), the PR is automatically approved and merged.
Fail: If the PR is tagged with automerge-web, the user ID associated with the Brain-Score account of the submitter is parsed from the PR title and passed to an action that 1) returns the email address associated with that account and 2) sends an email to that address that includes a link to the PR and requests that the test failures be addressed.
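
Taken together, the gating logic amounts to something like the following sketch. This is illustrative Python pseudocode only; the actual workflow composes GitHub Actions steps in YAML, and the real plugin-only check is `is_plugin_only` from `brainscore_core.plugin_management.parse_plugin_changes`, as the Travis snippet further down shows.

```python
# Illustrative sketch only -- the workflow itself is YAML composing actions;
# the function names below are hypothetical, and the real plugin-only check
# lives in brainscore_core.plugin_management.parse_plugin_changes.
PLUGIN_DIRS = ("benchmarks", "models", "metrics", "data")

def plugin_only(changed_files):
    # every changed file must sit inside a child directory of a plugin directory
    return all(
        len(f.split("/")) >= 3 and f.split("/")[0] in PLUGIN_DIRS
        for f in changed_files
    )

def automerge_decision(labels, travisok, changed_files):
    if not set(labels) & {"automerge", "automerge-web"}:
        return "skip"                 # not a qualifying PR
    if travisok == "Wait":
        return "exit"                 # workflow re-triggers on CI check completion
    if travisok == "True" and plugin_only(changed_files):
        return "approve-and-merge"
    if travisok == "Fail" and "automerge-web" in labels:
        return "notify-submitter"     # email looked up via the user ID in the PR title
    return "skip"
```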

  2. Score any PRs merged to main that include changes to a benchmark or model directory (score_new_plugins.yml).
    This workflow is triggered on any merge to main. It first runs an action to identify all files changed by the PR, parses those files with a Python script in core (parse_changed_files.py) to extract any benchmark and model directories included in those changes, and passes the names of those directories to Jenkins for scoring, along with:

author_email: the email address associated with the GitHub account of the PR author.
user_id: if the PR is submitted via brain-score.org, the user ID associated with the Brain-Score account of the submitter is included in, and extracted from, the PR title.
domain: the domain associated with the submitted plugins; this field is vision by default for all submissions to this repository.
public: whether the plugin is public or private; currently this field is True for all submissions by default, but this accommodates brain-score/core#26.
competition: currently None by default, but in the future this will allow PRs to be tagged with the name of a competition and filed accordingly.
model_type: the model type associated with the submitted plugins; this field is Brain_Model by default for all submissions to this repository.
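
Assembled, the payload handed to Jenkins looks roughly like the following sketch. The values are illustrative, and the exact key names and transport are internal to score_new_plugins.yml and the Jenkins job it triggers.

```python
# Hypothetical sketch of the scoring parameters described above; the real
# field names and delivery mechanism live inside score_new_plugins.yml.
payload = {
    "new_models": "models/mymodel",              # plugin dirs from parse_changed_files.py
    "new_benchmarks": "benchmarks/mybenchmark",  # plugin dirs from parse_changed_files.py
    "author_email": "author@example.com",        # from the PR author's GitHub account
    "user_id": 42,                               # web submissions only; parsed from the PR title
    "domain": "vision",                          # default for this repository
    "public": True,                              # default, pending brain-score/core#26
    "competition": None,                         # reserved for future competition tagging
    "model_type": "Brain_Model",                 # default for this repository
}
```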

@kvfairchild marked this pull request as ready for review on January 4, 2024, 01:52
@kvfairchild requested a review from mschrimpf on January 4, 2024, 02:47
Comment on lines +46 to +55

```yaml
script:
  - |
    if [ ! -z "$TRAVIS_PULL_REQUEST_BRANCH" ]; then
      CHANGED_FILES=$( git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && git fetch && echo $(git diff --name-only origin/$TRAVIS_PULL_REQUEST_BRANCH origin/$TRAVIS_BRANCH -C $TRAVIS_BUILD_DIR) | tr '\n' ' ' ) &&
      PLUGIN_ONLY=$( python -c "from brainscore_core.plugin_management.parse_plugin_changes import is_plugin_only; is_plugin_only(\"${CHANGED_FILES}\", \"brainscore_${DOMAIN}\")" )
    fi
  - |
    if [ "$PLUGIN_ONLY" = "True" ]; then
      bash ${TRAVIS_BUILD_DIR}/.github/workflows/travis_trigger.sh $GH_WORKFLOW_TRIGGER $TRAVIS_PULL_REQUEST_SHA;
    fi
```
Member comment: I thought all scripts now come from the import?
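
For context, `travis_trigger.sh` (invoked in the last step above) is how Travis hands control back to GitHub Actions. Assuming it fires a `repository_dispatch` event using the `$GH_WORKFLOW_TRIGGER` token and the PR SHA (an assumption; the script itself is not shown in this hunk), a minimal Python equivalent would be:

```python
# Hypothetical sketch of what travis_trigger.sh does: fire a repository_dispatch
# event so the automerge workflow re-runs once Travis finishes. The event type
# and payload keys are assumptions, not the script's actual contents.
import os
import requests

def trigger_workflow(token: str, sha: str) -> None:
    response = requests.post(
        "https://api.github.com/repos/brain-score/vision/dispatches",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        json={"event_type": "travis_complete", "client_payload": {"sha": sha}},
    )
    response.raise_for_status()

if __name__ == "__main__":
    trigger_workflow(os.environ["GH_WORKFLOW_TRIGGER"], os.environ["TRAVIS_PULL_REQUEST_SHA"])
```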

@mschrimpf merged commit e72292b into integrate_core on Jan 4, 2024
4 of 5 checks passed
@mschrimpf deleted the kvf/automated_submissions branch on January 4, 2024, 09:29
@mschrimpf added a commit that referenced this pull request on Jan 4, 2024:
Releases Brain-Score 2.0 which uses a plugin system to manage data, metrics, benchmarks, and models.

- rename package to `brainscore_vision`
- use `pyproject.toml` instead of `setup.py` (#383)
- refactor data packaging and benchmarks to plugin format (#353, #397)
- refactor metrics to plugin format; keep only overall result in `Score` object and `error` in attributes (#391)
- integrate model_tools as model_helpers (#381)
- add alexnet and pixel models (#408, #412) 
- automatically merge and score plugins after tests pass (#394, #414, #428) 
- validate tests and merge `master` (#393, #403, #424, #429)

-----------------------------

detailed commits:

* rename package brainscore -> brainscore_vision

* setup plugin registries (#349)

* add brain-score core dependency

* remove top-level `get_stimulus_set` import

* move temporary wontfix tests into todo package

* add registry import; remove lab prefix

* move `test_submission` to `todotests`

* add pytest configuration to ignore `todotests`

* move majajhong2015 ceiling test inside plugin

* move tests for assemblies, stimuli, and examples into todo

* fix assembly/stimulus_set loading

use brainio.get_{assembly,stimulus_set} instead of brainscore_vision

* use brainscore_vision entrypoint

* unify import

(nit)

Co-authored-by: kvfairchild <[email protected]>

Co-authored-by: kvfairchild <[email protected]>

* Sw/Restructuring vision (#353)

* Added benchmark and data folders (with corresponding init, benchmark/data_packaging, test) for sanghavi benchmarks, began to add others (geirhos2021, hermann2020, kar2019). Created test_helper to reduce code duplication within tests.

* Added data_packaging.py to sanghavi, sanghavijozwik, and sanghavimurty. Moved environment.yml (removed the name and prefix) and requirements.txt (need to see what brainio url should be changed to) as well.

* removed old sanghavi benchmarks

* removed lazy load from sanghavi benchmarks, added marques2020_cavanaugh benchmarks, added test_benchmark_registry to sanghavi benchmarks

* added marques2020_devalois1982a benchmark and data packaging

* added marques2020 benchmarks and data packaging, removed sanghavi and devalois1982 a and b from test___init__.py, removed finished (kind of) benchmark scripts, removed benchmarks from benchmark __init__.py

* Reformatted current data directories to combine inits, separate packagings, and combine tests

* completed kar2019 benchmark, updated all tests of past benchmarks

* reformatted rajalingham2018 and rajalingham2020 benchmarks, created their data folders and contents

* completed geirhos, cadena benchmarks and data packaging

* reformatted sanghavi and marques benchmarks

* created benchmarks for imagenet, imagenet_c, objectnet,

created data packaging for those as well as bashivankar2019 and kuzovkin2018,

updated test_helper with parameter types

* created benchmark helpers, updated tests of several benchmarks, moved several data packaging files to their respective directories

* Created benchmark and data packaging folders for each benchmark, moved corresponding benchmarks and data packaging to each

* Created benchmark and data packaging folders for each benchmark, moved corresponding benchmarks and data packaging to each

* updated imports of benchmark helpers, renamed benchmarks, updated imports of benchmark inits

* created david2004 data packaging, barbumay2019, majaj2015, deng2009, imagenetslim15000, seibert2019, rust2012,

* Added an s3 util file that allows for assemblies and stimulus sets to be loaded into the data registry, began reformatting inits with new functionality, began filling in parameters from lookup csv

* reformatted inits of deng, imagenet, kars, kuzovkin, marques, rajalinghams, rust, sanghavi, seibert

* updated inits with load_assembly_from_s3() and load_stimulus_set_from_s3() functions, filled these out with corresponding sha1's, updated tests, moved files from packaging and test directories into correct new directories, continued reformatting

* updated buckets of all assemblies and stimulus sets, removed for loop of geirhos to allow for string parsing, cleared out test directory, cleared out data packaging directory

* created data helper, added all version ids to all assembly/stimulus set inits, changed to stimulus_set_registry in inits,

* Created BIBTEXs for all missing in data packaging, went through and fixed import errors/ other errors

* Deleted packaging notebooks, deleted other notebooks

* * Remove benchmark pools from brainscore_vision/__init__.py
* take benchmark pools out of evaluation.py (although some left for Martin to decide)
* changed path name in helper.py

* remove .idea

* move data helpers out of data directory

* unify benchmark and metric definition; delete mask benchmarks

* include stimulus set in assembly

* move s3 into data_helpers/

* name lookup helpers as legacy

* updated all version IDs manually

* * removed public_benchmark_helper.py
* removed unnecessary step in geirhos benchmark __init__.py

* * added stimulus set loaders to all data registries

* * reverted buckets in data packaging notebooks/.pys back to old version

* make merging stimulus set meta optional

* import stimulus_set plugin

* type hint stimulus_set_registry

* specify `stimulus_set` registry prefix

following brain-score/core#25

---------

Co-authored-by: Martin Schrimpf <[email protected]>

* use pyproject.toml instead of setup.py (#383)

* use pyproject instead of setup
* explicitly set setuptools py-modules
* add networkx dependency again
* add scikit-learn dependency again
* update screen gray tests: lossless png and more flexible amount of gray

* integrate model helpers (formerly model_tools) (#381)

* initial commit

* add unit tests

* add README, LICENSE, .travis

* move activations-related functions to this repo

* use conda to install frameworks; remove python 3.7 due to pytorch incompatibility

* source activate instead of conda

* ignore tf-slim for testing

* remove framework inference; fix keras and tensorflow pipeline

* test grayscale and alpha images

* add from_stimulus_set

* use immutable tuples for normalization reference

* fix stimuli_identifier default for storing activations

* test for explicit activations

* do not store StimulusSet activations by default

* enable logits retrieval

* require PCA to be hooked manually; add method to insert all attributes

* store StimulusSets by default

* add multilayer_mapping

* use regression from brain-score

* run tests on cpu

* disable tf caching in order to cut down on memory usage

travis tests fail due to OOM

* attempt to obtain more memory by requiring sudo

* treat logits as layer; create model directly from test provider

* CenterCrop instead of Resize by default in pytorch

* add option to disable multithreading

* skip memory intense (>7.5 GB) tests in travis

* download imagenet before travis script run

* use MT_ environment variables for imagenet path

* infer model class identifier rather than module

* add brain commitment utility (LayerModel, ModelCommitment); remove regression

* rename multilayer_mapping -> brain_transformation

* allow multi-layer to region map; remove redundant data

* re-use LayerModel in ModelCommitment

* remove @staticmethod to allow sub-classing

* remove unused variables

* separate LayerScores from LayerSelection

* add pixel-degree translation

* store converted stimuli in consistent path; hook onto activations extractor

also rename register_batch_hook -> register_batch_activations_hook

* revert erroneously committed device assignment to cpu

* keep awscli at 1.11.18 due to PyYAML dependency error

* install libpython-dev to deal with awscli dependency error

* --yes install libpython

* property-forward identifier

* attach PCA for layer selection; lazy layer commitment

* add channel metadata

* check for is_hooked before hooking

* ignore six in awscli installation to avoid PyYAML error

* fix filepath

* fix merging of convolutional and fully-connected activations

* remove ceiler stratification

* add layer packaging status updates

* separate layer-mapping and pixel-degrees

* remove out-dated wrapper logits assignment

* also separate unit tests for neural and stimuli

* add behavioral mapping to ImageNet synsets

* fix expected layer

* add timeout multi-layer test; combine layer assemblies manually

resolve #4

* update to public assemblies

* add TemporalIgnore mapping

* add ProbabilitiesMapping using logistic classifier

from mschrimpf/brain-score@244f9c3

* tie LogitsBehavior to imagenet specifically since no fitting is done

* use packaged behavioral data

* use `approx` to avoid floating-point arithmetic mismatches

* separate pytest flags; add private-access flag; add AWS access key

* set AWS environment keys as global

* pass time_bins for `brain_model.start_recording`

* list installed package versions for diagnostics

if the code doesn't work for the user, s/he can check travis for which versions did work

* update neural benchmarks import

* use pytest.mark instead of pytest.config

https://docs.pytest.org/en/latest/deprecations.html#pytest-config-global

* when possible, ignore local part of stimuli paths to align across machines

* expand LayerMappedModel to multiple layers for single region (#10)

* resize to target image size instead of center-crop (#11)

* separate _build_extractor method to allow CORnet's temporal interjection (#12)

* add tests that the package can be properly imported (#20)

these tests can always run and do not require e.g. special memory

* Add manifest file (#21)

* Fix little bugs

* Add manifest file to also install imagenet_classes.txt with pip

* Revert old changes

* use new public benchmarks from Brain-Score instead of self-built ones (#19)

depends on #175

* allow custom benchmarks for mapping (#22)

* add travis slack notifications (#23)

* allow changing normalization params for torch preprocessing; allow multiple probabilities readout layers (#28)

* allow changing normalize_mean/std for torchvision preprocessing; add ProbabilitiesMapping docs

resolves https://github.com/brain-score/model-tools/pull/27/files

* allow passing list of behavioral readout layers

* fix kwargs name

* Unhook methods and test fix (#26)

* Fix hook problem

* Fix failing test

* Move submission check module to model-tools project

Co-authored-by: Martin Schrimpf <[email protected]>

* Update tensorflow to V2 (#24)

Change tensorflow packages

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees; pytorch resize instead of center-crop (#9)

* move stimuli-degree-resizing to brain-score; add BrainModel.visual_degrees

* resize to target image size instead of center-crop

center-cropping would e.g. take only 224x224 pixels from a 1800x1800 px image

* fix benchmark import

* update resize parameter passing to tuple

otherwise, it would be resized to width only and maintain aspect ratio

* update public_benchmarks import

* default to 8 visual degrees instead of 10

* update layer selection with visual_degrees

* update test to place stimuli on screen

* add test for default visual degrees commitment

* Update setup.py with missing dependencies (#29)

* Add missing dependencies

* Change the submission check modules (#30)

* Change structure for submission checks.

* Improve model checking

Co-authored-by: Martin Schrimpf <[email protected]>

* update for brain-score/brainio_collection#32 (#32)

* reduce stimuli paths to unique set to avoid duplicate compute overhead (#33)

* reduce stimuli paths to unique set to avoid duplicate compute overhead

* output assembly for ImageNet task instead of synset list

* Bugfixing (#35)

* Fix hook problem

* Fix failing test

* Remove old test class

* Tiny change for reloading.

* Revert unhook functionality

* Revert change

* Move submission check module to model-tools project

* Add missing dependencies

* Add missing dependencies

* Add missing dependencies

* Change dependencies

* Some test fixes

* Some test fixes

* Change structure for submission checks.

* Improve model checking

* add database tests

* change tensorflow version

* Update check model, it was wrong

* Revert something

* Revert

* Change tf version

* stimulus set identifier is now name

* revert

Co-authored-by: Martin Schrimpf <[email protected]>

* fix coordinates on logits behavior (#36)

* accept number_of_trials in look_at (and ignore) (#38)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at (#39)

* accept number_of_trials in look_at (and ignore)

* add default number_of_trials=1

* add number_of_trials to PreRunLayers.look_at

* Add fix for palettized images (#34)

* Add fix for palettized images

* Add tests for palettized image

* Fix typo (.__.)

* Add fix for keras version

Co-authored-by: Martin Schrimpf <[email protected]>

* do not include benchmark in storage identifier (#31)

the selection_identifier is the correct identifier, benchmark is already the benchmark implementation

* add check_submission/images; remove repeat_trials test (#43)

* add check_submission/images to MANIFEST

* remove repeat_trials

* Visual Transformer compatibility (#44)

* Adjusted to accommodate Transformer with 1D embedding

* added transformer model test

* added transformer to model_layers

* added transformer meta test

* changed transformer tests to contain dummy model

* added 1k output layer (see logits) to transformer dummy

* revert FC coord_names to having channel, channel_x, channel_y where the latter are filled with nan values

* corrected layer naming

Co-authored-by: Martin Schrimpf <[email protected]>

* added comment explaining KeyError when changing flatten_coord_names (#45)

* remove redundant pandas dependency

already covered through brainio_base dependencies

* upgrade to python 3.7 (#48)

discontinue python 3.6, together with brain-score/brainio_base#16

* pass region-layer mapping in ModelCommitment constructor (#51)

* pass region-layer mapping in ModelCommitment constructor

this simplifies the commitment of layers and prepares for the implementation of stochastic models

* fix unit tests and submission check

* fix unit tests

* re-compute activations.test___init__.test_exact_activations[alexnet-rgb-{None,1000}] and save to netcdf
* migrate brain_transformation.test_behavior[alexnet,resnet34,resnet18].pkl to netcdf

migrated using
```
import pickle
import xarray as xr
from pathlib import Path

from brainio_base.assemblies import walk_coords

for path in [f'brain_transformation/identifier={model},stimuli_identifier=objectome-240.pkl'
             for model in ('alexnet', 'resnet34', 'resnet18')]:
    path = Path(path)
    with open(path, 'rb') as f:
        d = pickle.load(f)
    # the pickled dict stores the assembly under 'activations' or 'data'
    a = d['activations'] if 'activations' in d else d['data']
    path_nc = path.parent / (path.stem + '.nc')
    a = xr.DataArray(a)
    # netcdf cannot store a MultiIndex, so reset any dim carrying multiple coords
    a = a.reset_index([dim for dim in a.dims if len(list(walk_coords(a[dim]))) > 1])
    a.to_netcdf(path_nc)
    print(f"saved {path_nc}")
```

* fix precomputed activations

migrate from pkl files directly instead of recomputing like before

* fix forwarding the `number_of_trials` parameter (#52)

* fix time_bin naming (level_0/1 -> start/end) (#53)

* fix time_bin naming (level_0/1 -> start/end)

* note xarray bug leading to merge mis-naming

* link result_caching directly to brain-score org

* provide BrainModel identifier from neural and behavioral components; do away with trials (#40)

* provide identifier as BrainModel

8e13735

* delete repeat_trials unit test

following #235

* Fixed model-tools dependency error (#54)

* Added AlexNet from examples into base_model.py to practice submission

* Fixed model-template dependency error (now points to Brain-Score repo instead of Martin's)

* Removed Dead code, rolled back tensorflow version to ==1.15

* Use BrainIO (#55)

* Update to use brainio_core package.

* Name change.

* Remove brainio-core.

* Force Jenkins re-run.

* Trigger re-run on Jenkins.

* remove check for ImageNet task (#60)

* remove check for ImageNet task

this will otherwise throw errors where the last layer has != 1000 neuroids (treated as logits). This has no effect on brain benchmarks and should imo thus not be a relevant check

* convert path to str, more instructive logs

* fix TestI2N.test_model sub-classing

* Wordnet decoder for Geirhos2021 benchmarks (#61)

counterpart to #323

* add wordnet_functions from https://github.com/bethgelab/model-vs-human/blob/745046c4d82ff884af618756bd6a5f47b6f36c45/modelvshuman/helper/wordnet_functions.py

* clean up code

but realizing that `is_hypernym` function is undefined

* implement logits to label for Geirhos et al. 2021

* use named axes for softmax

* add unit test for choice labels

* seed model

* seed custom model in init

* handle dimensions more flexibly

* use `.get_stimulus(stimulus_id)` (#62)

* use `.get_stimulus(stimulus_id)`

instead of `.get_image(image_id)`

* downgrade protobuf for keras version

attempting to fix keras import errors in http://braintree.mit.edu:8080/job/unittest_model_tools/132/

* rename image_id -> stimulus_id

* rename image_paths -> stimulus_paths

* add legacy support for LogitsBehavior class (#63)

* add legacy support for LogitsBehavior class

* fix `mock_stimulus_set` call

* make flatten coordinates more generic; add forward_kwargs option; image tensors (#64)

* fix tutorial link (#347)

* add documentation for testing on precomputed features (#355)

* add documentation for testing on precomputed features

* re-add erroneously deleted content

* fix retrieval of model predictions

* make files public using predefined ACL

* Add Imagenet index mappings for Zhu 2019 and Baker 2022 (#69)

* Updated baker 2022

* Accuracy metric works, Engineering benchmark added

* Finalized Baker2022 Benchmark

* Added Zhu/Baker imagenet indices that do not exist.

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Update model_tools/brain_transformation/behavior.py

Co-authored-by: Martin Schrimpf <[email protected]>

---------

Co-authored-by: Martin Schrimpf <[email protected]>

* include engineering benchmarks in standard tests again (#361)

* include engineering benchmarks in standard tests again

also fix typo

* fix pool keys access

* use new jenkins_id instead of id for submissions (#359)

* added support for jenkins_id write

* removed redundant comment

* added DB models specification changes

* Make sure directories created use jenkins_id and not id (#364)

* Hotfix: add jenkins_id to test_submission.py's tests (#366)

* Hotfix: add jenkins_id to test_submission.py's tests

* Fixed some tests

* Fixed more tests, another submission issue

* Islam2021 (#360)

* add Islam2021 packaging file

* add Dimensionality metric

* add Islam2021Dimensionality benchmark

* add islam2021 benchmarks to benchmark pool

* clean packaging file

* add Islam2021 benchmark tests

* add islam2021 stimuli test

* Fix typo in test_islam2021.py

* Add lookup.csv entry for neil.Islam2021

* Correct benchmark name in brainscore/benchmarks/__init__.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Change recorded bins to standard ones

Co-authored-by: Martin Schrimpf <[email protected]>

* Correct identifier of benchmark in islam2021.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix parent benchmark name islam2021.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix stimulus name in lookup.csv

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix islam2021 test names in tests/test_benchmarks/test___init__.py

Co-authored-by: Martin Schrimpf <[email protected]>

* fix islam2021 stimuli test name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix islam2021 stimuli name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix benchmark_pool keys in test_islam2021.py

* Fix stimulus name

* Add private_access to islam tests

* Fix Islam2021 stimuli name in test_stimuli

* add private_access to test_islam2021 in test_stimuli

---------

Co-authored-by: Martin Schrimpf <[email protected]>

* Removed faulty model tools import line, removed other unused lines as well (#367)

* Add domain to new benchmark creation code for vision. (#368)

* Hotfix: Moved Islam to experimental pool, added domain to models.py (#369)

* Moved Islam to experimental pool, added domain to models.py

* Removed Islam2021 from engineering test pool

* let travis run private and public tests separately (#371)

previously, when private tests were run, travis ran private _and_ public tests, and then in a separate run only the public tests. This changes it so that _only_ private and _only_ public tests are run separately.

* remove outdated lookup_source (#372)

per @jjpr's advice

* add instructions for adding users to AWS (#373)

* document Python = 3.7 (instead of >= 3.7) for TF compatibility (#375)

* more extensively describe tasks and intended outputs (#374)

following the detailed description in [language](https://github.com/brain-score/language/blob/main/brainscore_language/artificial_subject.py)

* reorganize contents into subdirectories

* reorganize model_helpers for integration

* add missing access parameter for MajajHong2015

---------

Co-authored-by: franzigeiger <[email protected]>
Co-authored-by: Sachi Sanghavi <[email protected]>
Co-authored-by: pmcgrath249 <[email protected]>
Co-authored-by: Michael Ferguson <[email protected]>
Co-authored-by: jjpr-mit <[email protected]>
Co-authored-by: Michael Ferguson <[email protected]>
Co-authored-by: Tiago Gaspar Oliveira <[email protected]>
Co-authored-by: SusanWYS <[email protected]>

* First model added (hopefully many more to come!)

* Revert "First model added (hopefully many more to come!)"

This reverts commit 0d74c72.

* Metrics plugin format (#391)

* move all metrics / metric_helpers into plugin directories

* define ceiling class in `brainscore_vision/metrics`

* register metrics to make them loadable

* use loader methods for metrics and ceilings

* scalar score instead of aggregation; move/add tests

* refactor from `aggregation=['center', 'error']` to a scalar Score

* tests continued

* continue cleaning up scalar score instead of aggregation

* monkey-patch readthedocs

following #388

* add metric dependencies

numpy, scipy, scikit-learn

* remove tensorflow and keras from test dependencies

* monkey-patch readthedocs part 2

following #390

* fix previous commit's typo

* delete out-dated anatomy; fix metric loading

* delete outdated references

* fix ceiling values

* split up `test_setup.sh` into benchmark-specific s3 download (#393)

* split up `test_setup.sh` into benchmark-specific s3 download

* delete tests redundant with benchmark plugin

* add torch test dependency

* add torchvision dependency

* delete redundant setup.py

* move test into plugin

* mark `TestLayerSelection` as memory_intense

following OSError in https://app.travis-ci.com/github/brain-score/brain-score/jobs/612644470

* run plugin tests

* target TRAVIS_BRANCH for git diff

* checkout & diff on one line

* DEBUG: diff with FETCH_HEAD

* DEBUG integrate_core not found

* git diff against HEAD

* git diff against HEAD

* add pytest_check to test dependencies

* typo

* echo changed files

* config before fetch

* add conda

* remove outdated fixtures like `brainio_home`

* delete rdm tests

* continued test fixing

private access etc

* simplify more tests

* remove rdm/single benchmarks from registry

* fix metric suffix

* do not use credentials for `data_helpers` s3 download

* fix identifiers

* fall back to files in `brainio.contrib` bucket

* fix region parameter

* fix bucket for precomputed features download

* fix registry use; re-arrange Cadena data tests

---------

Co-authored-by: Katherine Fairchild <[email protected]>

* Prevent triggering Travis plugin tests for empty `git diff` results (#399)

* validate and fix tests (#403)

* remove redundant newline

* add missing imagenet2012.csv

* fix download on demand; unify testing

* optimize imports

* fix package name

* add models/__init__.py

* remove redundant/dead code

* use approx for cka equal 1

* Add models for testing model conversion helpers (based off #395) (#398)

* add s3 helpers for model download

* add integration tests and generic model test

* add migration scripts for converting zip submissions to 2.0

* add models (alexnet and pixels)

* add alexnet and pixels

* Fixed benchmark id in test_integration

* removed unnecessary migration csv files

---------

Co-authored-by: Martin Schrimpf <[email protected]>
Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Khaled K Shehada <[email protected]>

* Revert "Add models for testing model conversion helpers (based off #395) (#398)"

This reverts commit 2f51df0.

* delete `lookup.csv` and entrypoint (#397)

* Add models for testing model conversion helpers (based off #395) (#408)

* add s3 helpers for model download

* add integration tests and generic model test

* add migration scripts for converting zip submissions to 2.0

* add models (alexnet and pixels)

* add alexnet and pixels

* Fixed benchmark id in test_integration

* removed unnecessary migration csv files

* Used a public benchmark id for integration testing

* Updated integration test expected scores for new benchmark id

* Marked integration tests memory-intense, otherwise not enough memory on Travis

* Updated test_models to test all existing models

---------

Co-authored-by: Martin Schrimpf <[email protected]>
Co-authored-by: Katherine Fairchild <[email protected]>

* integrate submission handling (#394)

* add vision endpoints

modeled after language

* add config and readme

same as language

* delete outdated components

* fix `conda_active` parameter

* write `comment` with layer commitment into `score.attrs`

* updated endpoints to be compatible with profiler functionality

* Added submission unit tests for endpoints and DB interaction

* Changed endpoint tests to use a public benchmark

---------

Co-authored-by: Khaled Shehada <[email protected]>
Co-authored-by: Khaled K Shehada <[email protected]>

* integrate remaining submission tests (#414)

* integrate submission tests

tests adapted:
* test_competition_field_set (from test_integration.test_competition_field)
* test_competition_field_not_set (from test_integration.test_competition_field_none)
* test_one_model_multiple_benchmarks (from test_integration.test_rerun_evaluation)
* add assertion for score comment (from test_integration.test_evaluation)

tests deleted because already present:
* test_integration.test_failure_evaluation
* all tests in test_submission.py

tests not incorporated:
* test_model_failure_evaluation -- this could be interesting to add in the future, when the scoring fails in the middle, leaving an in-progress database state

* format indent

* add `test_two_models_two_benchmarks` test

* fix querying

* use `None` instead of string `'None'` for competition field

following brain-score/core#63

* merge main into 2.0 integrate_core (#424)

* fix tutorial link (#347)

* add documentation for testing on precomputed features (#355)

* add documentation for testing on precomputed features

* re-add erroneously deleted content

* fix retrieval of model predictions

* make files public using predefined ACL

* include engineering benchmarks in standard tests again (#361)

* include engineering benchmarks in standard tests again

also fix typo

* fix pool keys access

* use new jenkins_id instead of id for submissions (#359)

* added support for jenkins_id write

* removed redundant comment

* added DB models specification changes

* Make sure directories created use jenkins_id and not id (#364)

* Hotfix: add jenkins_id to test_submission.py's tests (#366)

* Hotfix: add jenkins_id to test_submission.py's tests

* Fixed some tests

* Fixed more tests, another submission issue

* Islam2021 (#360)

* add Islam2021 packaging file

* add Dimensionality metric

* add Islam2021Dimensionality benchmark

* add islam2021 benchmarks to benchmark pool

* clean packaging file

* add Islam2021 benchmark tests

* add islam2021 stimuli test

* Fix typo in test_islam2021.py

* Add lookup.csv entry for neil.Islam2021

* Correct benchmark name in brainscore/benchmarks/__init__.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Change recorded bins to standard ones

Co-authored-by: Martin Schrimpf <[email protected]>

* Correct identifier of benchmark in islam2021.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix parent benchmark name islam2021.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix stimulus name in lookup.csv

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix islam2021 test names in tests/test_benchmarks/test___init__.py

Co-authored-by: Martin Schrimpf <[email protected]>

* fix islam2021 stimuli test name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix islam2021 stimuli name in tests/test_stimuli.py

Co-authored-by: Martin Schrimpf <[email protected]>

* Fix benchmark_pool keys in test_islam2021.py

* Fix stimulus name

* Add private_access to islam tests

* Fix Islam2021 stimuli name in test_stimuli

* add private_access to test_islam2021 in test_stimuli

---------

Co-authored-by: Martin Schrimpf <[email protected]>

* Removed faulty model tools import line, removed other unused lines as well (#367)

* Add domain to new benchmark creation code for vision. (#368)

* Hotfix: Moved Islam to experimental pool, added domain to models.py (#369)

* Moved Islam to experimental pool, added domain to models.py

* Removed Islam2021 from engineering test pool

* let travis run private and public tests separately (#371)

previously, when private tests were run, travis ran private _and_ public tests, and then in a separate run only the public tests. This changes it so that _only_ private and _only_ public tests are run separately.

* remove outdated lookup_source (#372)

per @jjpr's advice

* add instructions for adding users to AWS (#373)

* document Python = 3.7 (instead of >= 3.7) for TF compatibility (#375)

* more extensively describe tasks and intended outputs (#374)

following the detailed description in [language](https://github.com/brain-score/language/blob/main/brainscore_language/artificial_subject.py)

* Add version to `.readthedocs.yml` (required) (#388)

* Add `build.os` to `readthedocs.yml` (#390)

* add build.os

* ubuntu 18.04

* ubuntu 20.04

* add submission creation to test_competition_field()

---------

Co-authored-by: Katherine Fairchild <[email protected]>

* merge main with Islam2021 benchmark

* add contributor code of conduct and badge (#405)

* Add odd_one_out task documentation (#404)

Co-authored-by: Martin Schrimpf <[email protected]>

* Domain-Transfer Benchmarks (#416)

* preliminary analysis

* removed data assembly from exploration

* added dependencies for creating merged data assembly - no .nc files

* investigation on the validity of the merged assembly

* neural benchmark for domain-transfer

* finalized the merged assembly creation and added lines in lookup table

* added hook to cross regressed correlation to take care of background_id

* benchmarks and scoring files

* added results for analysis benchmark

* cleaned the analysis benchmark script

* added bibliography

* added the benchmarks to the pool

* added tests-related script comments

* corrected typos

* Delete score-model-analysis.py 

not needed in the PR

* Delete score-model.py

not needed in the PR

* corrected names in the basic checks

* added a simple unit test for the neural assembly

* minor changes to fix Travis run

* corrected accordingly to PR comments

* clean up

* add self test

---------

Co-authored-by: Ernesto Bocini <[email protected]>
Co-authored-by: Martin Schrimpf <[email protected]>

* remove space (#421)

* fix engineering/analysis benchmark for Igustibagus2024 (#423)

* autoformat

* autoimport

* code cosmetics

* fix ceiling

* add test for Igustibagus analysis

* aggregate score over domains

* add private_access flag

* finalize merging domain-transfer benchmarks

* mark as private

* update to use plugins

---------

Co-authored-by: Michael Ferguson <[email protected]>
Co-authored-by: Tiago Gaspar Oliveira <[email protected]>
Co-authored-by: SusanWYS <[email protected]>
Co-authored-by: kvfairchild <[email protected]>
Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Linus <[email protected]>
Co-authored-by: Ernesto Bocini <[email protected]>
Co-authored-by: Ernesto Bocini <[email protected]>

* First pass of new submission docs (#396)

* First pass of new submission docs

* First round of Martin's PR comments

* Moved location of deb_schema.uml

* changed path for uml photo

* Update docs/source/modules/submission.rst

Co-authored-by: Martin Schrimpf <[email protected]>

* 2.0 updates to developer documentation (#418)

* 2.0 updates

* updated AWS env count number from 3 -> 2

* removed scoring process block

---------

Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Mike Ferguson <[email protected]>

* fix links

* update model tutorial

quickstart only

* update benchmark tutorial

* fix links

---------

Co-authored-by: kvfairchild <[email protected]>
Co-authored-by: Martin Schrimpf <[email protected]>
Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Martin Schrimpf <[email protected]>

* simplify alexnet and pixel models (#412)

* simplify alexnet and pixel models

from #408

* test jenkins testing for new plugins

* retrigger checks

* Trigger CI after Core update

* reset core path

---------

Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Khaled K Shehada <[email protected]>

* update examples for 2.0 (#425)

* update data example

* update metrics example

* update benchmarks example

* combine data, metrics, and benchmarks notebooks

* add models example notebook

* add example for scoring

* fix README links (#426)

* register pytest markers in pyproject (#413)

* register pytest markers in pyproject

* remove duplicate markers definition

* americanize

Co-authored-by: kvfairchild <[email protected]>

* import script from core (#427)

Co-authored-by: Katherine Fairchild <[email protected]>

* remove lab identifiers from plugin identifiers (#402)

* remove lab identifiers from data plugins

* rename `BarbuMayo2019` -> `ObjectNet`

* remove lab prefix from standard region benchmarks

---------

Co-authored-by: kvfairchild <[email protected]>

* Setup GitHub Actions and Travis for automated submissions (#428)

* import script from core

* add action workflows

* trigger automerge for plugin-only web submissions

* model_type=Brain_Model

* python -> 3.7

* update repo to vision

* cleanup

* clarify imported scripts

---------

Co-authored-by: Katherine Fairchild <[email protected]>

---------

Co-authored-by: kvfairchild <[email protected]>
Co-authored-by: samwinebrake <[email protected]>
Co-authored-by: franzigeiger <[email protected]>
Co-authored-by: Sachi Sanghavi <[email protected]>
Co-authored-by: pmcgrath249 <[email protected]>
Co-authored-by: Michael Ferguson <[email protected]>
Co-authored-by: jjpr-mit <[email protected]>
Co-authored-by: Michael Ferguson <[email protected]>
Co-authored-by: Tiago Gaspar Oliveira <[email protected]>
Co-authored-by: SusanWYS <[email protected]>
Co-authored-by: Mike Ferguson <[email protected]>
Co-authored-by: Katherine Fairchild <[email protected]>
Co-authored-by: Khaled K Shehada <[email protected]>
Co-authored-by: Khaled Shehada <[email protected]>
Co-authored-by: Linus <[email protected]>
Co-authored-by: Ernesto Bocini <[email protected]>
Co-authored-by: Ernesto Bocini <[email protected]>