Add support for dataloader samplers #713

nkaenzig · 2024-11-22T10:31:33Z

Closes #712

Adds a samplers argument to eva.DataModule so that custom samplers can be used in the dataloaders.
- This enables, for instance, label-efficiency experiments, since the number of training samples can be reduced for the existing downstream tasks.
Adds a BalancedSampler, which supports class-balanced data loading for classification tasks.
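
To illustrate the idea behind class-balanced sampling, here is a minimal standalone sketch in plain Python. It is not the BalancedSampler implementation from this PR: the class name BalancedIndexSampler, the targets argument, and the seed parameter are hypothetical, and it assumes that num_samples means "samples drawn per class" and that drawing is done without replacement.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence


class BalancedIndexSampler:
    """Hypothetical sketch: yields dataset indices with equal counts per class."""

    def __init__(
        self, targets: Sequence[int], num_samples: int, seed: Optional[int] = 42
    ) -> None:
        self._num_samples = num_samples  # assumed to be the count per class
        self._rng = random.Random(seed)

        # Group dataset indices by their class label.
        self._class_indices: Dict[int, List[int]] = defaultdict(list)
        for index, target in enumerate(targets):
            self._class_indices[target].append(index)

        # Without replacement, every class needs at least num_samples items.
        for label, indices in self._class_indices.items():
            if len(indices) < num_samples:
                raise ValueError(
                    f"Class {label} has only {len(indices)} samples, "
                    f"but {num_samples} were requested."
                )

    def __iter__(self) -> Iterator[int]:
        selected: List[int] = []
        for indices in self._class_indices.values():
            # Draw num_samples distinct indices from this class.
            selected.extend(self._rng.sample(indices, self._num_samples))
        # Mix the classes so batches are not ordered by label.
        self._rng.shuffle(selected)
        return iter(selected)

    def __len__(self) -> int:
        return self._num_samples * len(self._class_indices)


# Toy labels: class 0 has four items, class 1 has three, class 2 has five.
targets = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
sampler = BalancedIndexSampler(targets, num_samples=2)
indices = list(sampler)
```

An object of this shape (iterable over indices, with a length) is what a dataloader's sampler argument expects. Note that this sketch redraws the subset on every iteration; a label-efficiency experiment may prefer to fix the subset once.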

How to use

Add the following to the init_args of the DataModule in your YAML config:

    samplers:
      train:
        class_path: eva.core.data.samplers.classification.BalancedSampler
        init_args:
          num_samples: 10

In online mode, specify the sampler under samplers.train; in offline mode, specify it under samplers.predict.

Make sure that shuffle: false is set in the dataloader config of the corresponding split, since a PyTorch DataLoader does not accept a custom sampler together with shuffle: true.

nkaenzig added 4 commits November 22, 2024 10:10

add base class for map-style datasets

42e771b

added sampler to dataloader init in datamodule

5d26854

added RandomSampler

bd4767f

added missing docstrings & formatting

c8545a7

nkaenzig linked an issue Nov 22, 2024 that may be closed by this pull request

Add support for label efficiency evals #712

Open

nkaenzig added 3 commits November 22, 2024 14:18

Added BalancedSampler for classification

65ce5ad

reverted changes to bach yaml

6dcce48

added unit tests

db55b5a

nkaenzig marked this pull request as ready for review November 22, 2024 14:34

nkaenzig requested review from ioangatop and roman807 November 22, 2024 14:45

fixed support for offline mode

cd90487

nkaenzig self-assigned this Nov 27, 2024

fixed bug in predict_dataloader

572905d