diff --git a/dev/search/search_index.json b/dev/search/search_index.json index 120fb2a1..d4a4b8b8 100644 --- a/dev/search/search_index.json +++ b/dev/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":""},{"location":"#_1","title":"Introduction","text":"
Oncology FM Evaluation Framework by kaiko.ai
With the first release, eva supports performance evaluation for vision Foundation Models (\"FMs\") and supervised machine learning models on WSI-patch-level image classification tasks. Support for radiology (CT scan) segmentation tasks will be added soon.
With eva we provide the open-source community with an easy-to-use framework that follows industry best practices to deliver a robust, reproducible and fair evaluation benchmark across FMs of different sizes and architectures.
Support for additional modalities and tasks will be added in future releases.
"},{"location":"#use-cases","title":"Use cases","text":""},{"location":"#1-evaluate-your-own-fms-on-public-benchmark-datasets","title":"1. Evaluate your own FMs on public benchmark datasets","text":"With a specified FM as input, you can run eva on several publicly available datasets & tasks. One evaluation run will download and preprocess the relevant data, compute embeddings, fit and evaluate a downstream head and report the mean and standard deviation of the relevant performance metrics.
Supported datasets & tasks include:
WSI patch-level pathology datasets
Radiology datasets
To evaluate FMs, eva provides support for different model formats, including models trained with PyTorch, models available on HuggingFace and ONNX models. For other formats, custom wrappers can be implemented.
"},{"location":"#2-evaluate-ml-models-on-your-own-dataset-task","title":"2. Evaluate ML models on your own dataset & task","text":"If you have your own labeled dataset, all that is needed is to implement a dataset class tailored to your source data. Start from one of our out-of-the box provided dataset classes, adapt it to your data and run eva to see how different FMs perform on your task.
"},{"location":"#evaluation-results","title":"Evaluation results","text":"We evaluated the following FMs on the 4 supported WSI-patch-level image classification tasks. On the table below we report Balanced Accuracy for binary & multiclass tasks and show the average performance & standard deviation over 5 runs.
FM-backbone pretraining BACH CRC MHIST PCam/val PCam/test DINO ViT-S16 N/A 0.410 (\u00b10.009) 0.617 (\u00b10.008) 0.501 (\u00b10.004) 0.753 (\u00b10.002) 0.728 (\u00b10.003) DINO ViT-S16 ImageNet 0.695 (\u00b10.004) 0.935 (\u00b10.003) 0.831 (\u00b10.002) 0.864 (\u00b10.007) 0.849 (\u00b10.007) DINO ViT-B8 ImageNet 0.710 (\u00b10.007) 0.939 (\u00b10.001) 0.814 (\u00b10.003) 0.870 (\u00b10.003) 0.856 (\u00b10.004) DINOv2 ViT-L14 ImageNet 0.707 (\u00b10.008) 0.916 (\u00b10.002) 0.832 (\u00b10.003) 0.873 (\u00b10.001) 0.888 (\u00b10.001) Lunit - ViT-S16 TCGA 0.801 (\u00b10.005) 0.934 (\u00b10.001) 0.768 (\u00b10.004) 0.889 (\u00b10.002) 0.895 (\u00b10.006) Owkin - iBOT ViT-B16 TCGA 0.725 (\u00b10.004) 0.935 (\u00b10.001) 0.777 (\u00b10.005) 0.912 (\u00b10.002) 0.915 (\u00b10.003) UNI - DINOv2 ViT-L16 Mass-100k 0.814 (\u00b10.008) 0.950 (\u00b10.001) 0.837 (\u00b10.001) 0.936 (\u00b10.001) 0.938 (\u00b10.001) kaiko.ai - DINO ViT-S16 TCGA 0.797 (\u00b10.003) 0.943 (\u00b10.001) 0.828 (\u00b10.003) 0.903 (\u00b10.001) 0.893 (\u00b10.005) kaiko.ai - DINO ViT-S8 TCGA 0.834 (\u00b10.012) 0.946 (\u00b10.002) 0.832 (\u00b10.006) 0.897 (\u00b10.001) 0.887 (\u00b10.002) kaiko.ai - DINO ViT-B16 TCGA 0.810 (\u00b10.008) 0.960 (\u00b10.001) 0.826 (\u00b10.003) 0.900 (\u00b10.002) 0.898 (\u00b10.003) kaiko.ai - DINO ViT-B8 TCGA 0.865 (\u00b10.019) 0.956 (\u00b10.001) 0.809 (\u00b10.021) 0.913 (\u00b10.001) 0.921 (\u00b10.002) kaiko.ai - DINOv2 ViT-L14 TCGA 0.870 (\u00b10.005) 0.930 (\u00b10.001) 0.809 (\u00b10.001) 0.908 (\u00b10.001) 0.898 (\u00b10.002)
The runs use the default setup described in the section below.
eva trains the decoder on the \"train\" split and uses the \"validation\" split for monitoring, early stopping and checkpoint selection. Evaluation results are reported on the \"validation\" split and, if available, on the \"test\" split.
For more details on the FM-backbones and instructions to replicate the results, check out Replicate evaluations.
"},{"location":"#evaluation-setup","title":"Evaluation setup","text":"Note that the current version of eva implements the task- & model-independent and fixed default set up following the standard evaluation protocol proposed by [1] and described in the table below. We selected this approach to prioritize reliable, robust and fair FM-evaluation while being in line with common literature. Additionally, with future versions we are planning to allow the use of cross-validation and hyper-parameter tuning to find the optimal setup to achieve best possible performance on the implemented downstream tasks.
With a provided FM, eva computes embeddings for all input images (WSI patches) which are then used to train a downstream head consisting of a single linear layer in a supervised setup for each of the benchmark datasets. We use early stopping with a patience of 5% of the maximal number of epochs.
Backbone frozen Hidden layers none Dropout 0.0 Activation function none Number of steps 12,500 Base Batch size 4,096 Batch size dataset specific* Base learning rate 0.01 Learning Rate [Base learning rate] * [Batch size] / [Base batch size] Max epochs [Number of samples] * [Number of steps] / [Batch size] Early stopping 5% * [Max epochs] Optimizer SGD Momentum 0.9 Weight Decay 0.0 Nesterov momentum true LR Schedule Cosine without warmup* For smaller datasets (e.g. BACH with 400 samples) we reduce the batch size to 256 and scale the learning rate accordingly.
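As a small illustrative sketch (eva applies these rules internally), the batch-size dependent values from the table can be reproduced with plain arithmetic:
BASE_LEARNING_RATE = 0.01
BASE_BATCH_SIZE = 4_096


def scaled_learning_rate(batch_size: int) -> float:
    """Linear scaling rule from the table: base LR * batch size / base batch size."""
    return BASE_LEARNING_RATE * batch_size / BASE_BATCH_SIZE


def early_stopping_patience(max_epochs: int) -> int:
    """Early stopping patience of 5% of the maximal number of epochs (at least 1)."""
    return max(1, round(0.05 * max_epochs))


print(scaled_learning_rate(4_096))  # 0.01 for the default batch size
print(scaled_learning_rate(256))    # 0.000625 for smaller datasets such as BACH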
eva is distributed under the terms of the Apache-2.0 license.
"},{"location":"#next-steps","title":"Next steps","text":"Check out the User Guide to get started with eva
"},{"location":"CODE_OF_CONDUCT/","title":"Contributor Covenant Code of Conduct","text":""},{"location":"CODE_OF_CONDUCT/#our-pledge","title":"Our Pledge","text":"In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
"},{"location":"CODE_OF_CONDUCT/#our-standards","title":"Our Standards","text":"Examples of behavior that contributes to creating a positive environment include:
Examples of unacceptable behavior by participants include:
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
"},{"location":"CODE_OF_CONDUCT/#scope","title":"Scope","text":"This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
"},{"location":"CODE_OF_CONDUCT/#enforcement","title":"Enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at eva@kaiko.ai. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
"},{"location":"CODE_OF_CONDUCT/#attribution","title":"Attribution","text":"This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
"},{"location":"CONTRIBUTING/","title":"Contributing to eva","text":"eva is open source and community contributions are welcome!
"},{"location":"CONTRIBUTING/#contribution-process","title":"Contribution Process","text":""},{"location":"CONTRIBUTING/#github-issues","title":"GitHub Issues","text":"The eva contribution process generally starts with filing a GitHub issue.
eva defines four categories of issues: feature requests, bug reports, documentation fixes, and installation issues. In general, we recommend waiting for feedback from an eva maintainer or community member before proceeding to implement a feature or patch.
"},{"location":"CONTRIBUTING/#pull-requests","title":"Pull Requests","text":"After you have agreed upon an implementation strategy for your feature or patch with an eva maintainer, the next step is to introduce your changes as a pull request against the eva repository.
Steps to make a pull request: create a feature branch from the main branch, implement and commit your changes, and open a pull request against the main branch of https://github.com/kaiko-ai/eva. Once your pull request has been merged, your changes will be automatically included in the next eva release!
"},{"location":"DEVELOPER_GUIDE/","title":"Developer Guide","text":""},{"location":"DEVELOPER_GUIDE/#setting-up-a-dev-environment","title":"Setting up a DEV environment","text":"We use PDM as a package and dependency manager. You can set up a local python environment for development as follows: 1. Install package and dependency manager PDM following the instructions here. 2. Install system dependencies - For MacOS: brew install Cmake
- For Linux (Debian): sudo apt-get install build-essential cmake
3. Run pdm install -G dev
to install the python dependencies. This will create a virtual environment in eva/.venv
.
Add a new dependency to the core
submodule: pdm add <package_name>
Add a new dependency to the vision
submodule: pdm add -G vision -G all <package_name>
For more information about managing dependencies please look here.
"},{"location":"DEVELOPER_GUIDE/#continuous-integration-ci","title":"Continuous Integration (CI)","text":"For testing automation, we use nox
.
Installation: - with brew: brew install nox
- with pip: pip install --user --upgrade nox
(this way, you might need to run nox commands with python -m nox
or specify an alias)
Commands: - nox
to run all the automation tests. - nox -s fmt
to run the code formatting tests. - nox -s lint
to run the code linting tests. - nox -s check
to run the type-annotation tests. - nox -s test
to run the unit tests. - nox -s test -- tests/eva/metrics/test_average_loss.py
to run specific tests
"},{"location":"STYLE_GUIDE/","title":"Style Guide","text":"This document contains our style guides used in eva
.
Our priority is consistency, so that developers can quickly ingest and understand the entire codebase without being distracted by style idiosyncrasies.
"},{"location":"STYLE_GUIDE/#general-coding-principles","title":"General coding principles","text":"Q: How to keep code readable and maintainable? - Don't Repeat Yourself (DRY) - Use the lowest possible visibility for a variable or method (i.e. make private if possible) -- see Information Hiding / Encapsulation
Q: How big should a function be? - Single Level of Abstraction Principle (SLAP) - High Cohesion and Low Coupling
TL;DR: functions should usually be quite small, and _do one thing_\n
"},{"location":"STYLE_GUIDE/#python-style-guide","title":"Python Style Guide","text":"In general we follow the following regulations: PEP8, the Google Python Style Guide and we expect type hints/annotations.
"},{"location":"STYLE_GUIDE/#docstrings","title":"Docstrings","text":"Our docstring style is derived from Google Python style.
def example_function(variable: int, optional: str | None = None) -> str:\n \"\"\"An example docstring that explains what this function does.\n\n Docs sections can be referenced via :ref:`custom text here <anchor-link>`.\n\n Classes can be referenced via :class:`eva.data.datamodules.DataModule`.\n\n Functions can be referenced via :func:`eva.data.datamodules.call.call_method_if_exists`.\n\n Example:\n\n >>> from torch import nn\n >>> import eva\n >>> eva.models.modules.HeadModule(\n >>> head=nn.Linear(10, 2),\n >>> criterion=nn.CrossEntropyLoss(),\n >>> )\n\n Args:\n variable: A required argument.\n optional: An optional argument.\n\n Returns:\n A description of the output string.\n \"\"\"\n pass\n
"},{"location":"STYLE_GUIDE/#module-docstrings","title":"Module docstrings","text":"PEP-8 and PEP-257 indicate docstrings should have very specific syntax:
\"\"\"One line docstring that shouldn't wrap onto next line.\"\"\"\n
\"\"\"First line of multiline docstring that shouldn't wrap.\n\nSubsequent line or paragraphs.\n\"\"\"\n
"},{"location":"STYLE_GUIDE/#constants-docstrings","title":"Constants docstrings","text":"Public constants should usually have docstrings. Optional on private constants. Docstrings on constants go underneath
SOME_CONSTANT = 3\n\"\"\"Either a single-line docstring or multiline as per above.\"\"\"\n
"},{"location":"STYLE_GUIDE/#function-docstrings","title":"Function docstrings","text":"All public functions should have docstrings following the pattern shown below.
Each section can be omitted if there are no inputs, outputs, or no notable exceptions raised, respectively.
def fake_datamodule(\n    n_samples: int, random: bool = True\n) -> eva.data.datamodules.DataModule:\n    \"\"\"Generates a fake DataModule.\n\n    It builds a :class:`eva.data.datamodules.DataModule` by generating\n    a fake dataset with generated data while fixing the seed. It can\n    be useful for debugging purposes.\n\n    Args:\n        n_samples: The number of samples of the generated datasets.\n        random: Whether to generate randomly.\n\n    Returns:\n        A :class:`eva.data.datamodules.DataModule` with generated random data.\n\n    Raises:\n        ValueError: If `n_samples` is `0`.\n    \"\"\"\n    pass\n
"},{"location":"STYLE_GUIDE/#class-docstrings","title":"Class docstrings","text":"All public classes should have class docstrings following the pattern shown below.
class DataModule(pl.LightningDataModule):\n    \"\"\"DataModule encapsulates all the steps needed to process data.\n\n    It will initialize and create the mapping between dataloaders and\n    datasets. During the `prepare_data`, `setup` and `teardown`, the\n    datamodule will call the respective methods from all the datasets,\n    given that they are defined.\n    \"\"\"\n\n    def __init__(\n        self,\n        datasets: schemas.DatasetsSchema | None = None,\n        dataloaders: schemas.DataloadersSchema | None = None,\n    ) -> None:\n        \"\"\"Initializes the datamodule.\n\n        Args:\n            datasets: The desired datasets. Defaults to `None`.\n            dataloaders: The desired dataloaders. Defaults to `None`.\n        \"\"\"\n        pass\n
"},{"location":"datasets/","title":"Datasets","text":"eva provides native support for several public datasets. When possible, the corresponding dataset classes facilitate automatic download to disk, if not possible, this documentation provides download instructions.
"},{"location":"datasets/#vision-datasets-overview","title":"Vision Datasets Overview","text":""},{"location":"datasets/#whole-slide-wsi-and-microscopy-image-datasets","title":"Whole Slide (WSI) and microscopy image datasets","text":"Dataset #Patches Patch Size Magnification (\u03bcm/px) Task Cancer Type BACH 400 2048x1536 20x (0.5) Classification (4 classes) Breast CRC 107,180 224x224 20x (0.5) Classification (9 classes) Colorectal PatchCamelyon 327,680 96x96 10x (1.0) * Classification (2 classes) Breast MHIST 3,152 224x224 5x (2.0) * Classification (2 classes) Colorectal Polyp* Downsampled from 40x (0.25 \u03bcm/px) to increase the field of view.
"},{"location":"datasets/#radiology-datasets","title":"Radiology datasets","text":"Dataset #Images Image Size Task Download provided TotalSegmentator 1228 ~300 x ~300 x ~350 * Multilabel Classification (117 classes) Yes* 3D images of varying sizes
"},{"location":"datasets/bach/","title":"BACH","text":"The BACH dataset consists of microscopy and WSI images, of which we use only the microscopy images. These are 408 labeled images from 4 classes (\"Normal\", \"Benign\", \"Invasive\", \"InSitu\"). This dataset was used for the \"BACH Grand Challenge on Breast Cancer Histology images\".
"},{"location":"datasets/bach/#raw-data","title":"Raw data","text":""},{"location":"datasets/bach/#key-stats","title":"Key stats","text":"Modality Vision (microscopy images) Task Multiclass classification (4 classes) Cancer type Breast Data size total: 10.4GB / data in use: 7.37 GB (18.9 MB per image) Image dimension 1536 x 2048 x 3 Magnification (\u03bcm/px) 20x (0.42) Files format.tif
images Number of images 408 (102 from each class) Splits in use one labeled split"},{"location":"datasets/bach/#organization","title":"Organization","text":"The data ICIAR2018_BACH_Challenge.zip
from zenodo is organized as follows:
ICAR2018_BACH_Challenge\n\u251c\u2500\u2500 Photos # All labeled patches used by eva\n\u2502 \u251c\u2500\u2500 Normal\n\u2502 \u2502 \u251c\u2500\u2500 n032.tif\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 Benign\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 Invasive\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 InSitu\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 WSI # WSIs, not in use\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/bach/#download-and-preprocessing","title":"Download and preprocessing","text":"The BACH
dataset class supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided BACH
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
The splits are created from the indices specified in the BACH dataset class. These indices were picked to prevent data leakage due to images belonging to the same patient. Because the small dataset size in combination with the patient ID constraint does not allow splitting the data three ways with a sufficient amount of data in each split, we only create a train and a val split and leave it to the user to submit predictions on the official test split to the BACH Challenge Leaderboard.
Splits Train Validation #Samples 268 (67%) 132 (33%)"},{"location":"datasets/bach/#relevant-links","title":"Relevant links","text":"Attribution-NonCommercial-ShareAlike 4.0 International
"},{"location":"datasets/crc/","title":"CRC","text":"The CRC-HE dataset consists of labeled patches (9 classes) from colorectal cancer (CRC) and normal tissue. We use the NCT-CRC-HE-100K
dataset for training and the CRC-VAL-HE-7K for validation
.
The NCT-CRC-HE-100K-NONORM
consists of 100,000 images without color normalization applied. The CRC-VAL-HE-7K
consists of 7,180 image patches from 50 patients without overlap with NCT-CRC-HE-100K-NONORM
.
The tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR) and colorectal adenocarcinoma epithelium (TUM)
"},{"location":"datasets/crc/#raw-data","title":"Raw data","text":""},{"location":"datasets/crc/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Multiclass classification (9 classes) Cancer type Colorectal Data size total: 11.7GB (train), 800MB (val) Image dimension 224 x 224 x 3 Magnification (\u03bcm/px) 20x (0.5) Files format.tif
images Number of images 107,180 (100k train, 7.2k val) Splits in use NCT-CRC-HE-100K (train), CRC-VAL-HE-7K (val)"},{"location":"datasets/crc/#splits","title":"Splits","text":"We use the splits according to the data sources:
NCT-CRC-HE-100K
CRC-VAL-HE-7K
A test split is not provided. Because the patient information for the training data is not available, dividing the training data in a train/val split (and using the given val split as test split) is not possible without risking data leakage. eva therefore reports evaluation results for CRC HE on the validation split.
"},{"location":"datasets/crc/#organization","title":"Organization","text":"The data NCT-CRC-HE-100K.zip
, NCT-CRC-HE-100K-NONORM.zip
and CRC-VAL-HE-7K.zip
from zenodo are organized as follows:
NCT-CRC-HE-100K # All images used for training\n\u251c\u2500\u2500 ADI # All labeled patches belonging to the 1st class\n\u2502 \u251c\u2500\u2500 ADI-AAAFLCLY.tif\n\u2502 \u251c\u2500\u2500 ...\n\u251c\u2500\u2500 BACK # All labeled patches belonging to the 2nd class\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n\nNCT-CRC-HE-100K-NONORM # All images used for training\n\u251c\u2500\u2500 ADI # All labeled patches belonging to the 1st class\n\u2502 \u251c\u2500\u2500 ADI-AAAFLCLY.tif\n\u2502 \u251c\u2500\u2500 ...\n\u251c\u2500\u2500 BACK # All labeled patches belonging to the 2nd class\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n\nCRC-VAL-HE-7K # All images used for validation\n\u251c\u2500\u2500 ... # identical structure as for NCT-CRC-HE-100K-NONORM\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/crc/#download-and-preprocessing","title":"Download and preprocessing","text":"The CRC
dataset class supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided CRC
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
CC BY 4.0 LEGAL CODE
"},{"location":"datasets/mhist/","title":"MHIST","text":"MHIST is a binary classification task which comprises of 3,152 hematoxylin and eosin (H&E)-stained Formalin Fixed Paraffin-Embedded (FFPE) fixed-size images (224 by 224 pixels) of colorectal polyps from the Department of Pathology and Laboratory Medicine at Dartmouth-Hitchcock Medical Center (DHMC).
The tissue classes are: Hyperplastic Polyp (HP), Sessile Serrated Adenoma (SSA). This classification task focuses on the clinically-important binary distinction between HPs and SSAs, a challenging problem with considerable inter-pathologist variability. HPs are typically benign, while sessile serrated adenomas are precancerous lesions that can turn into cancer if left untreated and require sooner follow-up examinations. Histologically, HPs have a superficial serrated architecture and elongated crypts, whereas SSAs are characterized by broad-based crypts, often with complex structure and heavy serration.
"},{"location":"datasets/mhist/#raw-data","title":"Raw data","text":""},{"location":"datasets/mhist/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Binary classification (2 classes) Cancer type Colorectal Polyp Data size 354 MB Image dimension 224 x 224 x 3 Magnification (\u03bcm/px) 5x (2.0) * Files format.png
images Number of images 3,152 (2,175 train, 977 test) Splits in use annotations.csv (train / test) * Downsampled from 40x to increase the field of view.
"},{"location":"datasets/mhist/#organization","title":"Organization","text":"The contents from images.zip
and the file annotations.csv
from bmirds are organized as follows:
mhist # Root folder\n\u251c\u2500\u2500 images # All the dataset images\n\u2502 \u251c\u2500\u2500 MHIST_aaa.png\n\u2502 \u251c\u2500\u2500 MHIST_aab.png\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 annotations.csv # The dataset annotations file\n
"},{"location":"datasets/mhist/#download-and-preprocessing","title":"Download and preprocessing","text":"To download the dataset, please visit the access portal on BMIRDS and follow the instructions. You will then receive an email with all the relative links that you can use to download the data (images.zip
, annotations.csv
, Dataset Research Use Agreement.pdf
and MD5SUMs.txt
).
Please create a root folder, e.g. mhist
, and download all the files there, which unzipping the contents of images.zip
to a directory named images
inside your root folder (i.e. mhist/images
). Afterwards, you can (optionally) delete the images.zip
file.
We work with the splits provided by the data source. Since no \"validation\" split is provided, we use the \"test\" split as validation split.
annotations.csv
:: \"Partition\" == \"train\"annotations.csv
:: \"Partition\" == \"test\"The PatchCamelyon benchmark is an image classification dataset with 327,680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue.
"},{"location":"datasets/patch_camelyon/#raw-data","title":"Raw data","text":""},{"location":"datasets/patch_camelyon/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Binary classification Cancer type Breast Data size 8 GB Image dimension 96 x 96 x 3 Magnification (\u03bcm/px) 10x (1.0) * Files formath5
Number of images 327,680 (50% of each class) * The slides were acquired and digitized at 2 different medical centers using a 40x objective but under-sampled to 10x to increase the field of view.
"},{"location":"datasets/patch_camelyon/#splits","title":"Splits","text":"The data source provides train/validation/test splits
Splits Train Validation Test #Samples 262,144 (80%) 32,768 (10%) 32,768 (10%)"},{"location":"datasets/patch_camelyon/#organization","title":"Organization","text":"The PatchCamelyon data from zenodo is organized as follows:
\u251c\u2500\u2500 camelyonpatch_level_2_split_train_x.h5.gz # train images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_train_y.h5.gz # train labels\n\u251c\u2500\u2500 camelyonpatch_level_2_split_valid_x.h5.gz # val images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_valid_y.h5.gz # val labels\n\u251c\u2500\u2500 camelyonpatch_level_2_split_test_x.h5.gz # test images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_test_y.h5.gz # test labels\n
"},{"location":"datasets/patch_camelyon/#download-and-preprocessing","title":"Download and preprocessing","text":"The dataset class PatchCamelyon
supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided PatchCamelyon
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
Labels are provided by source files, splits are given by file names.
"},{"location":"datasets/patch_camelyon/#relevant-links","title":"Relevant links","text":"@misc{b_s_veeling_j_linmans_j_winkens_t_cohen_2018_2546921,\n author = {B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling},\n title = {Rotation Equivariant CNNs for Digital Pathology},\n month = sep,\n year = 2018,\n doi = {10.1007/978-3-030-00934-2_24},\n url = {https://doi.org/10.1007/978-3-030-00934-2_24}\n}\n
"},{"location":"datasets/patch_camelyon/#license","title":"License","text":"Creative Commons Zero v1.0 Universal
"},{"location":"datasets/total_segmentator/","title":"TotalSegmentator","text":"The TotalSegmentator dataset is a radiology image-segmentation dataset with 1228 3D images and corresponding masks with 117 different anatomical structures. It can be used for segmentation and multilabel classification tasks.
"},{"location":"datasets/total_segmentator/#raw-data","title":"Raw data","text":""},{"location":"datasets/total_segmentator/#key-stats","title":"Key stats","text":"Modality Vision (radiology, CT scans) Task Segmentation / multilabel classification (117 classes) Data size total: 23.6GB Image dimension ~300 x ~300 x ~350 (number of slices) x 1 (grey scale) * Files format.nii
(\"NIFTI\") images Number of images 1228 Splits in use one labeled split /* image resolution and number of slices per image vary
"},{"location":"datasets/total_segmentator/#organization","title":"Organization","text":"The data Totalsegmentator_dataset_v201.zip
from zenodo is organized as follows:
Totalsegmentator_dataset_v201\n\u251c\u2500\u2500 s0011 # one image\n\u2502 \u251c\u2500\u2500 ct.nii.gz # CT scan\n\u2502 \u251c\u2500\u2500 segmentations # directory with segmentation masks\n\u2502 \u2502 \u251c\u2500\u2500 adrenal_gland_left.nii.gz # segmentation mask 1st anatomical structure\n\u2502 \u2502 \u251c\u2500\u2500 adrenal_gland_right.nii.gz # segmentation mask 2nd anatomical structure\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/total_segmentator/#download-and-preprocessing","title":"Download and preprocessing","text":"TotalSegmentator
supports downloading the data at runtime via the init argument download: bool = True
. TotalSegmentator
class creates a manifest file with one row per slice and the columns: path
, slice
, split
and additional 117 columns for each class.Creative Commons Attribution 4.0 International
"},{"location":"reference/","title":"Reference API","text":"Here is the Reference API, describing the classes, functions, parameters and attributes of the eva package.
To learn how to use eva, however, it's best to get started with the User Guide
"},{"location":"reference/core/callbacks/","title":"Callbacks","text":""},{"location":"reference/core/callbacks/#writers","title":"Writers","text":""},{"location":"reference/core/callbacks/#eva.core.callbacks.writers.EmbeddingsWriter","title":"eva.core.callbacks.writers.EmbeddingsWriter
","text":" Bases: BasePredictionWriter
Callback for writing generated embeddings to disk.
This callback writes the embedding files in a separate process to avoid blocking the main process where the model forward pass is executed.
Parameters:
Name Type Description Defaultoutput_dir
str
The directory where the embeddings will be saved.
requiredbackbone
Module | None
A model to be used as feature extractor. If None
, it will be expected that the input batch returns the features directly.
None
dataloader_idx_map
Dict[int, str] | None
A dictionary mapping dataloader indices to their respective names (e.g. train, val, test).
None
group_key
str | None
The metadata key to group the embeddings by. If specified, the embedding files will be saved in subdirectories named after the group_key. If specified, the key must be present in the metadata of the input batch.
None
overwrite
bool
Whether to overwrite the output directory. Defaults to True.
True
Source code in src/eva/core/callbacks/writers/embeddings.py
def __init__(\n self,\n output_dir: str,\n backbone: nn.Module | None = None,\n dataloader_idx_map: Dict[int, str] | None = None,\n group_key: str | None = None,\n overwrite: bool = True,\n) -> None:\n \"\"\"Initializes a new EmbeddingsWriter instance.\n\n This callback writes the embedding files in a separate process to avoid blocking the\n main process where the model forward pass is executed.\n\n Args:\n output_dir: The directory where the embeddings will be saved.\n backbone: A model to be used as feature extractor. If `None`,\n it will be expected that the input batch returns the features directly.\n dataloader_idx_map: A dictionary mapping dataloader indices to their respective\n names (e.g. train, val, test).\n group_key: The metadata key to group the embeddings by. If specified, the\n embedding files will be saved in subdirectories named after the group_key.\n If specified, the key must be present in the metadata of the input batch.\n overwrite: Whether to overwrite the output directory. Defaults to True.\n \"\"\"\n super().__init__(write_interval=\"batch\")\n\n self._output_dir = output_dir\n self._backbone = backbone\n self._dataloader_idx_map = dataloader_idx_map or {}\n self._group_key = group_key\n self._overwrite = overwrite\n\n self._write_queue: multiprocessing.Queue\n self._write_process: eva_multiprocessing.Process\n
"},{"location":"reference/core/interface/","title":"Interface API","text":"Reference information for the Interface
API.
eva.Interface
","text":"A high-level interface for training and validating a machine learning model.
This class provides a convenient interface to connect a model, data, and trainer to train and validate a model.
"},{"location":"reference/core/interface/#eva.Interface.fit","title":"fit
","text":"Perform model training and evaluation out-of-place.
This method uses the specified trainer to fit the model using the provided data.
Example use cases:
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def fit(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Perform model training and evaluation out-of-place.\n\n This method uses the specified trainer to fit the model using the provided data.\n\n Example use cases:\n\n - Using a model consisting of a frozen backbone and a head, the backbone will generate\n the embeddings on the fly which are then used as input features to train the head on\n the downstream task specified by the given dataset.\n - Fitting only the head network using a dataset that loads pre-computed embeddings.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n trainer.run_evaluation_session(model=model, datamodule=data)\n
"},{"location":"reference/core/interface/#eva.Interface.predict","title":"predict
","text":"Perform model prediction out-of-place.
This method performs inference with a pre-trained foundation model to compute embeddings.
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def predict(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Perform model prediction out-of-place.\n\n This method performs inference with a pre-trained foundation model to compute embeddings.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n eva_trainer.infer_model(\n base_trainer=trainer,\n base_model=model,\n datamodule=data,\n return_predictions=False,\n )\n
"},{"location":"reference/core/interface/#eva.Interface.predict_fit","title":"predict_fit
","text":"Combines the predict and fit commands in one method.
This method performs the following two steps: 1. predict: perform inference with a pre-trained foundation model to compute embeddings. 2. fit: training the head network using the embeddings generated in step 1.
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def predict_fit(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Combines the predict and fit commands in one method.\n\n This method performs the following two steps:\n 1. predict: perform inference with a pre-trained foundation model to compute embeddings.\n 2. fit: training the head network using the embeddings generated in step 1.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n self.predict(trainer=trainer, model=model, data=data)\n self.fit(trainer=trainer, model=model, data=data)\n
"},{"location":"reference/core/data/dataloaders/","title":"Dataloaders","text":"Reference information for the Dataloader
classes.
eva.data.DataLoader
dataclass
","text":"The DataLoader
combines a dataset and a sampler.
It provides an iterable over the given dataset.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.batch_size","title":"batch_size: int | None = 1
class-attribute
instance-attribute
","text":"How many samples per batch to load.
Set to None
for iterable dataset where dataset produces batches.
shuffle: bool = False
class-attribute
instance-attribute
","text":"Whether to shuffle the data at every epoch.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.sampler","title":"sampler: samplers.Sampler | None = None
class-attribute
instance-attribute
","text":"Defines the strategy to draw samples from the dataset.
Can be any Iterable with __len__
implemented. If specified, shuffle must not be specified.
batch_sampler: samplers.Sampler | None = None
class-attribute
instance-attribute
","text":"Like sampler
, but returns a batch of indices at a time.
Mutually exclusive with batch_size
, shuffle
, sampler
and drop_last
.
num_workers: int = multiprocessing.cpu_count()
class-attribute
instance-attribute
","text":"How many workers to use for loading the data.
By default, it will use the number of CPUs available.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.collate_fn","title":"collate_fn: Callable | None = None
class-attribute
instance-attribute
","text":"The batching process.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.pin_memory","title":"pin_memory: bool = True
class-attribute
instance-attribute
","text":"Will copy Tensors into CUDA pinned memory before returning them.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.drop_last","title":"drop_last: bool = False
class-attribute
instance-attribute
","text":"Drops the last incomplete batch.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.persistent_workers","title":"persistent_workers: bool = True
class-attribute
instance-attribute
","text":"Will keep the worker processes after a dataset has been consumed once.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.prefetch_factor","title":"prefetch_factor: int | None = 2
class-attribute
instance-attribute
","text":"Number of batches loaded in advance by each worker.
"},{"location":"reference/core/data/datamodules/","title":"Datamodules","text":"Reference information for the Datamodule
classes and functions.
eva.data.DataModule
","text":" Bases: LightningDataModule
DataModule encapsulates all the steps needed to process data.
It will initialize and create the mapping between dataloaders and datasets. During the prepare_data
, setup
and teardown
, the datamodule will call the respective methods from all datasets, given that they are defined.
Parameters:
Name Type Description Defaultdatasets
DatasetsSchema | None
The desired datasets.
None
dataloaders
DataloadersSchema | None
The desired dataloaders.
None
Source code in src/eva/core/data/datamodules/datamodule.py
def __init__(\n self,\n datasets: schemas.DatasetsSchema | None = None,\n dataloaders: schemas.DataloadersSchema | None = None,\n) -> None:\n \"\"\"Initializes the datamodule.\n\n Args:\n datasets: The desired datasets.\n dataloaders: The desired dataloaders.\n \"\"\"\n super().__init__()\n\n self.datasets = datasets or self.default_datasets\n self.dataloaders = dataloaders or self.default_dataloaders\n
"},{"location":"reference/core/data/datamodules/#eva.data.DataModule.default_datasets","title":"default_datasets: schemas.DatasetsSchema
property
","text":"Returns the default datasets.
"},{"location":"reference/core/data/datamodules/#eva.data.DataModule.default_dataloaders","title":"default_dataloaders: schemas.DataloadersSchema
property
","text":"Returns the default dataloader schema.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.call.call_method_if_exists","title":"eva.data.datamodules.call.call_method_if_exists
","text":"Calls a desired method
from the datasets if exists.
Parameters:
Name Type Description Defaultobjects
Iterable[Any]
An iterable of objects.
requiredmethod
str
The dataset method name to call if exists.
required Source code insrc/eva/core/data/datamodules/call.py
def call_method_if_exists(objects: Iterable[Any], /, method: str) -> None:\n \"\"\"Calls a desired `method` from the datasets if exists.\n\n Args:\n objects: An iterable of objects.\n method: The dataset method name to call if exists.\n \"\"\"\n for _object in _recursive_iter(objects):\n if hasattr(_object, method):\n fn = getattr(_object, method)\n fn()\n
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema","title":"eva.data.datamodules.schemas.DatasetsSchema
dataclass
","text":"Datasets schema used in DataModule.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.train","title":"train: TRAIN_DATASET = None
class-attribute
instance-attribute
","text":"Train dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.val","title":"val: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Validation dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.test","title":"test: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Test dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.predict","title":"predict: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Predict dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.tolist","title":"tolist
","text":"Returns the dataclass as a list and optionally filters it given the stage.
Source code insrc/eva/core/data/datamodules/schemas.py
def tolist(self, stage: str | None = None) -> List[EVAL_DATASET]:\n \"\"\"Returns the dataclass as a list and optionally filters it given the stage.\"\"\"\n match stage:\n case \"fit\":\n return [self.train, self.val]\n case \"validate\":\n return [self.val]\n case \"test\":\n return [self.test]\n case \"predict\":\n return [self.predict]\n case None:\n return [self.train, self.val, self.test, self.predict]\n case _:\n raise ValueError(f\"Invalid stage `{stage}`.\")\n
"},{"location":"reference/core/data/datasets/","title":"Datasets","text":"Reference information for the Dataset
base class.
eva.core.data.Dataset
","text":" Bases: TorchDataset
Base dataset class.
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.prepare_data","title":"prepare_data
","text":"Encapsulates all disk related tasks.
This method is preferred for downloading and preparing the data, for example generate manifest files. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
, which ensures that is called only within a single process, making it multi-processes safe.
src/eva/core/data/datasets/base.py
def prepare_data(self) -> None:\n \"\"\"Encapsulates all disk related tasks.\n\n This method is preferred for downloading and preparing the data, for\n example generate manifest files. If implemented, it will be called via\n :class:`eva.core.data.datamodules.DataModule`, which ensures that is called\n only within a single process, making it multi-processes safe.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.setup","title":"setup
","text":"Setups the dataset.
This method is preferred for creating datasets or performing train/val/test splits. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
at the beginning of fit (train + validate), validate, test, or predict and it will be called from every process (i.e. GPU) across all the nodes in DDP.
src/eva/core/data/datasets/base.py
def setup(self) -> None:\n \"\"\"Setups the dataset.\n\n This method is preferred for creating datasets or performing\n train/val/test splits. If implemented, it will be called via\n :class:`eva.core.data.datamodules.DataModule` at the beginning of fit\n (train + validate), validate, test, or predict and it will be called\n from every process (i.e. GPU) across all the nodes in DDP.\n \"\"\"\n self.configure()\n self.validate()\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.configure","title":"configure
","text":"Configures the dataset.
This method is preferred to configure the dataset; assign values to attributes, perform splits etc. This would be called from the method ::method::setup
, before calling the ::method::validate
.
src/eva/core/data/datasets/base.py
def configure(self):\n \"\"\"Configures the dataset.\n\n This method is preferred to configure the dataset; assign values\n to attributes, perform splits etc. This would be called from the\n method ::method::`setup`, before calling the ::method::`validate`.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.validate","title":"validate
","text":"Validates the dataset.
This method aims to check the integrity of the dataset and verify that is configured properly. This would be called from the method ::method::setup
, after calling the ::method::configure
.
src/eva/core/data/datasets/base.py
def validate(self):\n \"\"\"Validates the dataset.\n\n This method aims to check the integrity of the dataset and verify\n that is configured properly. This would be called from the method\n ::method::`setup`, after calling the ::method::`configure`.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.teardown","title":"teardown
","text":"Cleans up the data artifacts.
Used to clean-up when the run is finished. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
at the end of fit (train + validate), validate, test, or predict and it will be called from every process (i.e. GPU) across all the nodes in DDP.
src/eva/core/data/datasets/base.py
def teardown(self) -> None:\n \"\"\"Cleans up the data artifacts.\n\n Used to clean-up when the run is finished. If implemented, it will\n be called via :class:`eva.core.data.datamodules.DataModule` at the end\n of fit (train + validate), validate, test, or predict and it will be\n called from every process (i.e. GPU) across all the nodes in DDP.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#embeddings-datasets","title":"Embeddings datasets","text":""},{"location":"reference/core/data/datasets/#eva.core.data.datasets.EmbeddingsClassificationDataset","title":"eva.core.data.datasets.EmbeddingsClassificationDataset
","text":" Bases: EmbeddingsDataset
Embeddings dataset class for classification tasks.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
Name Type Description Defaultroot
str
Root directory of the dataset.
requiredmanifest_file
str
The path to the manifest file, which is relative to the root
argument.
split
Literal['train', 'val', 'test'] | None
The dataset split to use. The split
column of the manifest file will be splitted based on this value.
None
column_mapping
Dict[str, str]
Defines the map between the variables and the manifest columns. It will overwrite the default_column_mapping
with the provided values, so that column_mapping
can contain only the values which are altered or missing.
default_column_mapping
embeddings_transforms
Callable | None
A function/transform that transforms the embedding.
None
target_transforms
Callable | None
A function/transform that transforms the target.
None
Source code in src/eva/core/data/datasets/embeddings/classification/embeddings.py
def __init__(\n self,\n root: str,\n manifest_file: str,\n split: Literal[\"train\", \"val\", \"test\"] | None = None,\n column_mapping: Dict[str, str] = base.default_column_mapping,\n embeddings_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize dataset.\n\n Expects a manifest file listing the paths of .pt files that contain\n tensor embeddings of shape [embedding_dim] or [1, embedding_dim].\n\n Args:\n root: Root directory of the dataset.\n manifest_file: The path to the manifest file, which is relative to\n the `root` argument.\n split: The dataset split to use. The `split` column of the manifest\n file will be splitted based on this value.\n column_mapping: Defines the map between the variables and the manifest\n columns. It will overwrite the `default_column_mapping` with\n the provided values, so that `column_mapping` can contain only the\n values which are altered or missing.\n embeddings_transforms: A function/transform that transforms the embedding.\n target_transforms: A function/transform that transforms the target.\n \"\"\"\n super().__init__(\n root=root,\n manifest_file=manifest_file,\n split=split,\n column_mapping=column_mapping,\n embeddings_transforms=embeddings_transforms,\n target_transforms=target_transforms,\n )\n
"},{"location":"reference/core/data/datasets/#eva.core.data.datasets.MultiEmbeddingsClassificationDataset","title":"eva.core.data.datasets.MultiEmbeddingsClassificationDataset
","text":" Bases: EmbeddingsDataset
Dataset class for where a sample corresponds to multiple embeddings.
Example use case: Slide level dataset where each slide has multiple patch embeddings.
Expects a manifest file listing the paths of .pt
files containing tensor embeddings.
The manifest must have a column_mapping[\"multi_id\"]
column that contains the unique identifier group of embeddings. For oncology datasets, this would be usually the slide id. Each row in the manifest file points to a .pt file that can contain one or multiple embeddings. There can also be multiple rows for the same multi_id
, in which case the embeddings from the different .pt files corresponding to that same multi_id
will be stacked along the first dimension.
Parameters:
Name Type Description Defaultroot
str
Root directory of the dataset.
requiredmanifest_file
str
The path to the manifest file, which is relative to the root
argument.
split
Literal['train', 'val', 'test']
The dataset split to use. The split
column of the manifest file will be splitted based on this value.
column_mapping
Dict[str, str]
Defines the map between the variables and the manifest columns. It will overwrite the default_column_mapping
with the provided values, so that column_mapping
can contain only the values which are altered or missing.
default_column_mapping
embeddings_transforms
Callable | None
A function/transform that transforms the embedding.
None
target_transforms
Callable | None
A function/transform that transforms the target.
None
Source code in src/eva/core/data/datasets/embeddings/classification/multi_embeddings.py
def __init__(\n self,\n root: str,\n manifest_file: str,\n split: Literal[\"train\", \"val\", \"test\"],\n column_mapping: Dict[str, str] = base.default_column_mapping,\n embeddings_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n):\n \"\"\"Initialize dataset.\n\n Expects a manifest file listing the paths of `.pt` files containing tensor embeddings.\n\n The manifest must have a `column_mapping[\"multi_id\"]` column that contains the\n unique identifier group of embeddings. For oncology datasets, this would be usually\n the slide id. Each row in the manifest file points to a .pt file that can contain\n one or multiple embeddings. There can also be multiple rows for the same `multi_id`,\n in which case the embeddings from the different .pt files corresponding to that same\n `multi_id` will be stacked along the first dimension.\n\n Args:\n root: Root directory of the dataset.\n manifest_file: The path to the manifest file, which is relative to\n the `root` argument.\n split: The dataset split to use. The `split` column of the manifest\n file will be splitted based on this value.\n column_mapping: Defines the map between the variables and the manifest\n columns. It will overwrite the `default_column_mapping` with\n the provided values, so that `column_mapping` can contain only the\n values which are altered or missing.\n embeddings_transforms: A function/transform that transforms the embedding.\n target_transforms: A function/transform that transforms the target.\n \"\"\"\n super().__init__(\n manifest_file=manifest_file,\n root=root,\n split=split,\n column_mapping=column_mapping,\n embeddings_transforms=embeddings_transforms,\n target_transforms=target_transforms,\n )\n\n self._multi_ids: List[int]\n
"},{"location":"reference/core/data/transforms/","title":"Transforms","text":""},{"location":"reference/core/data/transforms/#eva.data.transforms.ArrayToTensor","title":"eva.data.transforms.ArrayToTensor
","text":"Converts a numpy array to a torch tensor.
"},{"location":"reference/core/data/transforms/#eva.data.transforms.ArrayToFloatTensor","title":"eva.data.transforms.ArrayToFloatTensor
","text":" Bases: ArrayToTensor
Converts a numpy array to a torch tensor and casts it to float.
"},{"location":"reference/core/data/transforms/#eva.data.transforms.Pad2DTensor","title":"eva.data.transforms.Pad2DTensor
","text":"Pads a 2D tensor to a fixed dimension accross the first dimension.
Parameters:
Name Type Description Defaultpad_size
int
The size to pad the tensor to. If the tensor is larger than this size, no padding will be applied.
requiredpad_value
int | float
The value to use for padding.
float('-inf')
Source code in src/eva/core/data/transforms/padding/pad_2d_tensor.py
def __init__(self, pad_size: int, pad_value: int | float = float(\"-inf\")):\n \"\"\"Initialize the transformation.\n\n Args:\n pad_size: The size to pad the tensor to. If the tensor is larger than this size,\n no padding will be applied.\n pad_value: The value to use for padding.\n \"\"\"\n self._pad_size = pad_size\n self._pad_value = pad_value\n
"},{"location":"reference/core/data/transforms/#eva.data.transforms.SampleFromAxis","title":"eva.data.transforms.SampleFromAxis
","text":"Samples n_samples entries from a tensor along a given axis.
Parameters:
Name Type Description Defaultn_samples
int
The number of samples to draw.
requiredseed
int
The seed to use for sampling.
42
axis
int
The axis along which to sample.
0
Source code in src/eva/core/data/transforms/sampling/sample_from_axis.py
def __init__(self, n_samples: int, seed: int = 42, axis: int = 0):\n \"\"\"Initialize the transformation.\n\n Args:\n n_samples: The number of samples to draw.\n seed: The seed to use for sampling.\n axis: The axis along which to sample.\n \"\"\"\n self._seed = seed\n self._n_samples = n_samples\n self._axis = axis\n self._generator = self._get_generator()\n
"},{"location":"reference/core/loggers/loggers/","title":"Loggers","text":""},{"location":"reference/core/loggers/loggers/#eva.core.loggers.DummyLogger","title":"eva.core.loggers.DummyLogger
","text":" Bases: DummyLogger
Dummy logger class.
This logger is currently used as a placeholder when saving results to remote storage, as common lightning loggers do not work with azure blob storage:
https://github.com/Lightning-AI/pytorch-lightning/issues/18861 https://github.com/Lightning-AI/pytorch-lightning/issues/19736
Simply disabling the loggers when pointing to remote storage doesn't work because callbacks such as LearningRateMonitor or ModelCheckpoint require a logger to be present.
Parameters:
Name Type Description Defaultsave_dir
str
The save directory (this logger does not save anything, but callbacks might use this path to save their outputs).
required Source code insrc/eva/core/loggers/dummy.py
def __init__(self, save_dir: str) -> None:\n \"\"\"Initializes the logger.\n\n Args:\n save_dir: The save directory (this logger does not save anything,\n but callbacks might use this path to save their outputs).\n \"\"\"\n super().__init__()\n self._save_dir = save_dir\n
"},{"location":"reference/core/loggers/loggers/#eva.core.loggers.DummyLogger.save_dir","title":"save_dir: str
property
","text":"Returns the save directory.
"},{"location":"reference/core/metrics/","title":"Metrics","text":"Reference information for the Metrics
classes.
eva.metrics.AverageLoss
","text":" Bases: Metric
Average loss metric tracker.
Source code insrc/eva/core/metrics/average_loss.py
def __init__(self) -> None:\n \"\"\"Initializes the metric.\"\"\"\n super().__init__()\n\n self.add_state(\"value\", default=torch.tensor(0), dist_reduce_fx=\"sum\")\n self.add_state(\"total\", default=torch.tensor(0), dist_reduce_fx=\"sum\")\n
"},{"location":"reference/core/metrics/binary_balanced_accuracy/","title":"Binary Balanced Accuracy","text":""},{"location":"reference/core/metrics/binary_balanced_accuracy/#eva.metrics.BinaryBalancedAccuracy","title":"eva.metrics.BinaryBalancedAccuracy
","text":" Bases: BinaryStatScores
Computes the balanced accuracy for binary classification.
"},{"location":"reference/core/metrics/binary_balanced_accuracy/#eva.metrics.BinaryBalancedAccuracy.compute","title":"compute
","text":"Compute accuracy based on inputs passed in to update
previously.
Source code in src/eva/core/metrics/binary_balanced_accuracy.py
def compute(self) -> Tensor:\n \"\"\"Compute accuracy based on inputs passed in to ``update`` previously.\"\"\"\n tp, fp, tn, fn = self._final_state()\n sensitivity = _safe_divide(tp, tp + fn)\n specificity = _safe_divide(tn, tn + fp)\n return 0.5 * (sensitivity + specificity)\n
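To illustrate the formula in compute above, a small sketch that relies on the torchmetrics-style update(preds, target) inherited from BinaryStatScores; the values are illustrative:
import torch
from eva.metrics import BinaryBalancedAccuracy

metric = BinaryBalancedAccuracy()
preds = torch.tensor([0.9, 0.8, 0.2, 0.4])   # probabilities, thresholded at 0.5
target = torch.tensor([1, 1, 0, 1])
metric.update(preds, target)
print(metric.compute())   # 0.5 * (2/3 + 1/1) = tensor(0.8333)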
"},{"location":"reference/core/metrics/core/","title":"Core","text":""},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule","title":"eva.metrics.MetricModule
","text":" Bases: Module
The metrics module.
Allows storing and keeping track of train
, val
and test
metrics.
Parameters:
Name Type Description Defaulttrain
MetricCollection | None
The training metric collection.
requiredval
MetricCollection | None
The validation metric collection.
requiredtest
MetricCollection | None
The test metric collection.
required Source code insrc/eva/core/metrics/structs/module.py
def __init__(\n self,\n train: collection.MetricCollection | None,\n val: collection.MetricCollection | None,\n test: collection.MetricCollection | None,\n) -> None:\n \"\"\"Initializes the metrics for the Trainer.\n\n Args:\n train: The training metric collection.\n val: The validation metric collection.\n test: The test metric collection.\n \"\"\"\n super().__init__()\n\n self._train = train or self.default_metric_collection\n self._val = val or self.default_metric_collection\n self._test = test or self.default_metric_collection\n
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.default_metric_collection","title":"default_metric_collection: collection.MetricCollection
property
","text":"Returns the default metric collection.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.training_metrics","title":"training_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the train dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.validation_metrics","title":"validation_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the validation dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.test_metrics","title":"test_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the test dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.from_metrics","title":"from_metrics
classmethod
","text":"Initializes a metric module from a list of metrics.
Parameters:
Name Type Description Defaulttrain
MetricModuleType | None
Metrics for the training stage.
requiredval
MetricModuleType | None
Metrics for the validation stage.
requiredtest
MetricModuleType | None
Metrics for the test stage.
requiredseparator
str
The separator between the group name of the metric and the metric itself.
'/'
Source code in src/eva/core/metrics/structs/module.py
@classmethod\ndef from_metrics(\n cls,\n train: MetricModuleType | None,\n val: MetricModuleType | None,\n test: MetricModuleType | None,\n *,\n separator: str = \"/\",\n) -> MetricModule:\n \"\"\"Initializes a metric module from a list of metrics.\n\n Args:\n train: Metrics for the training stage.\n val: Metrics for the validation stage.\n test: Metrics for the test stage.\n separator: The separator between the group name of the metric\n and the metric itself.\n \"\"\"\n return cls(\n train=_create_collection_from_metrics(train, prefix=\"train\" + separator),\n val=_create_collection_from_metrics(val, prefix=\"val\" + separator),\n test=_create_collection_from_metrics(test, prefix=\"test\" + separator),\n )\n
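A minimal sketch of from_metrics, assuming plain torchmetrics metrics are accepted as MetricModuleType:
from torchmetrics.classification import MulticlassAccuracy
from eva.metrics import MetricModule

module = MetricModule.from_metrics(
    train=MulticlassAccuracy(num_classes=4),
    val=MulticlassAccuracy(num_classes=4),
    test=MulticlassAccuracy(num_classes=4),
    separator="/",
)
# Metrics are grouped per stage and prefixed, e.g. "train/MulticlassAccuracy".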
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.from_schema","title":"from_schema
classmethod
","text":"Initializes a metric module from the metrics schema.
Parameters:
Name Type Description Defaultschema
MetricsSchema
The dataclass metric schema.
requiredseparator
str
The separator between the group name of the metric and the metric itself.
'/'
Source code in src/eva/core/metrics/structs/module.py
@classmethod\ndef from_schema(\n cls,\n schema: schemas.MetricsSchema,\n *,\n separator: str = \"/\",\n) -> MetricModule:\n \"\"\"Initializes a metric module from the metrics schema.\n\n Args:\n schema: The dataclass metric schema.\n separator: The separator between the group name of the metric\n and the metric itself.\n \"\"\"\n return cls.from_metrics(\n train=schema.training_metrics,\n val=schema.evaluation_metrics,\n test=schema.evaluation_metrics,\n separator=separator,\n )\n
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema","title":"eva.metrics.MetricsSchema
dataclass
","text":"Metrics schema.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.common","title":"common: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"Holds the common train and evaluation metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.train","title":"train: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"The exclusive training metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.evaluation","title":"evaluation: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"The exclusive evaluation metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.training_metrics","title":"training_metrics: MetricModuleType | None
property
","text":"Returns the training metics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.evaluation_metrics","title":"evaluation_metrics: MetricModuleType | None
property
","text":"Returns the evaluation metics.
"},{"location":"reference/core/metrics/defaults/","title":"Defaults","text":""},{"location":"reference/core/metrics/defaults/#eva.metrics.BinaryClassificationMetrics","title":"eva.metrics.BinaryClassificationMetrics
","text":" Bases: MetricCollection
Default metrics for binary classification tasks.
The metrics instantiated here are: BinaryAUROC, BinaryAccuracy, BinaryBalancedAccuracy, BinaryF1Score, BinaryPrecision and BinaryRecall.
Parameters:
Name Type Description Defaultthreshold
float
Threshold for transforming probability to binary (0,1) predictions
0.5
ignore_index
int | None
Specifies a target value that is ignored and does not contribute to the metric calculation.
None
prefix
str | None
A string to append in front of the keys of the output dict.
None
postfix
str | None
A string to append after the keys of the output dict.
None
Source code in src/eva/core/metrics/defaults/classification/binary.py
def __init__(\n self,\n threshold: float = 0.5,\n ignore_index: int | None = None,\n prefix: str | None = None,\n postfix: str | None = None,\n) -> None:\n \"\"\"Initializes the binary classification metrics.\n\n The metrics instantiated here are:\n\n - BinaryAUROC\n - BinaryAccuracy\n - BinaryBalancedAccuracy\n - BinaryF1Score\n - BinaryPrecision\n - BinaryRecall\n\n Args:\n threshold: Threshold for transforming probability to binary (0,1) predictions\n ignore_index: Specifies a target value that is ignored and does not\n contribute to the metric calculation.\n prefix: A string to append in front of the keys of the output dict.\n postfix: A string to append after the keys of the output dict.\n \"\"\"\n super().__init__(\n metrics=[\n classification.BinaryAUROC(\n ignore_index=ignore_index,\n ),\n classification.BinaryAccuracy(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n binary_balanced_accuracy.BinaryBalancedAccuracy(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryF1Score(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryPrecision(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryRecall(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n ],\n prefix=prefix,\n postfix=postfix,\n compute_groups=[\n [\n \"BinaryAccuracy\",\n \"BinaryBalancedAccuracy\",\n \"BinaryF1Score\",\n \"BinaryPrecision\",\n \"BinaryRecall\",\n ],\n [\n \"BinaryAUROC\",\n ],\n ],\n )\n
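A minimal usage sketch; since the class is a torchmetrics MetricCollection, update and compute are forwarded to every contained metric (tensor values are illustrative):
import torch
from eva.metrics import BinaryClassificationMetrics

metrics = BinaryClassificationMetrics(prefix="val/")
preds = torch.rand(8)                 # predicted probabilities
target = torch.randint(0, 2, (8,))
metrics.update(preds, target)
print(metrics.compute())              # e.g. {"val/BinaryAccuracy": ..., "val/BinaryAUROC": ...}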
"},{"location":"reference/core/metrics/defaults/#eva.metrics.MulticlassClassificationMetrics","title":"eva.metrics.MulticlassClassificationMetrics
","text":" Bases: MetricCollection
Default metrics for multi-class classification tasks.
The metrics instantiated here are: MulticlassAUROC, MulticlassAccuracy, MulticlassF1Score, MulticlassPrecision and MulticlassRecall.
Parameters:
Name Type Description Defaultnum_classes
int
Integer specifying the number of classes.
requiredaverage
Literal['macro', 'weighted', 'none']
Defines the reduction that is applied over labels.
'macro'
ignore_index
int | None
Specifies a target value that is ignored and does not contribute to the metric calculation.
None
prefix
str | None
A string to append in front of the keys of the output dict.
None
postfix
str | None
A string to append after the keys of the output dict.
None
Source code in src/eva/core/metrics/defaults/classification/multiclass.py
def __init__(\n self,\n num_classes: int,\n average: Literal[\"macro\", \"weighted\", \"none\"] = \"macro\",\n ignore_index: int | None = None,\n prefix: str | None = None,\n postfix: str | None = None,\n) -> None:\n \"\"\"Initializes the multi-class classification metrics.\n\n The metrics instantiated here are:\n\n - MulticlassAccuracy\n - MulticlassPrecision\n - MulticlassRecall\n - MulticlassF1Score\n - MulticlassAUROC\n\n Args:\n num_classes: Integer specifying the number of classes.\n average: Defines the reduction that is applied over labels.\n ignore_index: Specifies a target value that is ignored and does not\n contribute to the metric calculation.\n prefix: A string to append in front of the keys of the output dict.\n postfix: A string to append after the keys of the output dict.\n \"\"\"\n super().__init__(\n metrics=[\n classification.MulticlassAUROC(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassAccuracy(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassF1Score(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassPrecision(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassRecall(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n ],\n prefix=prefix,\n postfix=postfix,\n compute_groups=[\n [\n \"MulticlassAccuracy\",\n \"MulticlassF1Score\",\n \"MulticlassPrecision\",\n \"MulticlassRecall\",\n ],\n [\n \"MulticlassAUROC\",\n ],\n ],\n )\n
"},{"location":"reference/core/models/modules/","title":"Modules","text":"Reference information for the model Modules
API.
eva.models.modules.ModelModule
","text":" Bases: LightningModule
The base model module.
Parameters:
Name Type Description Defaultmetrics
MetricsSchema | None
The metric groups to track.
None
postprocess
BatchPostProcess | None
A list of helper functions to apply after the loss and before the metrics calculation to the model predictions and targets.
None
Source code in src/eva/core/models/modules/module.py
def __init__(\n self,\n metrics: metrics_lib.MetricsSchema | None = None,\n postprocess: batch_postprocess.BatchPostProcess | None = None,\n) -> None:\n \"\"\"Initializes the basic module.\n\n Args:\n metrics: The metric groups to track.\n postprocess: A list of helper functions to apply after the\n loss and before the metrics calculation to the model\n predictions and targets.\n \"\"\"\n super().__init__()\n\n self._metrics = metrics or self.default_metrics\n self._postprocess = postprocess or self.default_postprocess\n\n self.metrics = metrics_lib.MetricModule.from_schema(self._metrics)\n
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.default_metrics","title":"default_metrics: metrics_lib.MetricsSchema
property
","text":"The default metrics.
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.default_postprocess","title":"default_postprocess: batch_postprocess.BatchPostProcess
property
","text":"The default post-processes.
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.metrics_device","title":"metrics_device: torch.device
property
","text":"Returns the device by which the metrics should be calculated.
We allocate the metrics to the CPU when operating on a single device, as it is much faster, but to the GPU when employing multiple devices, as the DDP strategy requires the metrics to be allocated on the module's GPU.
"},{"location":"reference/core/models/modules/#eva.models.modules.HeadModule","title":"eva.models.modules.HeadModule
","text":" Bases: ModelModule
Neural Net Head Module for training on features.
It can be used for supervised (mini-batch) stochastic gradient descent downstream tasks such as classification, regression and segmentation.
Parameters:
Name Type Description Defaulthead
MODEL_TYPE
The neural network that would be trained on the features.
requiredcriterion
Callable[..., Tensor]
The loss function to use.
requiredbackbone
MODEL_TYPE | None
The feature extractor. If None
, it will be expected that the input batch returns the features directly.
None
optimizer
OptimizerCallable
The optimizer to use.
Adam
lr_scheduler
LRSchedulerCallable
The learning rate scheduler to use.
ConstantLR
metrics
MetricsSchema | None
The metric groups to track.
None
postprocess
BatchPostProcess | None
A list of helper functions to apply after the loss and before the metrics calculation to the model predictions and targets.
None
Source code in src/eva/core/models/modules/head.py
def __init__(\n self,\n head: MODEL_TYPE,\n criterion: Callable[..., torch.Tensor],\n backbone: MODEL_TYPE | None = None,\n optimizer: OptimizerCallable = optim.Adam,\n lr_scheduler: LRSchedulerCallable = lr_scheduler.ConstantLR,\n metrics: metrics_lib.MetricsSchema | None = None,\n postprocess: batch_postprocess.BatchPostProcess | None = None,\n) -> None:\n \"\"\"Initializes the neural net head module.\n\n Args:\n head: The neural network that would be trained on the features.\n criterion: The loss function to use.\n backbone: The feature extractor. If `None`, it will be expected\n that the input batch returns the features directly.\n optimizer: The optimizer to use.\n lr_scheduler: The learning rate scheduler to use.\n metrics: The metric groups to track.\n postprocess: A list of helper functions to apply after the\n loss and before the metrics calculation to the model\n predictions and targets.\n \"\"\"\n super().__init__(metrics=metrics, postprocess=postprocess)\n\n self.head = head\n self.criterion = criterion\n self.backbone = backbone\n self.optimizer = optimizer\n self.lr_scheduler = lr_scheduler\n
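A minimal configuration sketch for a downstream head trained directly on pre-computed embeddings; the sizes are illustrative and the backbone is omitted because the batch is assumed to contain features already:
import torch.nn as nn
from eva.models.modules import HeadModule
from eva.models.networks import MLP

head_module = HeadModule(
    head=MLP(input_size=384, output_size=4, hidden_layer_sizes=(128,)),
    criterion=nn.CrossEntropyLoss(),
    backbone=None,   # features are expected directly from the dataloader
)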
"},{"location":"reference/core/models/modules/#eva.models.modules.InferenceModule","title":"eva.models.modules.InferenceModule
","text":" Bases: ModelModule
A lightweight model module to perform inference.
Parameters:
Name Type Description Defaultbackbone
MODEL_TYPE
The network to be used for inference.
required Source code insrc/eva/core/models/modules/inference.py
def __init__(self, backbone: MODEL_TYPE) -> None:\n \"\"\"Initializes the module.\n\n Args:\n backbone: The network to be used for inference.\n \"\"\"\n super().__init__(metrics=None)\n\n self.backbone = backbone\n
"},{"location":"reference/core/models/networks/","title":"Networks","text":"Reference information for the model Networks
API.
eva.models.networks.MLP
","text":" Bases: Module
A Multi-layer Perceptron (MLP) network.
Parameters:
Name Type Description Defaultinput_size
int
The number of input features.
requiredoutput_size
int
The number of output features.
requiredhidden_layer_sizes
tuple[int, ...] | None
A list specifying the number of units in each hidden layer.
None
dropout
float
Dropout probability for hidden layers.
0.0
hidden_activation_fn
Type[Module] | None
Activation function to use for hidden layers. Default is ReLU.
ReLU
output_activation_fn
Type[Module] | None
Activation function to use for the output layer. Default is None.
None
Source code in src/eva/core/models/networks/mlp.py
def __init__(\n self,\n input_size: int,\n output_size: int,\n hidden_layer_sizes: tuple[int, ...] | None = None,\n hidden_activation_fn: Type[torch.nn.Module] | None = nn.ReLU,\n output_activation_fn: Type[torch.nn.Module] | None = None,\n dropout: float = 0.0,\n) -> None:\n \"\"\"Initializes the MLP.\n\n Args:\n input_size: The number of input features.\n output_size: The number of output features.\n hidden_layer_sizes: A list specifying the number of units in each hidden layer.\n dropout: Dropout probability for hidden layers.\n hidden_activation_fn: Activation function to use for hidden layers. Default is ReLU.\n output_activation_fn: Activation function to use for the output layer. Default is None.\n \"\"\"\n super().__init__()\n\n self.input_size = input_size\n self.output_size = output_size\n self.hidden_layer_sizes = hidden_layer_sizes if hidden_layer_sizes is not None else ()\n self.hidden_activation_fn = hidden_activation_fn\n self.output_activation_fn = output_activation_fn\n self.dropout = dropout\n\n self._network = self._build_network()\n
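A short sketch of constructing the MLP and running a forward pass; the sizes are illustrative:
import torch
from eva.models.networks import MLP

mlp = MLP(input_size=384, output_size=4, hidden_layer_sizes=(128, 64), dropout=0.1)
logits = mlp(torch.randn(32, 384))   # batch of 32 embeddings
print(logits.shape)                  # torch.Size([32, 4])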
"},{"location":"reference/core/models/networks/#eva.models.networks.MLP.forward","title":"forward
","text":"Defines the forward pass of the MLP.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredReturns:
Type DescriptionTensor
The output of the network.
Source code insrc/eva/core/models/networks/mlp.py
def forward(self, x: torch.Tensor) -> torch.Tensor:\n \"\"\"Defines the forward pass of the MLP.\n\n Args:\n x: The input tensor.\n\n Returns:\n The output of the network.\n \"\"\"\n return self._network(x)\n
"},{"location":"reference/core/models/networks/#wrappers","title":"Wrappers","text":""},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel","title":"eva.models.networks.wrappers.BaseModel
","text":" Bases: Module
Base class for model wrappers.
Parameters:
Name Type Description Defaulttensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/base.py
def __init__(self, tensor_transforms: Callable | None = None) -> None:\n \"\"\"Initializes the model.\n\n Args:\n tensor_transforms: The transforms to apply to the output\n tensor produced by the model.\n \"\"\"\n super().__init__()\n\n self._output_transforms = tensor_transforms\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel.load_model","title":"load_model
abstractmethod
","text":"Loads the model.
Source code insrc/eva/core/models/networks/wrappers/base.py
@abc.abstractmethod\ndef load_model(self) -> Callable[..., torch.Tensor]:\n \"\"\"Loads the model.\"\"\"\n raise NotImplementedError\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel.model_forward","title":"model_forward
abstractmethod
","text":"Implements the forward pass of the model.
Parameters:
Name Type Description Defaulttensor
Tensor
The input tensor to the model.
required Source code insrc/eva/core/models/networks/wrappers/base.py
@abc.abstractmethod\ndef model_forward(self, tensor: torch.Tensor) -> torch.Tensor:\n \"\"\"Implements the forward pass of the model.\n\n Args:\n tensor: The input tensor to the model.\n \"\"\"\n raise NotImplementedError\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.ModelFromFunction","title":"eva.models.networks.wrappers.ModelFromFunction
","text":" Bases: BaseModel
Wrapper class for models which are initialized from functions.
This is helpful for initializing models in a .yaml
configuration file.
Parameters:
Name Type Description Defaultpath
Callable[..., Module]
The path to the callable object (class or function).
requiredarguments
Dict[str, Any] | None
The extra callable function / class arguments.
None
checkpoint_path
str | None
The path to the checkpoint to load the model weights from. This is currently only supported for torch model checkpoints. For other formats, the checkpoint loading should be handled within the provided callable object in path. None
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/from_function.py
def __init__(\n self,\n path: Callable[..., nn.Module],\n arguments: Dict[str, Any] | None = None,\n checkpoint_path: str | None = None,\n tensor_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes and constructs the model.\n\n Args:\n path: The path to the callable object (class or function).\n arguments: The extra callable function / class arguments.\n checkpoint_path: The path to the checkpoint to load the model\n weights from. This is currently only supported for torch\n model checkpoints. For other formats, the checkpoint loading\n should be handled within the provided callable object in <path>.\n tensor_transforms: The transforms to apply to the output tensor\n produced by the model.\n \"\"\"\n super().__init__()\n\n self._path = path\n self._arguments = arguments\n self._checkpoint_path = checkpoint_path\n self._tensor_transforms = tensor_transforms\n\n self._model = self.load_model()\n
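A minimal sketch of wrapping a model-building function; timm is used here only as an assumed example of a callable that returns a torch.nn.Module:
import timm   # assumption: any callable returning a torch.nn.Module works
from eva.models.networks.wrappers import ModelFromFunction

backbone = ModelFromFunction(
    path=timm.create_model,
    arguments={"model_name": "vit_small_patch16_224", "pretrained": True, "num_classes": 0},
)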
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.HuggingFaceModel","title":"eva.models.networks.wrappers.HuggingFaceModel
","text":" Bases: BaseModel
Wrapper class for loading HuggingFace transformers
models.
Parameters:
Name Type Description Defaultmodel_name_or_path
str
The model name or path to load the model from. This can be a local path or a model name from the HuggingFace
model hub.
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/huggingface.py
def __init__(self, model_name_or_path: str, tensor_transforms: Callable | None = None) -> None:\n \"\"\"Initializes the model.\n\n Args:\n model_name_or_path: The model name or path to load the model from.\n This can be a local path or a model name from the `HuggingFace`\n model hub.\n tensor_transforms: The transforms to apply to the output tensor\n produced by the model.\n \"\"\"\n super().__init__(tensor_transforms=tensor_transforms)\n\n self._model_name_or_path = model_name_or_path\n self._model = self.load_model()\n
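A minimal sketch; the model identifier below is a hypothetical placeholder for a HuggingFace hub name or a local path:
from eva.models.networks.wrappers import HuggingFaceModel

model = HuggingFaceModel(model_name_or_path="some-org/some-vision-model")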
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.ONNXModel","title":"eva.models.networks.wrappers.ONNXModel
","text":" Bases: BaseModel
Wrapper class for loading ONNX models.
Parameters:
Name Type Description Defaultpath
str
The path to the .onnx model file.
requireddevice
Literal['cpu', 'cuda'] | None
The device to run the model on. This can be either \"cpu\" or \"cuda\".
'cpu'
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/onnx.py
def __init__(\n self,\n path: str,\n device: Literal[\"cpu\", \"cuda\"] | None = \"cpu\",\n tensor_transforms: Callable | None = None,\n):\n \"\"\"Initializes the model.\n\n Args:\n path: The path to the .onnx model file.\n device: The device to run the model on. This can be either \"cpu\" or \"cuda\".\n tensor_transforms: The transforms to apply to the output tensor produced by the model.\n \"\"\"\n super().__init__(tensor_transforms=tensor_transforms)\n\n self._path = path\n self._device = device\n self._model = self.load_model()\n
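A minimal sketch; the file path below is illustrative and must point to an exported .onnx model:
from eva.models.networks.wrappers import ONNXModel

model = ONNXModel(path="path/to/backbone.onnx", device="cuda")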
"},{"location":"reference/core/trainers/functional/","title":"Functional","text":"Reference information for the trainers Functional
API.
eva.core.trainers.functional.run_evaluation_session
","text":"Runs a downstream evaluation session out-of-place.
It performs an evaluation run (fit and evaluate) on the model multiple times. Note that as the input base_trainer
and base_model
would be cloned, the input objects will not be modified.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer module to use.
requiredbase_model
ModelModule
The base model module to use.
requireddatamodule
DataModule
The data module.
requiredn_runs
int
The amount of runs (fit and evaluate) to perform.
1
verbose
bool
Whether to log the session metrics instead of those of each individual run, and vice versa.
True
Source code in src/eva/core/trainers/functional.py
def run_evaluation_session(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n n_runs: int = 1,\n verbose: bool = True,\n) -> None:\n \"\"\"Runs a downstream evaluation session out-of-place.\n\n It performs an evaluation run (fit and evaluate) on the model\n multiple times. Note that as the input `base_trainer` and\n `base_model` would be cloned, the input object would not\n be modified.\n\n Args:\n base_trainer: The base trainer module to use.\n base_model: The base model module to use.\n datamodule: The data module.\n n_runs: The amount of runs (fit and evaluate) to perform.\n verbose: Whether to verbose the session metrics instead of\n these of each individual runs and vice-versa.\n \"\"\"\n recorder = _recorder.SessionRecorder(output_dir=base_trainer.default_log_dir, verbose=verbose)\n for run_index in range(n_runs):\n validation_scores, test_scores = run_evaluation(\n base_trainer,\n base_model,\n datamodule,\n run_id=f\"run_{run_index}\",\n verbose=not verbose,\n )\n recorder.update(validation_scores, test_scores)\n recorder.save()\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.run_evaluation","title":"eva.core.trainers.functional.run_evaluation
","text":"Fits and evaluates a model out-of-place.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer to use but not modify.
requiredbase_model
ModelModule
The model module to use but not modify.
requireddatamodule
DataModule
The data module.
requiredrun_id
str | None
The run id to be appended to the output log directory. If None
, it will use the log directory of the trainer as is.
None
verbose
bool
Whether to print the validation and test metrics at the end of the training.
True
Returns:
Type DescriptionTuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]
A tuple with the validation and the test metrics (if the latter exist).
Source code insrc/eva/core/trainers/functional.py
def run_evaluation(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n run_id: str | None = None,\n verbose: bool = True,\n) -> Tuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]:\n \"\"\"Fits and evaluates a model out-of-place.\n\n Args:\n base_trainer: The base trainer to use but not modify.\n base_model: The model module to use but not modify.\n datamodule: The data module.\n run_id: The run id to be appended to the output log directory.\n If `None`, it will use the log directory of the trainer as is.\n verbose: Whether to print the validation and test metrics\n in the end of the training.\n\n Returns:\n A tuple of with the validation and the test metrics (if exists).\n \"\"\"\n trainer, model = _utils.clone(base_trainer, base_model)\n trainer.setup_log_dirs(run_id or \"\")\n return fit_and_validate(trainer, model, datamodule, verbose=verbose)\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.fit_and_validate","title":"eva.core.trainers.functional.fit_and_validate
","text":"Fits and evaluates a model in-place.
If the test set is set in the datamodule, it will evaluate the model on the test set as well.
Parameters:
Name Type Description Defaulttrainer
Trainer
The trainer module to use and update in-place.
requiredmodel
ModelModule
The model module to use and update in-place.
requireddatamodule
DataModule
The data module.
requiredverbose
bool
Whether to print the validation and test metrics at the end of the training.
True
Returns:
Type DescriptionTuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]
A tuple with the validation and the test metrics (if the latter exist).
Source code insrc/eva/core/trainers/functional.py
def fit_and_validate(\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n verbose: bool = True,\n) -> Tuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]:\n \"\"\"Fits and evaluates a model in-place.\n\n If the test set is set in the datamodule, it will evaluate the model\n on the test set as well.\n\n Args:\n trainer: The trainer module to use and update in-place.\n model: The model module to use and update in-place.\n datamodule: The data module.\n verbose: Whether to print the validation and test metrics\n in the end of the training.\n\n Returns:\n A tuple of with the validation and the test metrics (if exists).\n \"\"\"\n trainer.fit(model, datamodule=datamodule)\n validation_scores = trainer.validate(datamodule=datamodule, verbose=verbose)\n test_scores = (\n None\n if datamodule.datasets.test is None\n else trainer.test(datamodule=datamodule, verbose=verbose)\n )\n return validation_scores, test_scores\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.infer_model","title":"eva.core.trainers.functional.infer_model
","text":"Performs model inference out-of-place.
Note that the input base_model
and base_trainer
would not be modified.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer to use but not modify.
requiredbase_model
ModelModule
The model module to use but not modify.
requireddatamodule
DataModule
The data module.
requiredreturn_predictions
bool
Whether to return the model predictions.
False
Source code in src/eva/core/trainers/functional.py
def infer_model(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n return_predictions: bool = False,\n) -> None:\n \"\"\"Performs model inference out-of-place.\n\n Note that the input `base_model` and `base_trainer` would\n not be modified.\n\n Args:\n base_trainer: The base trainer to use but not modify.\n base_model: The model module to use but not modify.\n datamodule: The data module.\n return_predictions: Whether to return the model predictions.\n \"\"\"\n trainer, model = _utils.clone(base_trainer, base_model)\n return trainer.predict(\n model=model,\n datamodule=datamodule,\n return_predictions=return_predictions,\n )\n
"},{"location":"reference/core/trainers/trainer/","title":"Trainers","text":"Reference information for the Trainers
API.
eva.core.trainers.Trainer
","text":" Bases: Trainer
Core trainer class.
This is an extended version of lightning's core trainer class.
For the input arguments, refer to ::class::lightning.pytorch.Trainer
.
Parameters:
Name Type Description Defaultargs
Any
Positional arguments of ::class::lightning.pytorch.Trainer
.
()
default_root_dir
str
The default root directory to store the output logs. Unlike in ::class::lightning.pytorch.Trainer
, this path would be the prioritized destination point.
'logs'
n_runs
int
The amount of runs (fit and evaluate) to perform in an evaluation session.
1
kwargs
Any
Keyword arguments of ::class::lightning.pytorch.Trainer
.
{}
Source code in src/eva/core/trainers/trainer.py
@argparse._defaults_from_env_vars\ndef __init__(\n self,\n *args: Any,\n default_root_dir: str = \"logs\",\n n_runs: int = 1,\n **kwargs: Any,\n) -> None:\n \"\"\"Initializes the trainer.\n\n For the input arguments, refer to ::class::`lightning.pytorch.Trainer`.\n\n Args:\n args: Positional arguments of ::class::`lightning.pytorch.Trainer`.\n default_root_dir: The default root directory to store the output logs.\n Unlike in ::class::`lightning.pytorch.Trainer`, this path would be the\n prioritized destination point.\n n_runs: The amount of runs (fit and evaluate) to perform in an evaluation session.\n kwargs: Kew-word arguments of ::class::`lightning.pytorch.Trainer`.\n \"\"\"\n super().__init__(*args, default_root_dir=default_root_dir, **kwargs)\n\n self._n_runs = n_runs\n\n self._session_id: str = _logging.generate_session_id()\n self._log_dir: str = self.default_log_dir\n\n self.setup_log_dirs()\n
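A minimal sketch of configuring the trainer for a five-run evaluation session; max_epochs is forwarded to the underlying lightning.pytorch.Trainer and the output directory is illustrative:
from eva.core.trainers import Trainer

trainer = Trainer(
    default_root_dir="logs/my_experiment",
    n_runs=5,
    max_epochs=50,
)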
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.default_log_dir","title":"default_log_dir: str
property
","text":"Returns the default log directory.
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.setup_log_dirs","title":"setup_log_dirs
","text":"Setups the logging directory of the trainer and experimental loggers in-place.
Parameters:
Name Type Description Defaultsubdirectory
str
The subdirectory to append to the output log directory.
''
Source code in src/eva/core/trainers/trainer.py
def setup_log_dirs(self, subdirectory: str = \"\") -> None:\n \"\"\"Setups the logging directory of the trainer and experimental loggers in-place.\n\n Args:\n subdirectory: Whether to append a subdirectory to the output log.\n \"\"\"\n self._log_dir = os.path.join(self.default_root_dir, self._session_id, subdirectory)\n\n enabled_loggers = []\n if isinstance(self.loggers, list) and len(self.loggers) > 0:\n for logger in self.loggers:\n if isinstance(logger, (pl_loggers.CSVLogger, pl_loggers.TensorBoardLogger)):\n if not cloud_io._is_local_file_protocol(self.default_root_dir):\n loguru.logger.warning(\n f\"Skipped {type(logger).__name__} as remote storage is not supported.\"\n )\n continue\n else:\n logger._root_dir = self.default_root_dir\n logger._name = self._session_id\n logger._version = subdirectory\n enabled_loggers.append(logger)\n\n self._loggers = enabled_loggers or [eva_loggers.DummyLogger(self._log_dir)]\n
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.run_evaluation_session","title":"run_evaluation_session
","text":"Runs an evaluation session out-of-place.
It performs an evaluation run (fit and evaluate) on the model self._n_runs
times. Note that the input base_model
would not be modified, so the weights of the input model will remain as they are.
Parameters:
Name Type Description Defaultmodel
ModelModule
The base model module to evaluate.
requireddatamodule
DataModule
The data module.
required Source code insrc/eva/core/trainers/trainer.py
def run_evaluation_session(\n self,\n model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n) -> None:\n \"\"\"Runs an evaluation session out-of-place.\n\n It performs an evaluation run (fit and evaluate) the model\n `self._n_run` times. Note that the input `base_model` would\n not be modified, so the weights of the input model will remain\n as they are.\n\n Args:\n model: The base model module to evaluate.\n datamodule: The data module.\n \"\"\"\n functional.run_evaluation_session(\n base_trainer=self,\n base_model=model,\n datamodule=datamodule,\n n_runs=self._n_runs,\n verbose=self._n_runs > 1,\n )\n
"},{"location":"reference/core/utils/multiprocessing/","title":"Multiprocessing","text":"Reference information for the utils Multiprocessing
API.
eva.core.utils.multiprocessing.Process
","text":" Bases: Process
Multiprocessing wrapper with logic to propagate exceptions to the parent process.
Source: https://stackoverflow.com/a/33599967/4992248
Source code insrc/eva/core/utils/multiprocessing.py
def __init__(self, *args: Any, **kwargs: Any) -> None:\n \"\"\"Initialize the process.\"\"\"\n multiprocessing.Process.__init__(self, *args, **kwargs)\n\n self._parent_conn, self._child_conn = multiprocessing.Pipe()\n self._exception = None\n
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.exception","title":"exception
property
","text":"Property that contains exception information from the process.
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.run","title":"run
","text":"Run the process.
Source code insrc/eva/core/utils/multiprocessing.py
def run(self) -> None:\n \"\"\"Run the process.\"\"\"\n try:\n multiprocessing.Process.run(self)\n self._child_conn.send(None)\n except Exception as e:\n tb = traceback.format_exc()\n self._child_conn.send((e, tb))\n
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.check_exceptions","title":"check_exceptions
","text":"Check for exception propagate it to the parent process.
Source code insrc/eva/core/utils/multiprocessing.py
def check_exceptions(self) -> None:\n \"\"\"Check for exception propagate it to the parent process.\"\"\"\n if not self.is_alive():\n if self.exception:\n error, traceback = self.exception\n sys.stderr.write(traceback + \"\\n\")\n raise error\n
"},{"location":"reference/core/utils/workers/","title":"Workers","text":"Reference information for the utils Workers
API.
eva.core.utils.workers.main_worker_only
","text":"Function decorator which will execute it only on main / worker process.
Source code insrc/eva/core/utils/workers.py
def main_worker_only(func: Callable) -> Any:\n \"\"\"Function decorator which will execute it only on main / worker process.\"\"\"\n\n def wrapper(*args: Any, **kwargs: Any) -> Any:\n \"\"\"Wrapper function for the decorated method.\"\"\"\n if is_main_worker():\n return func(*args, **kwargs)\n\n return wrapper\n
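A short sketch of the decorator in use; the function name and message are illustrative:
from eva.core.utils.workers import main_worker_only

@main_worker_only
def report(message: str) -> None:
    # Executed only in the main process; worker processes skip the call.
    print(message)

report("dataset preparation finished")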
"},{"location":"reference/core/utils/workers/#eva.core.utils.workers.is_main_worker","title":"eva.core.utils.workers.is_main_worker
","text":"Returns whether the main process / worker is currently used.
Source code insrc/eva/core/utils/workers.py
def is_main_worker() -> bool:\n \"\"\"Returns whether the main process / worker is currently used.\"\"\"\n process = multiprocessing.current_process()\n return process.name == \"MainProcess\"\n
"},{"location":"reference/vision/","title":"Vision","text":"Reference information for the Vision
API.
If you have not already installed the Vision
-package, install it with:
pip install 'kaiko-eva[vision]'\n
"},{"location":"reference/vision/utils/","title":"Utils","text":""},{"location":"reference/vision/utils/#eva.vision.utils.io.image","title":"eva.vision.utils.io.image
","text":"Image I/O related functions.
"},{"location":"reference/vision/utils/#eva.vision.utils.io.image.read_image","title":"read_image
","text":"Reads and loads the image from a file path as a RGB.
Parameters:
Name Type Description Defaultpath
str
The path of the image file.
requiredReturns:
Type DescriptionNDArray[uint8]
The RGB image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image(path: str) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads the image from a file path as a RGB.\n\n Args:\n path: The path of the image file.\n\n Returns:\n The RGB image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n return read_image_as_array(path, cv2.IMREAD_COLOR)\n
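A minimal usage sketch; the file path is illustrative:
from eva.vision.utils.io import image as image_io

rgb = image_io.read_image("path/to/patch.png")   # raises FileExistsError / IOError on failure
print(rgb.shape, rgb.dtype)                      # (height, width, 3), uint8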
"},{"location":"reference/vision/utils/#eva.vision.utils.io.image.read_image_as_array","title":"read_image_as_array
","text":"Reads and loads an image file as a numpy array.
Parameters:
Name Type Description Defaultpath
str
The path to the image file.
requiredflags
int
Specifies the way in which the image should be read.
IMREAD_UNCHANGED
Returns:
Type DescriptionNDArray[uint8]
The image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image_as_array(path: str, flags: int = cv2.IMREAD_UNCHANGED) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads an image file as a numpy array.\n\n Args:\n path: The path to the image file.\n flags: Specifies the way in which the image should be read.\n\n Returns:\n The image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n _utils.check_file(path)\n image = cv2.imread(path, flags=flags)\n if image is None:\n raise IOError(\n f\"Input '{path}' could not be loaded. \"\n \"Please verify that the path is a valid image file.\"\n )\n\n if image.ndim == 3:\n image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n if image.ndim == 2 and flags == cv2.IMREAD_COLOR:\n image = image[:, :, np.newaxis]\n\n return np.asarray(image).astype(np.uint8)\n
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti","title":"eva.vision.utils.io.nifti
","text":"NIfTI I/O related functions.
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti.read_nifti_slice","title":"read_nifti_slice
","text":"Reads and loads a NIfTI image from a file path as uint8
.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredslice_index
int
The image slice index to return.
requireduse_storage_dtype
bool
Whether to cast the raw image array to the inferred type.
True
Returns:
Type DescriptionNDArray[Any]
The image as a numpy array (height, width, channels).
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def read_nifti_slice(\n path: str, slice_index: int, *, use_storage_dtype: bool = True\n) -> npt.NDArray[Any]:\n \"\"\"Reads and loads a NIfTI image from a file path as `uint8`.\n\n Args:\n path: The path to the NIfTI file.\n slice_index: The image slice index to return.\n use_storage_dtype: Whether to cast the raw image\n array to the inferred type.\n\n Returns:\n The image as a numpy array (height, width, channels).\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image_data = nib.load(path) # type: ignore\n image_slice = image_data.slicer[:, :, slice_index : slice_index + 1] # type: ignore\n image_array = image_slice.get_fdata()\n if use_storage_dtype:\n image_array = image_array.astype(image_data.get_data_dtype()) # type: ignore\n return image_array\n
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti.fetch_total_nifti_slices","title":"fetch_total_nifti_slices
","text":"Fetches the total slides of a NIfTI image file.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredReturns:
Type Descriptionint
The total number of available slices.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def fetch_total_nifti_slices(path: str) -> int:\n \"\"\"Fetches the total slides of a NIfTI image file.\n\n Args:\n path: The path to the NIfTI file.\n\n Returns:\n The number of the total available slides.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image = nib.load(path) # type: ignore\n image_shape = image.header.get_data_shape() # type: ignore\n return image_shape[-1]\n
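A small sketch combining the two NIfTI helpers above; the file path is illustrative:
from eva.vision.utils.io import nifti

total = nifti.fetch_total_nifti_slices("path/to/scan.nii.gz")
middle = nifti.read_nifti_slice("path/to/scan.nii.gz", slice_index=total // 2)
print(total, middle.shape)   # number of slices and the (height, width, channels) array shape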
"},{"location":"reference/vision/data/","title":"Vision Data","text":"Reference information for the Vision Data
API.
eva.vision.data.datasets.VisionDataset
","text":" Bases: Dataset
, ABC
, Generic[DataSample]
Base dataset class for vision tasks.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.VisionDataset.filename","title":"filename
abstractmethod
","text":"Returns the filename of the index
'th data sample.
Note that this is the relative file path to the root.
Parameters:
Name Type Description Defaultindex
int
The index of the data-sample to select.
requiredReturns:
Type Descriptionstr
The filename of the index
'th data sample.
Source code in src/eva/vision/data/datasets/vision.py
@abc.abstractmethod\ndef filename(self, index: int) -> str:\n \"\"\"Returns the filename of the `index`'th data sample.\n\n Note that this is the relative file path to the root.\n\n Args:\n index: The index of the data-sample to select.\n\n Returns:\n The filename of the `index`'th data sample.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#classification-datasets","title":"Classification datasets","text":""},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.BACH","title":"eva.vision.data.datasets.BACH
","text":" Bases: ImageClassification
Dataset class for BACH images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
Name Type Description Defaultroot
str
Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.
requiredsplit
Literal['train', 'val'] | None
Dataset split to use. If None
, the entire dataset is used.
None
download
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method and if the data does not yet exist on disk.
False
image_transforms
Callable | None
A function/transform that takes in an image and returns a transformed version.
None
target_transforms
Callable | None
A function/transform that takes in the target and transforms it.
None
Source code in src/eva/vision/data/datasets/classification/bach.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\"] | None = None,\n download: bool = False,\n image_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize the dataset.\n\n The dataset is split into train and validation by taking into account\n the patient IDs to avoid any data leakage.\n\n Args:\n root: Path to the root directory of the dataset. The dataset will\n be downloaded and extracted here, if it does not already exist.\n split: Dataset split to use. If `None`, the entire dataset is used.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method and if the data does\n not yet exist on disk.\n image_transforms: A function/transform that takes in an image\n and returns a transformed version.\n target_transforms: A function/transform that takes in the target\n and transforms it.\n \"\"\"\n super().__init__(\n image_transforms=image_transforms,\n target_transforms=target_transforms,\n )\n\n self._root = root\n self._split = split\n self._download = download\n\n self._samples: List[Tuple[str, int]] = []\n self._indices: List[int] = []\n
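A minimal sketch of instantiating the dataset; the root path is illustrative and prepare_data is assumed to be the hook that triggers the download mentioned above:
from eva.vision.data.datasets import BACH
from eva.vision.data.transforms import ResizeAndCrop

dataset = BACH(
    root="data/bach",              # illustrative root directory
    split="train",
    download=True,
    image_transforms=ResizeAndCrop(size=224),
)
dataset.prepare_data()             # assumed download / setup hook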
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.PatchCamelyon","title":"eva.vision.data.datasets.PatchCamelyon
","text":" Bases: ImageClassification
Dataset class for PatchCamelyon images and corresponding targets.
Parameters:
Name Type Description Defaultroot
str
The path to the dataset root. This path should contain the uncompressed h5 files and the metadata.
requiredsplit
Literal['train', 'val', 'test']
The dataset split for training, validation, or testing.
requireddownload
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method.
False
image_transforms
Callable | None
A function/transform that takes in an image and returns a transformed version.
None
target_transforms
Callable | None
A function/transform that takes in the target and transforms it.
None
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\", \"test\"],\n download: bool = False,\n image_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes the dataset.\n\n Args:\n root: The path to the dataset root. This path should contain\n the uncompressed h5 files and the metadata.\n split: The dataset split for training, validation, or testing.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method.\n image_transforms: A function/transform that takes in an image\n and returns a transformed version.\n target_transforms: A function/transform that takes in the target\n and transforms it.\n \"\"\"\n super().__init__(\n image_transforms=image_transforms,\n target_transforms=target_transforms,\n )\n\n self._root = root\n self._split = split\n self._download = download\n
"},{"location":"reference/vision/data/datasets/#segmentation-datasets","title":"Segmentation datasets","text":""},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation","title":"eva.vision.data.datasets.ImageSegmentation
","text":" Bases: VisionDataset[Tuple[Image, Mask]]
, ABC
Image segmentation abstract dataset.
Parameters:
Name Type Description Defaulttransforms
Callable | None
A function/transforms that takes in an image and a label and returns the transformed versions of both.
None
Source code in src/eva/vision/data/datasets/segmentation/base.py
def __init__(\n self,\n transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes the image segmentation base class.\n\n Args:\n transforms: A function/transforms that takes in an\n image and a label and returns the transformed versions of both.\n \"\"\"\n super().__init__()\n\n self._transforms = transforms\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.classes","title":"classes: List[str] | None
property
","text":"Returns the list with names of the dataset names.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.class_to_idx","title":"class_to_idx: Dict[str, int] | None
property
","text":"Returns a mapping of the class name to its target index.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_metadata","title":"load_metadata
","text":"Returns the dataset metadata.
Parameters:
Name Type Description Defaultindex
int | None
The index of the data sample to return the metadata of. If None
, it will return the metadata of the current dataset.
Returns:
Type DescriptionDict[str, Any] | List[Dict[str, Any]] | None
The sample metadata.
Source code insrc/eva/vision/data/datasets/segmentation/base.py
def load_metadata(self, index: int | None) -> Dict[str, Any] | List[Dict[str, Any]] | None:\n \"\"\"Returns the dataset metadata.\n\n Args:\n index: The index of the data sample to return the metadata of.\n If `None`, it will return the metadata of the current dataset.\n\n Returns:\n The sample metadata.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_image","title":"load_image
abstractmethod
","text":"Loads and returns the index
'th image sample.
Parameters:
Name Type Description Defaultindex
int
The index of the data sample to load.
requiredReturns:
Type DescriptionImage
An image torchvision tensor (channels, height, width).
Source code insrc/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod\ndef load_image(self, index: int) -> tv_tensors.Image:\n \"\"\"Loads and returns the `index`'th image sample.\n\n Args:\n index: The index of the data sample to load.\n\n Returns:\n An image torchvision tensor (channels, height, width).\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_mask","title":"load_mask
abstractmethod
","text":"Returns the index
'th target masks sample.
Parameters:
Name Type Description Defaultindex
int
The index of the data sample target masks to load.
requiredReturns:
Type DescriptionMask
The semantic mask as a (H x W) shaped tensor with integer
Mask
values which represent the pixel class id.
Source code insrc/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod\ndef load_mask(self, index: int) -> tv_tensors.Mask:\n \"\"\"Returns the `index`'th target masks sample.\n\n Args:\n index: The index of the data sample target masks to load.\n\n Returns:\n The semantic mask as a (H x W) shaped tensor with integer\n values which represent the pixel class id.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.TotalSegmentator2D","title":"eva.vision.data.datasets.TotalSegmentator2D
","text":" Bases: ImageSegmentation
TotalSegmentator 2D segmentation dataset.
Parameters:
Name Type Description Defaultroot
str
Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.
requiredsplit
Literal['train', 'val'] | None
Dataset split to use. If None
, the entire dataset is used.
version
Literal['small', 'full'] | None
The version of the dataset to initialize. If None
, it will use the files located at root as is and won't perform any checks.
'small'
download
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method and if the data does not exist yet on disk.
False
as_uint8
bool
Whether to convert and return the images as 8-bit.
True
transforms
Callable | None
A function/transforms that takes in an image and a target mask and returns the transformed versions of both.
None
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\"] | None,\n version: Literal[\"small\", \"full\"] | None = \"small\",\n download: bool = False,\n as_uint8: bool = True,\n transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize dataset.\n\n Args:\n root: Path to the root directory of the dataset. The dataset will\n be downloaded and extracted here, if it does not already exist.\n split: Dataset split to use. If `None`, the entire dataset is used.\n version: The version of the dataset to initialize. If `None`, it will\n use the files located at root as is and wont perform any checks.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method and if the data does not\n exist yet on disk.\n as_uint8: Whether to convert and return the images as a 8-bit.\n transforms: A function/transforms that takes in an image and a target\n mask and returns the transformed versions of both.\n \"\"\"\n super().__init__(transforms=transforms)\n\n self._root = root\n self._split = split\n self._version = version\n self._download = download\n self._as_uint8 = as_uint8\n\n self._samples_dirs: List[str] = []\n self._indices: List[Tuple[int, int]] = []\n
"},{"location":"reference/vision/data/transforms/","title":"Transforms","text":""},{"location":"reference/vision/data/transforms/#eva.core.data.transforms.dtype.ArrayToTensor","title":"eva.core.data.transforms.dtype.ArrayToTensor
","text":"Converts a numpy array to a torch tensor.
"},{"location":"reference/vision/data/transforms/#eva.core.data.transforms.dtype.ArrayToFloatTensor","title":"eva.core.data.transforms.dtype.ArrayToFloatTensor
","text":" Bases: ArrayToTensor
Converts a numpy array to a torch tensor and casts it to float.
"},{"location":"reference/vision/data/transforms/#eva.vision.data.transforms.ResizeAndCrop","title":"eva.vision.data.transforms.ResizeAndCrop
","text":" Bases: Compose
Resizes, crops and normalizes an input image while preserving its aspect ratio.
Parameters:
Name Type Description Defaultsize
int | Sequence[int]
Desired output size of the crop. If size is an int
instead of sequence like (h, w), a square crop (size, size) is made.
224
mean
Sequence[float]
Sequence of means for each image channel.
(0.5, 0.5, 0.5)
std
Sequence[float]
Sequence of standard deviations for each image channel.
(0.5, 0.5, 0.5)
Source code in src/eva/vision/data/transforms/common/resize_and_crop.py
def __init__(\n self,\n size: int | Sequence[int] = 224,\n mean: Sequence[float] = (0.5, 0.5, 0.5),\n std: Sequence[float] = (0.5, 0.5, 0.5),\n) -> None:\n \"\"\"Initializes the transform object.\n\n Args:\n size: Desired output size of the crop. If size is an `int` instead\n of sequence like (h, w), a square crop (size, size) is made.\n mean: Sequence of means for each image channel.\n std: Sequence of standard deviations for each image channel.\n \"\"\"\n self._size = size\n self._mean = mean\n self._std = std\n\n super().__init__(transforms=self._build_transforms())\n
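A minimal usage sketch, assuming the composed transform is called on a (channels, height, width) image tensor; the input below is illustrative:
import torch
from eva.vision.data.transforms import ResizeAndCrop

transform = ResizeAndCrop(size=224, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
image = torch.rand(3, 512, 768)    # illustrative input image tensor
output = transform(image)          # resize, crop and normalization applied in sequence
print(output.shape)                # expected: torch.Size([3, 224, 224])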
"},{"location":"reference/vision/models/networks/","title":"Networks","text":""},{"location":"reference/vision/models/networks/#eva.vision.models.networks.ABMIL","title":"eva.vision.models.networks.ABMIL
","text":" Bases: Module
ABMIL network for multiple instance learning classification tasks.
Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size
, n_instances
, input_size
). For slides with fewer than n_instances
patches, you can apply padding and provide a mask tensor to the forward pass.
The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py
Notes[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, \"Attention-based Deep Multiple Instance Learning\", 2018 https://arxiv.org/abs/1802.04712
Parameters:
Name Type Description Defaultinput_size
int
input embedding dimension
requiredoutput_size
int
number of classes
requiredprojected_input_size
int | None
size of the projected input. If None
, no projection is performed.
hidden_size_attention
int
hidden dimension in attention network
128
hidden_sizes_mlp
tuple
dimensions for hidden layers in last mlp
(128, 64)
use_bias
bool
whether to use bias in the attention network
True
dropout_input_embeddings
float
dropout rate for the input embeddings
0.0
dropout_attention
float
dropout rate for the attention network and classifier
0.0
dropout_mlp
float
dropout rate for the final MLP network
0.0
pad_value
int | float | None
Value indicating padding in the input tensor. If specified, entries with this value in the input will be masked. If set to None
, no masking is applied.
float('-inf')
Source code in src/eva/vision/models/networks/abmil.py
def __init__(\n self,\n input_size: int,\n output_size: int,\n projected_input_size: int | None,\n hidden_size_attention: int = 128,\n hidden_sizes_mlp: tuple = (128, 64),\n use_bias: bool = True,\n dropout_input_embeddings: float = 0.0,\n dropout_attention: float = 0.0,\n dropout_mlp: float = 0.0,\n pad_value: int | float | None = float(\"-inf\"),\n) -> None:\n \"\"\"Initializes the ABMIL network.\n\n Args:\n input_size: input embedding dimension\n output_size: number of classes\n projected_input_size: size of the projected input. if `None`, no projection is\n performed.\n hidden_size_attention: hidden dimension in attention network\n hidden_sizes_mlp: dimensions for hidden layers in last mlp\n use_bias: whether to use bias in the attention network\n dropout_input_embeddings: dropout rate for the input embeddings\n dropout_attention: dropout rate for the attention network and classifier\n dropout_mlp: dropout rate for the final MLP network\n pad_value: Value indicating padding in the input tensor. If specified, entries with\n this value in the will be masked. If set to `None`, no masking is applied.\n \"\"\"\n super().__init__()\n\n self._pad_value = pad_value\n\n if projected_input_size:\n self.projector = nn.Sequential(\n nn.Linear(input_size, projected_input_size, bias=True),\n nn.Dropout(p=dropout_input_embeddings),\n )\n input_size = projected_input_size\n else:\n self.projector = nn.Dropout(p=dropout_input_embeddings)\n\n self.gated_attention = GatedAttention(\n input_dim=input_size,\n hidden_dim=hidden_size_attention,\n dropout=dropout_attention,\n n_classes=1,\n use_bias=use_bias,\n )\n\n self.classifier = MLP(\n input_size=input_size,\n output_size=output_size,\n hidden_layer_sizes=hidden_sizes_mlp,\n dropout=dropout_mlp,\n hidden_activation_fn=nn.ReLU,\n )\n
"},{"location":"reference/vision/models/networks/#eva.vision.models.networks.ABMIL.forward","title":"forward
","text":"Forward pass.
Parameters:
Name Type Description Defaultinput_tensor
Tensor
Tensor with expected shape of (batch_size, n_instances, input_size).
required Source code insrc/eva/vision/models/networks/abmil.py
def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:\n \"\"\"Forward pass.\n\n Args:\n input_tensor: Tensor with expected shape of (batch_size, n_instances, input_size).\n \"\"\"\n input_tensor, mask = self._mask_values(input_tensor, self._pad_value)\n\n # (batch_size, n_instances, input_size) -> (batch_size, n_instances, projected_input_size)\n input_tensor = self.projector(input_tensor)\n\n attention_logits = self.gated_attention(input_tensor) # (batch_size, n_instances, 1)\n if mask is not None:\n # fill masked values with -inf, which will yield 0s after softmax\n attention_logits = attention_logits.masked_fill(mask, float(\"-inf\"))\n\n attention_weights = nn.functional.softmax(attention_logits, dim=1)\n # (batch_size, n_instances, 1)\n\n attention_result = torch.matmul(torch.transpose(attention_weights, 1, 2), input_tensor)\n # (batch_size, 1, hidden_size_attention)\n\n attention_result = torch.squeeze(attention_result, 1) # (batch_size, hidden_size_attention)\n\n return self.classifier(attention_result) # (batch_size, output_size)\n
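A minimal usage sketch of the network (illustrative only; the shapes follow the documented forward signature and the padding relies on the default pad_value):
import torch\n\nfrom eva.vision.models import networks\n\n# Two slides, up to 100 patches each, 384-dimensional patch embeddings.\nmodel = networks.ABMIL(input_size=384, output_size=2, projected_input_size=None)\n\nembeddings = torch.randn(2, 100, 384)\n# Simulate a slide with only 60 patches by padding with the default pad_value.\nembeddings[1, 60:, :] = float(\"-inf\")\n\nlogits = model(embeddings)  # shape: (2, 2)\n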
"},{"location":"reference/vision/utils/io/","title":"IO","text":""},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image","title":"eva.vision.utils.io.image
","text":"Image I/O related functions.
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image.read_image","title":"read_image
","text":"Reads and loads the image from a file path as a RGB.
Parameters:
Name Type Description Defaultpath
str
The path of the image file.
requiredReturns:
Type DescriptionNDArray[uint8]
The RGB image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image(path: str) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads the image from a file path as a RGB.\n\n Args:\n path: The path of the image file.\n\n Returns:\n The RGB image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n return read_image_as_array(path, cv2.IMREAD_COLOR)\n
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image.read_image_as_array","title":"read_image_as_array
","text":"Reads and loads an image file as a numpy array.
Parameters:
Name Type Description Defaultpath
str
The path to the image file.
requiredflags
int
Specifies the way in which the image should be read.
IMREAD_UNCHANGED
Returns:
Type DescriptionNDArray[uint8]
The image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image_as_array(path: str, flags: int = cv2.IMREAD_UNCHANGED) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads an image file as a numpy array.\n\n Args:\n path: The path to the image file.\n flags: Specifies the way in which the image should be read.\n\n Returns:\n The image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n _utils.check_file(path)\n image = cv2.imread(path, flags=flags)\n if image is None:\n raise IOError(\n f\"Input '{path}' could not be loaded. \"\n \"Please verify that the path is a valid image file.\"\n )\n\n if image.ndim == 3:\n image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n if image.ndim == 2 and flags == cv2.IMREAD_COLOR:\n image = image[:, :, np.newaxis]\n\n return np.asarray(image).astype(np.uint8)\n
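For illustration, the two functions above can be used as follows (the file paths are placeholders; the module path follows the reference above):
import cv2\n\nfrom eva.vision.utils.io import image\n\nrgb_array = image.read_image(\"path/to/patch.png\")  # always returned as RGB\nraw_array = image.read_image_as_array(\"path/to/mask.png\", flags=cv2.IMREAD_GRAYSCALE)\n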
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti","title":"eva.vision.utils.io.nifti
","text":"NIfTI I/O related functions.
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti.read_nifti_slice","title":"read_nifti_slice
","text":"Reads and loads a NIfTI image from a file path as uint8
.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredslice_index
int
The image slice index to return.
requireduse_storage_dtype
bool
Whether to cast the raw image array to the inferred type.
True
Returns:
Type DescriptionNDArray[Any]
The image as a numpy array (height, width, channels).
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def read_nifti_slice(\n path: str, slice_index: int, *, use_storage_dtype: bool = True\n) -> npt.NDArray[Any]:\n \"\"\"Reads and loads a NIfTI image from a file path as `uint8`.\n\n Args:\n path: The path to the NIfTI file.\n slice_index: The image slice index to return.\n use_storage_dtype: Whether to cast the raw image\n array to the inferred type.\n\n Returns:\n The image as a numpy array (height, width, channels).\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image_data = nib.load(path) # type: ignore\n image_slice = image_data.slicer[:, :, slice_index : slice_index + 1] # type: ignore\n image_array = image_slice.get_fdata()\n if use_storage_dtype:\n image_array = image_array.astype(image_data.get_data_dtype()) # type: ignore\n return image_array\n
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti.fetch_total_nifti_slices","title":"fetch_total_nifti_slices
","text":"Fetches the total slides of a NIfTI image file.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredReturns:
Type Descriptionint
The total number of available slices.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def fetch_total_nifti_slices(path: str) -> int:\n \"\"\"Fetches the total slides of a NIfTI image file.\n\n Args:\n path: The path to the NIfTI file.\n\n Returns:\n The number of the total available slides.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image = nib.load(path) # type: ignore\n image_shape = image.header.get_data_shape() # type: ignore\n return image_shape[-1]\n
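As an illustration, the two functions above can be combined to iterate over all slices of a CT volume (the file path is a placeholder; the module path follows the reference above):
from eva.vision.utils.io import nifti\n\npath = \"path/to/ct.nii.gz\"  # placeholder path to a NIfTI file\nfor slice_index in range(nifti.fetch_total_nifti_slices(path)):\n    image_array = nifti.read_nifti_slice(path, slice_index)\n    print(slice_index, image_array.shape)\n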
"},{"location":"user-guide/","title":"User Guide","text":"Here you can find everything you need to install, understand and interact with eva.
"},{"location":"user-guide/#getting-started","title":"Getting started","text":"Install eva on your machine and learn how to use eva.
"},{"location":"user-guide/#tutorials","title":"Tutorials","text":"To familiarize yourself with eva, try out some of our tutorials.
Get to know eva in more depth by studying our advanced user guides.
This document shows how to use eva's Model Wrapper API (eva.models.networks.wrappers
) to load different model formats from a variety of sources such as PyTorch Hub, HuggingFace Model Hub and ONNX.
The eva framework is built on top of PyTorch Lightning and thus naturally supports loading PyTorch models. You just need to specify the class path of your model in the backbone section of the .yaml
config file.
backbone:\n class_path: path.to.your.ModelClass\n init_args:\n arg_1: ...\n arg_2: ...\n
Note that your ModelClass
should subclass torch.nn.Module
and implement the forward()
method to return embedding tensors of shape [embedding_dim]
.
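For illustration, a minimal class satisfying these requirements could look like the sketch below (MyBackbone and its layers are made-up names, not part of eva; in practice the forward pass returns one embedding per image in the batch):
import torch\nfrom torch import nn\n\n\nclass MyBackbone(nn.Module):\n    \"\"\"Hypothetical backbone that maps images to fixed-size embeddings.\"\"\"\n\n    def __init__(self, embedding_dim: int = 384) -> None:\n        super().__init__()\n        self.encoder = nn.Sequential(\n            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),\n            nn.ReLU(),\n            nn.AdaptiveAvgPool2d(1),\n            nn.Flatten(),\n            nn.Linear(16, embedding_dim),\n        )\n\n    def forward(self, images: torch.Tensor) -> torch.Tensor:\n        # (batch_size, 3, H, W) -> (batch_size, embedding_dim)\n        return self.encoder(images)\n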
To load models from PyTorch Hub or other torch model providers, the easiest way is to use the ModelFromFunction
wrapper class:
backbone:\n class_path: eva.models.networks.wrappers.ModelFromFunction\n init_args:\n path: torch.hub.load\n arguments:\n repo_or_dir: facebookresearch/dino:main\n model: dino_vits16\n pretrained: false\n checkpoint_path: path/to/your/checkpoint.torch\n
Note that if a checkpoint_path
is provided, ModelFromFunction
will automatically initialize the specified model using the provided weights from that checkpoint file.
Similar to the above example, we can easily load models using the common vision library timm
:
backbone:\n class_path: eva.models.networks.wrappers.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: resnet18\n pretrained: true\n
"},{"location":"user-guide/advanced/model_wrappers/#loading-models-from-huggingface-hub","title":"Loading models from HuggingFace Hub","text":"For loading models from HuggingFace Hub, eva provides a custom wrapper class HuggingFaceModel
which can be used as follows:
backbone:\n class_path: eva.models.networks.wrappers.HuggingFaceModel\n init_args:\n model_name_or_path: owkin/phikon\n tensor_transforms: \n class_path: eva.models.networks.transforms.ExtractCLSFeatures\n
In the above example, the forward pass implemented by the owkin/phikon
model returns an output tensor containing the hidden states of all input tokens. In order to extract the state corresponding to the CLS token only, we can specify a transformation via the tensor_transforms
argument which will be applied to the model output.
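Conceptually, extracting the CLS features from such an output amounts to selecting the first token of the returned hidden states, as in this small sketch (the exact behaviour of ExtractCLSFeatures may differ in detail):
import torch\n\n# Hidden states as returned by a ViT-style model: (batch, n_tokens, hidden_dim).\nhidden_states = torch.randn(8, 197, 768)\ncls_features = hidden_states[:, 0, :]  # (8, 768), the CLS token state\n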
.onnx
model checkpoints can be loaded using the ONNXModel
wrapper class as follows:
class_path: eva.models.networks.wrappers.ONNXModel\ninit_args:\n path: path/to/model.onnx\n device: cuda\n
"},{"location":"user-guide/advanced/model_wrappers/#implementing-custom-model-wrappers","title":"Implementing custom model wrappers","text":"You can also implement your own model wrapper classes, in case your model format is not supported by the wrapper classes that eva already provides. To do so, you need to subclass eva.models.networks.wrappers.BaseModel
and implement the following abstract methods:
load_model
: Returns an instantiated model object & loads pre-trained model weights from a checkpoint if available. model_forward
: Implements the forward pass of the model and returns the output as a torch.Tensor
of shape [embedding_dim]
You can take the implementations of ModelFromFunction
, HuggingFaceModel
and ONNXModel
wrappers as a reference.
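A rough sketch of such a custom wrapper is shown below. The class name, constructor and the use of TorchScript are illustrative assumptions; the exact BaseModel contract (e.g. how the loaded model is stored and invoked) should be checked against the wrapper implementations mentioned above.
import torch\n\nfrom eva.models.networks import wrappers\n\n\nclass TorchScriptModel(wrappers.BaseModel):\n    \"\"\"Hypothetical wrapper that loads a TorchScript checkpoint.\"\"\"\n\n    def __init__(self, checkpoint_path: str) -> None:\n        super().__init__()\n        self._checkpoint_path = checkpoint_path\n        self._model = self.load_model()\n\n    def load_model(self) -> torch.nn.Module:\n        \"\"\"Returns the instantiated model with its pre-trained weights loaded.\"\"\"\n        return torch.jit.load(self._checkpoint_path)\n\n    def model_forward(self, tensor: torch.Tensor) -> torch.Tensor:\n        \"\"\"Returns embeddings of shape [batch_size, embedding_dim].\"\"\"\n        return self._model(tensor)\n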
To produce the evaluation results presented here, you can run eva with the settings below.
Make sure to replace <task>
in the commands below with bach
, crc
, mhist
or patch_camelyon
.
Note that to run the commands below you will need to first download the data. BACH, CRC and PatchCamelyon provide automatic download by setting the argument download: true
in their respective config-files. In the case of MHIST you will need to download the data manually by following the instructions provided here.
Evaluating the backbone with randomly initialized weights provides a baseline: it shows how the pretrained FMs compare to an FM that produces embeddings without any prior learning on image tasks. To evaluate, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_random\" \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dino-vit-s16-imagenet","title":"DINO ViT-S16 (ImageNet)","text":"The next baseline model, uses a pretrained ViT-S16 backbone with ImageNet weights. To evaluate, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_imagenet\" \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dino-vit-b8-imagenet","title":"DINO ViT-B8 (ImageNet)","text":"To evaluate performance on the larger ViT-B8 backbone pretrained on ImageNet, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb8_imagenet\" \\\nDINO_BACKBONE=dino_vitb8 \\\nIN_FEATURES=768 \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dinov2-vit-l14-imagenet","title":"DINOv2 ViT-L14 (ImageNet)","text":"To evaluate performance on Dino v2 ViT-L14 backbone pretrained on ImageNet, run:
PRETRAINED=true \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl14_kaiko\" \\\nREPO_OR_DIR=facebookresearch/dinov2:main \\\nDINO_BACKBONE=dinov2_vitl14_reg \\\nFORCE_RELOAD=true \\\nIN_FEATURES=1024 \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#lunit-dino-vit-s16-tcga","title":"Lunit - DINO ViT-S16 (TCGA)","text":"Lunit, released the weights for a DINO ViT-S16 backbone, pretrained on TCGA data on GitHub. To evaluate, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_lunit\" \\\nCHECKPOINT_PATH=\"https://github.com/lunit-io/benchmark-ssl-pathology/releases/download/pretrained-weights/dino_vit_small_patch16_ep200.torch\" \\\nNORMALIZE_MEAN=[0.70322989,0.53606487,0.66096631] \\\nNORMALIZE_STD=[0.21716536,0.26081574,0.20723464] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#owkin-ibot-vit-b16-tcga","title":"Owkin - iBOT ViT-B16 (TCGA)","text":"Owkin released the weights for \"Phikon\", an FM trained with iBOT on TCGA data, via HuggingFace. To evaluate, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb16_owkin\" \\\neva predict_fit --config configs/vision/owkin/phikon/offline/<task>.yaml\n
Note: since eva provides the config files to evaluate tasks with the Phikon FM in \"configs/vision/owkin/phikon/offline\", it is not necessary to set the environment variables needed for the runs above.
"},{"location":"user-guide/advanced/replicate_evaluations/#uni-dinov2-vit-l16-mass-100k","title":"UNI - DINOv2 ViT-L16 (Mass-100k)","text":"The UNI FM, introduced in [1] is available on HuggingFace. Note that access needs to be requested.
Unlike the other FMs evaluated for our leaderboard, the UNI model uses the vision library timm
to load the model. To accommodate this, you will need to modify the config files (see also Model Wrappers).
Make a copy of the task-config you'd like to run, and replace the backbone
section with:
backbone:\n class_path: eva.models.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: vit_large_patch16_224\n patch_size: 16\n init_values: 1e-5\n num_classes: 0\n dynamic_img_size: true\n checkpoint_path: <path/to/pytorch_model.bin>\n
Now evaluate the model by running:
EMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl16_uni\" \\\nIN_FEATURES=1024 \\\neva predict_fit --config path/to/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-s16-tcga","title":"kaiko.ai - DINO ViT-S16 (TCGA)","text":"To evaluate kaiko.ai's FM with DINO ViT-S16 backbone, pretrained on TCGA data on GitHub, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_kaiko\" \\\nCHECKPOINT_PATH=[TBD*] \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-s8-tcga","title":"kaiko.ai - DINO ViT-S8 (TCGA)","text":"To evaluate kaiko.ai's FM with DINO ViT-S8 backbone, pretrained on TCGA data on GitHub, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits8_kaiko\" \\\nDINO_BACKBONE=dino_vits8 \\\nCHECKPOINT_PATH=[TBD*] \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-b16-tcga","title":"kaiko.ai - DINO ViT-B16 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINO ViT-B16 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb16_kaiko\" \\\nDINO_BACKBONE=dino_vitb16 \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=768 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-b8-tcga","title":"kaiko.ai - DINO ViT-B8 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINO ViT-B8 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb8_kaiko\" \\\nDINO_BACKBONE=dino_vitb8 \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=768 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dinov2-vit-l14-tcga","title":"kaiko.ai - DINOv2 ViT-L14 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINOv2 ViT-L14 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl14_kaiko\" \\\nREPO_OR_DIR=facebookresearch/dinov2:main \\\nDINO_BACKBONE=dinov2_vitl14_reg \\\nFORCE_RELOAD=true \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=1024 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#references","title":"References","text":"[1]: Chen: A General-Purpose Self-Supervised Model for Computational Pathology, 2023 (arxiv)
"},{"location":"user-guide/getting-started/how_to_use/","title":"How to use eva","text":"Before starting to use eva, it's important to get familiar with the different workflows, subcommands and configurations.
"},{"location":"user-guide/getting-started/how_to_use/#eva-subcommands","title":"eva subcommands","text":"To run an evaluation, we call:
eva <subcommand> --config <path-to-config-file>\n
The eva interface supports the subcommands: predict
, fit
and predict_fit
.
fit
: is used to train a decoder for a specific task and subsequently evaluate the performance. This can be done online or offline *predict
: is used to compute embeddings for input images with a provided FM-checkpoint. This is the first step of the offline workflowpredict_fit
: runs predict
and fit
sequentially. Like the fit
-online run, it runs a complete evaluation with images as input. We distinguish between the online and offline workflow:
The online workflow can be used to quickly run a complete evaluation without saving and tracking embeddings. The offline workflow runs faster (only one FM-backbone forward pass) and is ideal for experimenting with different decoders on the same FM-backbone.
"},{"location":"user-guide/getting-started/how_to_use/#run-configurations","title":"Run configurations","text":""},{"location":"user-guide/getting-started/how_to_use/#config-files","title":"Config files","text":"The setup for an eva run is provided in a .yaml
config file which is defined with the --config
flag.
A config file specifies the setup for the trainer (including callback for the model backbone), the model (setup of the trainable decoder) and data module.
You can find the config files for the datasets and models that eva supports out of the box on GitHub (scroll to the bottom of the page). We recommend that you inspect some of them to get a better understanding of their structure and content.
"},{"location":"user-guide/getting-started/how_to_use/#environment-variables","title":"Environment variables","text":"To customize runs without the need to create custom config-files, you can overwrite the config-parameters listed below by setting them as environment variables.
Type DescriptionOUTPUT_ROOT
str The directory to store logging outputs and evaluation results EMBEDDINGS_ROOT
str The directory to store the computed embeddings CHECKPOINT_PATH
str Path to the FM-checkpoint to be evaluated IN_FEATURES
int The input feature dimension (embedding) NUM_CLASSES
int Number of classes for classification tasks N_RUNS
int Number of fit
runs to perform in a session, defaults to 5 MAX_STEPS
int Maximum number of training steps (if early stopping is not triggered) BATCH_SIZE
int Batch size for a training step PREDICT_BATCH_SIZE
int Batch size for a predict step LR_VALUE
float Learning rate for training the decoder MONITOR_METRIC
str The metric to monitor for early stopping and final model checkpoint loading MONITOR_METRIC_MODE
str \"min\" or \"max\", depending on the MONITOR_METRIC
used REPO_OR_DIR
str GitHub repo containing the model implementation, e.g. \"facebookresearch/dino:main\" DINO_BACKBONE
str Backbone model architecture if a facebookresearch/dino FM is evaluated FORCE_RELOAD
bool Whether to force a fresh download of the github repo unconditionally PRETRAINED
bool Whether to load FM-backbone weights from a pretrained model"},{"location":"user-guide/getting-started/installation/","title":"Installation","text":"Create and activate a virtual environment with Python 3.10+
Install eva and the eva-vision package with:
pip install \"kaiko-eva[vision]\"\n
"},{"location":"user-guide/getting-started/installation/#run-eva","title":"Run eva","text":"Now you are all set and you can start running eva with:
eva <subcommand> --config <path-to-config-file>\n
To learn how the subcommands and configs work, we recommend you familiarize yourself with How to use eva and then proceed to running eva with the Tutorials."},{"location":"user-guide/tutorials/evaluate_resnet/","title":"Train and evaluate a ResNet","text":"If you read How to use eva and followed the Tutorials to this point, you might ask yourself why you would not always use the offline workflow to run a complete evaluation. An offline-run stores the computed embeddings and runs faster than the online-workflow which computes a backbone-forward pass in every epoch.
One use case for the online-workflow is the evaluation of a supervised ML model that does not rely on a backbone/head architecture. To demonstrate this, let's train a ResNet 18 from PyTorch Image Models (timm).
To do this we need to create a new config-file:
configs/vision/resnet18
configs/vision/dino_vit/online/bach.yaml
and move it to the new folder.Now let's adapt the new bach.yaml
-config to the new model:
backbone
-key from the config. If no backbone is specified, the backbone will be skipped during inference. head:\n class_path: eva.models.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: resnet18\n num_classes: &NUM_CLASSES 4\n drop_rate: 0.0\n pretrained: false\n
To reduce training time, let's overwrite some of the default parameters. Run the training & evaluation with: OUTPUT_ROOT=logs/resnet/bach \\\nMAX_STEPS=50 \\\nLR_VALUE=0.01 \\\neva fit --config configs/vision/resnet18/bach.yaml\n
Once the run is complete, take a look at the results in logs/resnet/bach/<session-id>/results.json
and check out the tensorboard with tensorboard --logdir logs/resnet/bach
. How does the performance compare to the results observed in the previous tutorials?"},{"location":"user-guide/tutorials/offline_vs_online/","title":"Offline vs. online evaluations","text":"In this tutorial we run eva with the three subcommands predict
, fit
and predict_fit
, and take a look at the difference between offline and online workflows.
If you haven't downloaded the config files yet, please download them from GitHub (scroll to the bottom of the page).
For this tutorial we use the BACH classification task which is available on Zenodo and is distributed under Attribution-NonCommercial-ShareAlike 4.0 International license.
To let eva automatically handle the dataset download, you can open configs/vision/dino_vit/offline/bach.yaml
and set download: true
. Before doing so, please make sure that your use case is compliant with the dataset license.
First, let's use the predict
-command to download the data and compute embeddings. In this example we use a randomly initialized dino_vits16
as backbone.
Open a terminal in the folder where you installed eva and run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=./data/embeddings/dino_vits16_random \\\neva predict --config configs/vision/dino_vit/offline/bach.yaml\n
Executing this command will:
./data/bach
(if it has not already been downloaded to this location). This will take a few minutes.EMBEDDINGS_ROOT
along with a manifest.csv
file.Once the session is complete, verify that:
./data/bach/ICIAR2018_BACH_Challenge
./data/embeddings/dino_vits16_random/bach
manifest.csv
file that maps the filename to the embedding, target and split has been created in ./data/embeddings/dino_vits16_random/bach
.Now we can use the fit
-command to evaluate the FM on the precomputed embeddings.
To ensure a quick run for the purpose of this exercise, we overwrite some of the default parameters. Run eva to fit the decoder classifier with:
N_RUNS=2 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\neva fit --config configs/vision/dino_vit/offline/bach.yaml\n
Executing this command will:
Once the session is complete:
logs/dino_vits16/offline/bach/<session-id>/results.json
. (The <session-id>
consists of a timestamp and a hash that is based on the run configuration.)tensorboard --logdir logs/dino_vits16/offline/bach\n
With the predict_fit
-command, the two steps above can be executed with one command. Let's do this, but this time let's use an FM pretrained from ImageNet.
Go back to the terminal and execute:
N_RUNS=1 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\nPRETRAINED=true \\\nEMBEDDINGS_ROOT=./data/embeddings/dino_vits16_pretrained \\\neva predict_fit --config configs/vision/dino_vit/offline/bach.yaml\n
Once the session is complete, inspect the evaluation results as you did in Step 2. Compare the performance metrics and training curves. Can you observe better performance with the ImageNet pretrained encoder?
"},{"location":"user-guide/tutorials/offline_vs_online/#online-evaluations","title":"Online evaluations","text":"Alternatively to the offline workflow from Step 3, a complete evaluation can also be computed online. In this case we don't save and track embeddings and instead fit the ML model (encoder with frozen layers + trainable decoder) directly on the given task.
As in Step 3 above, we again use a dino_vits16
pretrained from ImageNet.
Run a complete online workflow with the following command:
N_RUNS=1 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\nPRETRAINED=true \\\neva fit --config configs/vision/dino_vit/online/bach.yaml\n
Executing this command will:
Once the run is complete:
logs/dino_vits16/offline/bach/<session-id>/results.json
and compare them to the results of Step 3. Do they match?Oncology FM Evaluation Framework by kaiko.ai
With the first release, eva supports performance evaluation for vision Foundation Models (\"FMs\") and supervised machine learning models on WSI-patch-level image classification task. Support for radiology (CT-scans) segmentation tasks will be added soon.
With eva we provide the open-source community with an easy-to-use framework that follows industry best practices to deliver a robust, reproducible and fair evaluation benchmark across FMs of different sizes and architectures.
Support for additional modalities and tasks will be added in future releases.
"},{"location":"#use-cases","title":"Use cases","text":""},{"location":"#1-evaluate-your-own-fms-on-public-benchmark-datasets","title":"1. Evaluate your own FMs on public benchmark datasets","text":"With a specified FM as input, you can run eva on several publicly available datasets & tasks. One evaluation run will download and preprocess the relevant data, compute embeddings, fit and evaluate a downstream head and report the mean and standard deviation of the relevant performance metrics.
Supported datasets & tasks include:
WSI patch-level pathology datasets
Radiology datasets
To evaluate FMs, eva provides support for different model-formats, including models trained with PyTorch, models available on HuggingFace and ONNX-models. For other formats custom wrappers can be implemented.
"},{"location":"#2-evaluate-ml-models-on-your-own-dataset-task","title":"2. Evaluate ML models on your own dataset & task","text":"If you have your own labeled dataset, all that is needed is to implement a dataset class tailored to your source data. Start from one of our out-of-the box provided dataset classes, adapt it to your data and run eva to see how different FMs perform on your task.
"},{"location":"#evaluation-results","title":"Evaluation results","text":"We evaluated the following FMs on the 4 supported WSI-patch-level image classification tasks. On the table below we report Balanced Accuracy for binary & multiclass tasks and show the average performance & standard deviation over 5 runs.
FM-backbone pretraining BACH CRC MHIST PCam/val PCam/test DINO ViT-S16 N/A 0.410 (\u00b10.009) 0.617 (\u00b10.008) 0.501 (\u00b10.004) 0.753 (\u00b10.002) 0.728 (\u00b10.003) DINO ViT-S16 ImageNet 0.695 (\u00b10.004) 0.935 (\u00b10.003) 0.831 (\u00b10.002) 0.864 (\u00b10.007) 0.849 (\u00b10.007) DINO ViT-B8 ImageNet 0.710 (\u00b10.007) 0.939 (\u00b10.001) 0.814 (\u00b10.003) 0.870 (\u00b10.003) 0.856 (\u00b10.004) DINOv2 ViT-L14 ImageNet 0.707 (\u00b10.008) 0.916 (\u00b10.002) 0.832 (\u00b10.003) 0.873 (\u00b10.001) 0.888 (\u00b10.001) Lunit - ViT-S16 TCGA 0.801 (\u00b10.005) 0.934 (\u00b10.001) 0.768 (\u00b10.004) 0.889 (\u00b10.002) 0.895 (\u00b10.006) Owkin - iBOT ViT-B16 TCGA 0.725 (\u00b10.004) 0.935 (\u00b10.001) 0.777 (\u00b10.005) 0.912 (\u00b10.002) 0.915 (\u00b10.003) UNI - DINOv2 ViT-L16 Mass-100k 0.814 (\u00b10.008) 0.950 (\u00b10.001) 0.837 (\u00b10.001) 0.936 (\u00b10.001) 0.938 (\u00b10.001) kaiko.ai - DINO ViT-S16 TCGA 0.797 (\u00b10.003) 0.943 (\u00b10.001) 0.828 (\u00b10.003) 0.903 (\u00b10.001) 0.893 (\u00b10.005) kaiko.ai - DINO ViT-S8 TCGA 0.834 (\u00b10.012) 0.946 (\u00b10.002) 0.832 (\u00b10.006) 0.897 (\u00b10.001) 0.887 (\u00b10.002) kaiko.ai - DINO ViT-B16 TCGA 0.810 (\u00b10.008) 0.960 (\u00b10.001) 0.826 (\u00b10.003) 0.900 (\u00b10.002) 0.898 (\u00b10.003) kaiko.ai - DINO ViT-B8 TCGA 0.865 (\u00b10.019) 0.956 (\u00b10.001) 0.809 (\u00b10.021) 0.913 (\u00b10.001) 0.921 (\u00b10.002) kaiko.ai - DINOv2 ViT-L14 TCGA 0.870 (\u00b10.005) 0.930 (\u00b10.001) 0.809 (\u00b10.001) 0.908 (\u00b10.001) 0.898 (\u00b10.002)
The runs use the default setup described in the section below.
eva trains the decoder on the \"train\" split and uses the \"validation\" split for monitoring, early stopping and checkpoint selection. Evaluation results are reported on the \"validation\" split and, if available, on the \"test\" split.
For more details on the FM-backbones and instructions to replicate the results, check out Replicate evaluations.
"},{"location":"#evaluation-setup","title":"Evaluation setup","text":"Note that the current version of eva implements the task- & model-independent and fixed default set up following the standard evaluation protocol proposed by [1] and described in the table below. We selected this approach to prioritize reliable, robust and fair FM-evaluation while being in line with common literature. Additionally, with future versions we are planning to allow the use of cross-validation and hyper-parameter tuning to find the optimal setup to achieve best possible performance on the implemented downstream tasks.
With a provided FM, eva computes embeddings for all input images (WSI patches) which are then used to train a downstream head consisting of a single linear layer in a supervised setup for each of the benchmark datasets. We use early stopping with a patience of 5% of the maximal number of epochs.
Backbone frozen Hidden layers none Dropout 0.0 Activation function none Number of steps 12,500 Base Batch size 4,096 Batch size dataset specific* Base learning rate 0.01 Learning Rate [Base learning rate] * [Batch size] / [Base batch size] Max epochs [Number of samples] * [Number of steps] / [Batch size] Early stopping 5% * [Max epochs] Optimizer SGD Momentum 0.9 Weight Decay 0.0 Nesterov momentum true LR Schedule Cosine without warmup* For smaller datasets (e.g. BACH with 400 samples) we reduce the batch size to 256 and scale the learning rate accordingly.
eva is distributed under the terms of the Apache-2.0 license.
"},{"location":"#next-steps","title":"Next steps","text":"Check out the User Guide to get started with eva
"},{"location":"CODE_OF_CONDUCT/","title":"Contributor Covenant Code of Conduct","text":""},{"location":"CODE_OF_CONDUCT/#our-pledge","title":"Our Pledge","text":"In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
"},{"location":"CODE_OF_CONDUCT/#our-standards","title":"Our Standards","text":"Examples of behavior that contributes to creating a positive environment include:
Examples of unacceptable behavior by participants include:
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
"},{"location":"CODE_OF_CONDUCT/#scope","title":"Scope","text":"This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
"},{"location":"CODE_OF_CONDUCT/#enforcement","title":"Enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at eva@kaiko.ai. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
"},{"location":"CODE_OF_CONDUCT/#attribution","title":"Attribution","text":"This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
"},{"location":"CONTRIBUTING/","title":"Contributing to eva","text":"eva is open source and community contributions are welcome!
"},{"location":"CONTRIBUTING/#contribution-process","title":"Contribution Process","text":""},{"location":"CONTRIBUTING/#github-issues","title":"GitHub Issues","text":"The eva contribution process generally starts with filing a GitHub issue.
eva defines four categories of issues: feature requests, bug reports, documentation fixes, and installation issues. In general, we recommend waiting for feedback from a eva maintainer or community member before proceeding to implement a feature or patch.
"},{"location":"CONTRIBUTING/#pull-requests","title":"Pull Requests","text":"After you have agreed upon an implementation strategy for your feature or patch with an eva maintainer, the next step is to introduce your changes as a pull request against the eva repository.
Steps to make a pull request:
main
branchmain
branch of https://github.com/kaiko-ai/evaOnce your pull request has been merged, your changes will be automatically included in the next eva release!
"},{"location":"DEVELOPER_GUIDE/","title":"Developer Guide","text":""},{"location":"DEVELOPER_GUIDE/#setting-up-a-dev-environment","title":"Setting up a DEV environment","text":"We use PDM as a package and dependency manager. You can set up a local python environment for development as follows: 1. Install package and dependency manager PDM following the instructions here. 2. Install system dependencies - For MacOS: brew install Cmake
- For Linux (Debian): sudo apt-get install build-essential cmake
3. Run pdm install -G dev
to install the python dependencies. This will create a virtual environment in eva/.venv
.
Add a new dependency to the core
submodule: pdm add <package_name>
Add a new dependency to the vision
submodule: pdm add -G vision -G all <package_name>
For more information about managing dependencies please look here.
"},{"location":"DEVELOPER_GUIDE/#continuous-integration-ci","title":"Continuous Integration (CI)","text":"For testing automation, we use nox
.
Installation: - with brew: brew install nox
- with pip: pip install --user --upgrade nox
(this way, you might need to run nox commands with python -m nox
or specify an alias)
Commands: - nox
to run all the automation tests. - nox -s fmt
to run the code formatting tests. - nox -s lint
to run the code lining tests. - nox -s check
to run the type-annotation tests. - nox -s test
to run the unit tests. - nox -s test -- tests/eva/metrics/test_average_loss.py
to run specific tests
This document contains our style guides used in eva
.
Our priority is consistency, so that developers can quickly ingest and understand the entire codebase without being distracted by style idiosyncrasies.
"},{"location":"STYLE_GUIDE/#general-coding-principles","title":"General coding principles","text":"Q: How to keep code readable and maintainable? - Don't Repeat Yourself (DRY) - Use the lowest possible visibility for a variable or method (i.e. make private if possible) -- see Information Hiding / Encapsulation
Q: How big should a function be? - Single Level of Abstraction Principle (SLAP) - High Cohesion and Low Coupling
TL;DR: functions should usually be quite small, and _do one thing_\n
"},{"location":"STYLE_GUIDE/#python-style-guide","title":"Python Style Guide","text":"In general we follow the following regulations: PEP8, the Google Python Style Guide and we expect type hints/annotations.
"},{"location":"STYLE_GUIDE/#docstrings","title":"Docstrings","text":"Our docstring style is derived from Google Python style.
def example_function(variable: int, optional: str | None = None) -> str:\n \"\"\"An example docstring that explains what this functions do.\n\n Docs sections can be referenced via :ref:`custom text here <anchor-link>`.\n\n Classes can be referenced via :class:`eva.data.datamodules.DataModule`.\n\n Functions can be referenced via :func:`eva.data.datamodules.call.call_method_if_exists`.\n\n Example:\n\n >>> from torch import nn\n >>> import eva\n >>> eva.models.modules.HeadModule(\n >>> head=nn.Linear(10, 2),\n >>> criterion=nn.CrossEntropyLoss(),\n >>> )\n\n Args:\n variable: A required argument.\n optional: An optional argument.\n\n Returns:\n A description of the output string.\n \"\"\"\n pass\n
"},{"location":"STYLE_GUIDE/#module-docstrings","title":"Module docstrings","text":"PEP-8 and PEP-257 indicate docstrings should have very specific syntax:
\"\"\"One line docstring that shouldn't wrap onto next line.\"\"\"\n
\"\"\"First line of multiline docstring that shouldn't wrap.\n\nSubsequent line or paragraphs.\n\"\"\"\n
"},{"location":"STYLE_GUIDE/#constants-docstrings","title":"Constants docstrings","text":"Public constants should usually have docstrings. Optional on private constants. Docstrings on constants go underneath
SOME_CONSTANT = 3\n\"\"\"Either a single-line docstring or multiline as per above.\"\"\"\n
"},{"location":"STYLE_GUIDE/#function-docstrings","title":"Function docstrings","text":"All public functions should have docstrings following the pattern shown below.
Each section can be omitted if there are no inputs, outputs, or no notable exceptions raised, respectively.
def fake_datamodule(\n n_samples: int, random: bool = True\n) -> eva.data.datamodules.DataModule:\n \"\"\"Generates a fake DataModule.\n\n It builds a :class:`eva.data.datamodules.DataModule` by generating\n a fake dataset with generated data while fixing the seed. It can\n be useful for debugging purposes.\n\n Args:\n n_samples: The number of samples of the generated datasets.\n random: Whether to generated randomly.\n\n Returns:\n A :class:`eva.data.datamodules.DataModule` with generated random data.\n\n Raises:\n ValueError: If `n_samples` is `0`.\n \"\"\"\n pass\n
"},{"location":"STYLE_GUIDE/#class-docstrings","title":"Class docstrings","text":"All public classes should have class docstrings following the pattern shown below.
class DataModule(pl.LightningDataModule):\n \"\"\"DataModule encapsulates all the steps needed to process data.\n\n It will initialize and create the mapping between dataloaders and\n datasets. During the `prepare_data`, `setup` and `teardown`, the\n datamodule will call the respectively methods from all the datasets,\n given that they are defined.\n \"\"\"\n\n def __init__(\n self,\n datasets: schemas.DatasetsSchema | None = None,\n dataloaders: schemas.DataloadersSchema | None = None,\n ) -> None:\n \"\"\"Initializes the datamodule.\n\n Args:\n datasets: The desired datasets. Defaults to `None`.\n dataloaders: The desired dataloaders. Defaults to `None`.\n \"\"\"\n pass\n
"},{"location":"datasets/","title":"Datasets","text":"eva provides native support for several public datasets. When possible, the corresponding dataset classes facilitate automatic download to disk, if not possible, this documentation provides download instructions.
"},{"location":"datasets/#vision-datasets-overview","title":"Vision Datasets Overview","text":""},{"location":"datasets/#whole-slide-wsi-and-microscopy-image-datasets","title":"Whole Slide (WSI) and microscopy image datasets","text":"Dataset #Patches Patch Size Magnification (\u03bcm/px) Task Cancer Type BACH 400 2048x1536 20x (0.5) Classification (4 classes) Breast CRC 107,180 224x224 20x (0.5) Classification (9 classes) Colorectal PatchCamelyon 327,680 96x96 10x (1.0) * Classification (2 classes) Breast MHIST 3,152 224x224 5x (2.0) * Classification (2 classes) Colorectal Polyp* Downsampled from 40x (0.25 \u03bcm/px) to increase the field of view.
"},{"location":"datasets/#radiology-datasets","title":"Radiology datasets","text":"Dataset #Images Image Size Task Download provided TotalSegmentator 1228 ~300 x ~300 x ~350 * Multilabel Classification (117 classes) Yes* 3D images of varying sizes
"},{"location":"datasets/bach/","title":"BACH","text":"The BACH dataset consists of microscopy and WSI images, of which we use only the microscopy images. These are 408 labeled images from 4 classes (\"Normal\", \"Benign\", \"Invasive\", \"InSitu\"). This dataset was used for the \"BACH Grand Challenge on Breast Cancer Histology images\".
"},{"location":"datasets/bach/#raw-data","title":"Raw data","text":""},{"location":"datasets/bach/#key-stats","title":"Key stats","text":"Modality Vision (microscopy images) Task Multiclass classification (4 classes) Cancer type Breast Data size total: 10.4GB / data in use: 7.37 GB (18.9 MB per image) Image dimension 1536 x 2048 x 3 Magnification (\u03bcm/px) 20x (0.42) Files format.tif
images Number of images 408 (102 from each class) Splits in use one labeled split"},{"location":"datasets/bach/#organization","title":"Organization","text":"The data ICIAR2018_BACH_Challenge.zip
from zenodo is organized as follows:
ICAR2018_BACH_Challenge\n\u251c\u2500\u2500 Photos # All labeled patches used by eva\n\u2502 \u251c\u2500\u2500 Normal\n\u2502 \u2502 \u251c\u2500\u2500 n032.tif\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 Benign\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 Invasive\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2502 \u251c\u2500\u2500 InSitu\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 WSI # WSIs, not in use\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/bach/#download-and-preprocessing","title":"Download and preprocessing","text":"The BACH
dataset class supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided BACH
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
The splits are created from the indices specified in the BACH dataset class. These indices were picked to prevent data leakage due to images belonging to the same patient. Because the small dataset in combination with the patient ID constraint does not allow to split the data three-ways with sufficient amount of data in each split, we only create a train and val split and leave it to the user to submit predictions on the official test split to the BACH Challenge Leaderboard.
Splits Train Validation #Samples 268 (67%) 132 (33%)"},{"location":"datasets/bach/#relevant-links","title":"Relevant links","text":"Attribution-NonCommercial-ShareAlike 4.0 International
"},{"location":"datasets/crc/","title":"CRC","text":"The CRC-HE dataset consists of labeled patches (9 classes) from colorectal cancer (CRC) and normal tissue. We use the NCT-CRC-HE-100K
dataset for training and validation and the CRC-VAL-HE-7K for testing
.
The NCT-CRC-HE-100K-NONORM
consists of 100,000 images without applied color normalization. The CRC-VAL-HE-7K
consists of 7,180 image patches from 50 patients without overlap with NCT-CRC-HE-100K-NONORM
.
The tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR) and colorectal adenocarcinoma epithelium (TUM)
"},{"location":"datasets/crc/#raw-data","title":"Raw data","text":""},{"location":"datasets/crc/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Multiclass classification (9 classes) Cancer type Colorectal Data size total: 11.7GB (train), 800MB (val) Image dimension 224 x 224 x 3 Magnification (\u03bcm/px) 20x (0.5) Files format.tif
images Number of images 107,180 (100k train, 7.2k val) Splits in use NCT-CRC-HE-100K (train), CRC-VAL-HE-7K (val)"},{"location":"datasets/crc/#splits","title":"Splits","text":"We use the splits according to the data sources:
NCT-CRC-HE-100K
CRC-VAL-HE-7K
A test split is not provided. Because the patient information for the training data is not available, dividing the training data in a train/val split (and using the given val split as test split) is not possible without risking data leakage. eva therefore reports evaluation results for CRC HE on the validation split.
"},{"location":"datasets/crc/#organization","title":"Organization","text":"The data NCT-CRC-HE-100K.zip
, NCT-CRC-HE-100K-NONORM.zip
and CRC-VAL-HE-7K.zip
from zenodo are organized as follows:
NCT-CRC-HE-100K # All images used for training\n\u251c\u2500\u2500 ADI # All labeled patches belonging to the 1st class\n\u2502 \u251c\u2500\u2500 ADI-AAAFLCLY.tif\n\u2502 \u251c\u2500\u2500 ...\n\u251c\u2500\u2500 BACK # All labeled patches belonging to the 2nd class\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n\nNCT-CRC-HE-100K-NONORM # All images used for training\n\u251c\u2500\u2500 ADI # All labeled patches belonging to the 1st class\n\u2502 \u251c\u2500\u2500 ADI-AAAFLCLY.tif\n\u2502 \u251c\u2500\u2500 ...\n\u251c\u2500\u2500 BACK # All labeled patches belonging to the 2nd class\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n\nCRC-VAL-HE-7K # All images used for validation\n\u251c\u2500\u2500 ... # identical structure as for NCT-CRC-HE-100K-NONORM\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/crc/#download-and-preprocessing","title":"Download and preprocessing","text":"The CRC
dataset class supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided CRC
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
CC BY 4.0 LEGAL CODE
"},{"location":"datasets/mhist/","title":"MHIST","text":"MHIST is a binary classification task which comprises of 3,152 hematoxylin and eosin (H&E)-stained Formalin Fixed Paraffin-Embedded (FFPE) fixed-size images (224 by 224 pixels) of colorectal polyps from the Department of Pathology and Laboratory Medicine at Dartmouth-Hitchcock Medical Center (DHMC).
The tissue classes are: Hyperplastic Polyp (HP), Sessile Serrated Adenoma (SSA). This classification task focuses on the clinically-important binary distinction between HPs and SSAs, a challenging problem with considerable inter-pathologist variability. HPs are typically benign, while sessile serrated adenomas are precancerous lesions that can turn into cancer if left untreated and require sooner follow-up examinations. Histologically, HPs have a superficial serrated architecture and elongated crypts, whereas SSAs are characterized by broad-based crypts, often with complex structure and heavy serration.
"},{"location":"datasets/mhist/#raw-data","title":"Raw data","text":""},{"location":"datasets/mhist/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Binary classification (2 classes) Cancer type Colorectal Polyp Data size 354 MB Image dimension 224 x 224 x 3 Magnification (\u03bcm/px) 5x (2.0) * Files format.png
images Number of images 3,152 (2,175 train, 977 test) Splits in use annotations.csv (train / test) * Downsampled from 40x to increase the field of view.
"},{"location":"datasets/mhist/#organization","title":"Organization","text":"The contents from images.zip
and the file annotations.csv
from bmirds are organized as follows:
mhist # Root folder\n\u251c\u2500\u2500 images # All the dataset images\n\u2502 \u251c\u2500\u2500 MHIST_aaa.png\n\u2502 \u251c\u2500\u2500 MHIST_aab.png\n\u2502 \u251c\u2500\u2500 ...\n\u2514\u2500\u2500 annotations.csv # The dataset annotations file\n
"},{"location":"datasets/mhist/#download-and-preprocessing","title":"Download and preprocessing","text":"To download the dataset, please visit the access portal on BMIRDS and follow the instructions. You will then receive an email with all the relative links that you can use to download the data (images.zip
, annotations.csv
, Dataset Research Use Agreement.pdf
and MD5SUMs.txt
).
Please create a root folder, e.g. mhist
, and download all the files there, which unzipping the contents of images.zip
to a directory named images
inside your root folder (i.e. mhist/images
). Afterwards, you can (optionally) delete the images.zip
file.
We work with the splits provided by the data source. Since no \"validation\" split is provided, we use the \"test\" split as validation split.
annotations.csv
:: \"Partition\" == \"train\"annotations.csv
:: \"Partition\" == \"test\"The PatchCamelyon benchmark is an image classification dataset with 327,680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue.
"},{"location":"datasets/patch_camelyon/#raw-data","title":"Raw data","text":""},{"location":"datasets/patch_camelyon/#key-stats","title":"Key stats","text":"Modality Vision (WSI patches) Task Binary classification Cancer type Breast Data size 8 GB Image dimension 96 x 96 x 3 Magnification (\u03bcm/px) 10x (1.0) * Files formath5
Number of images 327,680 (50% of each class) * The slides were acquired and digitized at 2 different medical centers using a 40x objective but under-sampled to 10x to increase the field of view.
"},{"location":"datasets/patch_camelyon/#splits","title":"Splits","text":"The data source provides train/validation/test splits
Splits Train Validation Test #Samples 262,144 (80%) 32,768 (10%) 32,768 (10%)"},{"location":"datasets/patch_camelyon/#organization","title":"Organization","text":"The PatchCamelyon data from zenodo is organized as follows:
\u251c\u2500\u2500 camelyonpatch_level_2_split_train_x.h5.gz # train images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_train_y.h5.gz # train labels\n\u251c\u2500\u2500 camelyonpatch_level_2_split_valid_x.h5.gz # val images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_valid_y.h5.gz # val labels\n\u251c\u2500\u2500 camelyonpatch_level_2_split_test_x.h5.gz # test images\n\u251c\u2500\u2500 camelyonpatch_level_2_split_test_y.h5.gz # test labels\n
"},{"location":"datasets/patch_camelyon/#download-and-preprocessing","title":"Download and preprocessing","text":"The dataset class PatchCamelyon
supports downloading the data during runtime by setting the init argument download=True
.
Note that in the provided PatchCamelyon
-config files the download argument is set to false
. To enable automatic download you will need to open the config and set download: true
.
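As a minimal sketch of the programmatic equivalent (the import path and the root/split arguments are assumptions and may differ between eva versions):
# Hypothetical import path; check where PatchCamelyon lives in your eva version.
from eva.vision.data.datasets import PatchCamelyon

dataset = PatchCamelyon(
    root='./data/patch_camelyon',  # where the *.h5.gz files are (or will be) stored
    split='train',
    download=True,                 # same effect as setting download: true in the config
)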
Labels are provided by source files, splits are given by file names.
"},{"location":"datasets/patch_camelyon/#relevant-links","title":"Relevant links","text":"@misc{b_s_veeling_j_linmans_j_winkens_t_cohen_2018_2546921,\n author = {B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling},\n title = {Rotation Equivariant CNNs for Digital Pathology},\n month = sep,\n year = 2018,\n doi = {10.1007/978-3-030-00934-2_24},\n url = {https://doi.org/10.1007/978-3-030-00934-2_24}\n}\n
"},{"location":"datasets/patch_camelyon/#license","title":"License","text":"Creative Commons Zero v1.0 Universal
"},{"location":"datasets/total_segmentator/","title":"TotalSegmentator","text":"The TotalSegmentator dataset is a radiology image-segmentation dataset with 1228 3D images and corresponding masks with 117 different anatomical structures. It can be used for segmentation and multilabel classification tasks.
"},{"location":"datasets/total_segmentator/#raw-data","title":"Raw data","text":""},{"location":"datasets/total_segmentator/#key-stats","title":"Key stats","text":"Modality Vision (radiology, CT scans) Task Segmentation / multilabel classification (117 classes) Data size total: 23.6GB Image dimension ~300 x ~300 x ~350 (number of slices) x 1 (grey scale) * Files format.nii
(\"NIFTI\") images Number of images 1228 Splits in use one labeled split /* image resolution and number of slices per image vary
"},{"location":"datasets/total_segmentator/#organization","title":"Organization","text":"The data Totalsegmentator_dataset_v201.zip
from zenodo is organized as follows:
Totalsegmentator_dataset_v201\n\u251c\u2500\u2500 s0011 # one image\n\u2502 \u251c\u2500\u2500 ct.nii.gz # CT scan\n\u2502 \u251c\u2500\u2500 segmentations # directory with segmentation masks\n\u2502 \u2502 \u251c\u2500\u2500 adrenal_gland_left.nii.gz # segmentation mask 1st anatomical structure\n\u2502 \u2502 \u251c\u2500\u2500 adrenal_gland_right.nii.gz # segmentation mask 2nd anatomical structure\n\u2502 \u2502 \u2514\u2500\u2500 ...\n\u2514\u2500\u2500 ...\n
"},{"location":"datasets/total_segmentator/#download-and-preprocessing","title":"Download and preprocessing","text":"TotalSegmentator
supports downloading the data at runtime by setting the init argument download: bool = True. The TotalSegmentator class creates a manifest file with one row per slice and the columns path, slice, split, plus 117 additional columns, one for each class. Creative Commons Attribution 4.0 International
"},{"location":"reference/","title":"Reference API","text":"Here is the Reference API, describing the classes, functions, parameters and attributes of the eva package.
To learn how to use eva, however, it's best to get started with the User Guide.
"},{"location":"reference/core/callbacks/","title":"Callbacks","text":""},{"location":"reference/core/callbacks/#writers","title":"Writers","text":""},{"location":"reference/core/callbacks/#eva.core.callbacks.writers.EmbeddingsWriter","title":"eva.core.callbacks.writers.EmbeddingsWriter
","text":" Bases: BasePredictionWriter
Callback for writing generated embeddings to disk.
This callback writes the embedding files in a separate process to avoid blocking the main process where the model forward pass is executed.
Parameters:
Name Type Description Defaultoutput_dir
str
The directory where the embeddings will be saved.
requiredbackbone
Module | None
A model to be used as feature extractor. If None
, it will be expected that the input batch returns the features directly.
None
dataloader_idx_map
Dict[int, str] | None
A dictionary mapping dataloader indices to their respective names (e.g. train, val, test).
None
group_key
str | None
The metadata key to group the embeddings by. If specified, the embedding files will be saved in subdirectories named after the group_key. If specified, the key must be present in the metadata of the input batch.
None
overwrite
bool
Whether to overwrite the output directory. Defaults to True.
True
Source code in src/eva/core/callbacks/writers/embeddings.py
def __init__(\n self,\n output_dir: str,\n backbone: nn.Module | None = None,\n dataloader_idx_map: Dict[int, str] | None = None,\n group_key: str | None = None,\n overwrite: bool = True,\n) -> None:\n \"\"\"Initializes a new EmbeddingsWriter instance.\n\n This callback writes the embedding files in a separate process to avoid blocking the\n main process where the model forward pass is executed.\n\n Args:\n output_dir: The directory where the embeddings will be saved.\n backbone: A model to be used as feature extractor. If `None`,\n it will be expected that the input batch returns the features directly.\n dataloader_idx_map: A dictionary mapping dataloader indices to their respective\n names (e.g. train, val, test).\n group_key: The metadata key to group the embeddings by. If specified, the\n embedding files will be saved in subdirectories named after the group_key.\n If specified, the key must be present in the metadata of the input batch.\n overwrite: Whether to overwrite the output directory. Defaults to True.\n \"\"\"\n super().__init__(write_interval=\"batch\")\n\n self._output_dir = output_dir\n self._backbone = backbone\n self._dataloader_idx_map = dataloader_idx_map or {}\n self._group_key = group_key\n self._overwrite = overwrite\n\n self._write_queue: multiprocessing.Queue\n self._write_process: eva_multiprocessing.Process\n
"},{"location":"reference/core/interface/","title":"Interface API","text":"Reference information for the Interface
API.
eva.Interface
","text":"A high-level interface for training and validating a machine learning model.
This class provides a convenient interface to connect a model, data, and trainer to train and validate a model.
"},{"location":"reference/core/interface/#eva.Interface.fit","title":"fit
","text":"Perform model training and evaluation out-of-place.
This method uses the specified trainer to fit the model using the provided data.
Example use cases:
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def fit(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Perform model training and evaluation out-of-place.\n\n This method uses the specified trainer to fit the model using the provided data.\n\n Example use cases:\n\n - Using a model consisting of a frozen backbone and a head, the backbone will generate\n the embeddings on the fly which are then used as input features to train the head on\n the downstream task specified by the given dataset.\n - Fitting only the head network using a dataset that loads pre-computed embeddings.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n trainer.run_evaluation_session(model=model, datamodule=data)\n
"},{"location":"reference/core/interface/#eva.Interface.predict","title":"predict
","text":"Perform model prediction out-of-place.
This method performs inference with a pre-trained foundation model to compute embeddings.
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def predict(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Perform model prediction out-of-place.\n\n This method performs inference with a pre-trained foundation model to compute embeddings.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n eva_trainer.infer_model(\n base_trainer=trainer,\n base_model=model,\n datamodule=data,\n return_predictions=False,\n )\n
"},{"location":"reference/core/interface/#eva.Interface.predict_fit","title":"predict_fit
","text":"Combines the predict and fit commands in one method.
This method performs the following two steps: 1. predict: perform inference with a pre-trained foundation model to compute embeddings. 2. fit: training the head network using the embeddings generated in step 1.
Parameters:
Name Type Description Defaulttrainer
Trainer
The base trainer to use but not modify.
requiredmodel
ModelModule
The model module to use but not modify.
requireddata
DataModule
The data module.
required Source code insrc/eva/core/interface/interface.py
def predict_fit(\n self,\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n data: datamodules.DataModule,\n) -> None:\n \"\"\"Combines the predict and fit commands in one method.\n\n This method performs the following two steps:\n 1. predict: perform inference with a pre-trained foundation model to compute embeddings.\n 2. fit: training the head network using the embeddings generated in step 1.\n\n Args:\n trainer: The base trainer to use but not modify.\n model: The model module to use but not modify.\n data: The data module.\n \"\"\"\n self.predict(trainer=trainer, model=model, data=data)\n self.fit(trainer=trainer, model=model, data=data)\n
"},{"location":"reference/core/data/dataloaders/","title":"Dataloaders","text":"Reference information for the Dataloader
classes.
eva.data.DataLoader
dataclass
","text":"The DataLoader
combines a dataset and a sampler.
It provides an iterable over the given dataset.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.batch_size","title":"batch_size: int | None = 1
class-attribute
instance-attribute
","text":"How many samples per batch to load.
Set to None
for iterable dataset where dataset produces batches.
shuffle: bool = False
class-attribute
instance-attribute
","text":"Whether to shuffle the data at every epoch.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.sampler","title":"sampler: samplers.Sampler | None = None
class-attribute
instance-attribute
","text":"Defines the strategy to draw samples from the dataset.
Can be any Iterable with __len__
implemented. If specified, shuffle must not be specified.
batch_sampler: samplers.Sampler | None = None
class-attribute
instance-attribute
","text":"Like sampler
, but returns a batch of indices at a time.
Mutually exclusive with batch_size
, shuffle
, sampler
and drop_last
.
num_workers: int = multiprocessing.cpu_count()
class-attribute
instance-attribute
","text":"How many workers to use for loading the data.
By default, it will use the number of CPUs available.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.collate_fn","title":"collate_fn: Callable | None = None
class-attribute
instance-attribute
","text":"The batching process.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.pin_memory","title":"pin_memory: bool = True
class-attribute
instance-attribute
","text":"Will copy Tensors into CUDA pinned memory before returning them.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.drop_last","title":"drop_last: bool = False
class-attribute
instance-attribute
","text":"Drops the last incomplete batch.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.persistent_workers","title":"persistent_workers: bool = True
class-attribute
instance-attribute
","text":"Will keep the worker processes after a dataset has been consumed once.
"},{"location":"reference/core/data/dataloaders/#eva.data.DataLoader.prefetch_factor","title":"prefetch_factor: int | None = 2
class-attribute
instance-attribute
","text":"Number of batches loaded in advance by each worker.
"},{"location":"reference/core/data/datamodules/","title":"Datamodules","text":"Reference information for the Datamodule
classes and functions.
eva.data.DataModule
","text":" Bases: LightningDataModule
DataModule encapsulates all the steps needed to process data.
It will initialize and create the mapping between dataloaders and datasets. During the prepare_data
, setup
and teardown
, the datamodule will call the respective methods from all datasets, given that they are defined.
Parameters:
Name Type Description Defaultdatasets
DatasetsSchema | None
The desired datasets.
None
dataloaders
DataloadersSchema | None
The desired dataloaders.
None
Source code in src/eva/core/data/datamodules/datamodule.py
def __init__(\n self,\n datasets: schemas.DatasetsSchema | None = None,\n dataloaders: schemas.DataloadersSchema | None = None,\n) -> None:\n \"\"\"Initializes the datamodule.\n\n Args:\n datasets: The desired datasets.\n dataloaders: The desired dataloaders.\n \"\"\"\n super().__init__()\n\n self.datasets = datasets or self.default_datasets\n self.dataloaders = dataloaders or self.default_dataloaders\n
"},{"location":"reference/core/data/datamodules/#eva.data.DataModule.default_datasets","title":"default_datasets: schemas.DatasetsSchema
property
","text":"Returns the default datasets.
"},{"location":"reference/core/data/datamodules/#eva.data.DataModule.default_dataloaders","title":"default_dataloaders: schemas.DataloadersSchema
property
","text":"Returns the default dataloader schema.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.call.call_method_if_exists","title":"eva.data.datamodules.call.call_method_if_exists
","text":"Calls a desired method
from the datasets if exists.
Parameters:
Name Type Description Defaultobjects
Iterable[Any]
An iterable of objects.
requiredmethod
str
The dataset method name to call if exists.
required Source code insrc/eva/core/data/datamodules/call.py
def call_method_if_exists(objects: Iterable[Any], /, method: str) -> None:\n \"\"\"Calls a desired `method` from the datasets if exists.\n\n Args:\n objects: An iterable of objects.\n method: The dataset method name to call if exists.\n \"\"\"\n for _object in _recursive_iter(objects):\n if hasattr(_object, method):\n fn = getattr(_object, method)\n fn()\n
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema","title":"eva.data.datamodules.schemas.DatasetsSchema
dataclass
","text":"Datasets schema used in DataModule.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.train","title":"train: TRAIN_DATASET = None
class-attribute
instance-attribute
","text":"Train dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.val","title":"val: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Validation dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.test","title":"test: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Test dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.predict","title":"predict: EVAL_DATASET = None
class-attribute
instance-attribute
","text":"Predict dataset.
"},{"location":"reference/core/data/datamodules/#eva.data.datamodules.schemas.DatasetsSchema.tolist","title":"tolist
","text":"Returns the dataclass as a list and optionally filters it given the stage.
Source code insrc/eva/core/data/datamodules/schemas.py
def tolist(self, stage: str | None = None) -> List[EVAL_DATASET]:\n \"\"\"Returns the dataclass as a list and optionally filters it given the stage.\"\"\"\n match stage:\n case \"fit\":\n return [self.train, self.val]\n case \"validate\":\n return [self.val]\n case \"test\":\n return [self.test]\n case \"predict\":\n return [self.predict]\n case None:\n return [self.train, self.val, self.test, self.predict]\n case _:\n raise ValueError(f\"Invalid stage `{stage}`.\")\n
"},{"location":"reference/core/data/datasets/","title":"Datasets","text":"Reference information for the Dataset
base class.
eva.core.data.Dataset
","text":" Bases: TorchDataset
Base dataset class.
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.prepare_data","title":"prepare_data
","text":"Encapsulates all disk related tasks.
This method is preferred for downloading and preparing the data, for example generate manifest files. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
, which ensures that is called only within a single process, making it multi-processes safe.
src/eva/core/data/datasets/base.py
def prepare_data(self) -> None:\n \"\"\"Encapsulates all disk related tasks.\n\n This method is preferred for downloading and preparing the data, for\n example generate manifest files. If implemented, it will be called via\n :class:`eva.core.data.datamodules.DataModule`, which ensures that is called\n only within a single process, making it multi-processes safe.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.setup","title":"setup
","text":"Setups the dataset.
This method is preferred for creating datasets or performing train/val/test splits. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
at the beginning of fit (train + validate), validate, test, or predict and it will be called from every process (i.e. GPU) across all the nodes in DDP.
src/eva/core/data/datasets/base.py
def setup(self) -> None:\n \"\"\"Setups the dataset.\n\n This method is preferred for creating datasets or performing\n train/val/test splits. If implemented, it will be called via\n :class:`eva.core.data.datamodules.DataModule` at the beginning of fit\n (train + validate), validate, test, or predict and it will be called\n from every process (i.e. GPU) across all the nodes in DDP.\n \"\"\"\n self.configure()\n self.validate()\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.configure","title":"configure
","text":"Configures the dataset.
This method is preferred to configure the dataset; assign values to attributes, perform splits etc. This would be called from the method ::method::setup
, before calling the ::method::validate
.
src/eva/core/data/datasets/base.py
def configure(self):\n \"\"\"Configures the dataset.\n\n This method is preferred to configure the dataset; assign values\n to attributes, perform splits etc. This would be called from the\n method ::method::`setup`, before calling the ::method::`validate`.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.validate","title":"validate
","text":"Validates the dataset.
This method aims to check the integrity of the dataset and verify that it is configured properly. This would be called from the method ::method::setup
, after calling the ::method::configure
.
src/eva/core/data/datasets/base.py
def validate(self):\n \"\"\"Validates the dataset.\n\n This method aims to check the integrity of the dataset and verify\n that is configured properly. This would be called from the method\n ::method::`setup`, after calling the ::method::`configure`.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#eva.core.data.Dataset.teardown","title":"teardown
","text":"Cleans up the data artifacts.
Used to clean-up when the run is finished. If implemented, it will be called via :class:eva.core.data.datamodules.DataModule
at the end of fit (train + validate), validate, test, or predict and it will be called from every process (i.e. GPU) across all the nodes in DDP.
src/eva/core/data/datasets/base.py
def teardown(self) -> None:\n \"\"\"Cleans up the data artifacts.\n\n Used to clean-up when the run is finished. If implemented, it will\n be called via :class:`eva.core.data.datamodules.DataModule` at the end\n of fit (train + validate), validate, test, or predict and it will be\n called from every process (i.e. GPU) across all the nodes in DDP.\n \"\"\"\n
"},{"location":"reference/core/data/datasets/#embeddings-datasets","title":"Embeddings datasets","text":""},{"location":"reference/core/data/datasets/#eva.core.data.datasets.EmbeddingsClassificationDataset","title":"eva.core.data.datasets.EmbeddingsClassificationDataset
","text":" Bases: EmbeddingsDataset
Embeddings dataset class for classification tasks.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
Name Type Description Defaultroot
str
Root directory of the dataset.
requiredmanifest_file
str
The path to the manifest file, which is relative to the root
argument.
split
Literal['train', 'val', 'test'] | None
The dataset split to use. The split
column of the manifest file will be splitted based on this value.
None
column_mapping
Dict[str, str]
Defines the map between the variables and the manifest columns. It will overwrite the default_column_mapping
with the provided values, so that column_mapping
can contain only the values which are altered or missing.
default_column_mapping
embeddings_transforms
Callable | None
A function/transform that transforms the embedding.
None
target_transforms
Callable | None
A function/transform that transforms the target.
None
Source code in src/eva/core/data/datasets/embeddings/classification/embeddings.py
def __init__(\n self,\n root: str,\n manifest_file: str,\n split: Literal[\"train\", \"val\", \"test\"] | None = None,\n column_mapping: Dict[str, str] = base.default_column_mapping,\n embeddings_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize dataset.\n\n Expects a manifest file listing the paths of .pt files that contain\n tensor embeddings of shape [embedding_dim] or [1, embedding_dim].\n\n Args:\n root: Root directory of the dataset.\n manifest_file: The path to the manifest file, which is relative to\n the `root` argument.\n split: The dataset split to use. The `split` column of the manifest\n file will be splitted based on this value.\n column_mapping: Defines the map between the variables and the manifest\n columns. It will overwrite the `default_column_mapping` with\n the provided values, so that `column_mapping` can contain only the\n values which are altered or missing.\n embeddings_transforms: A function/transform that transforms the embedding.\n target_transforms: A function/transform that transforms the target.\n \"\"\"\n super().__init__(\n root=root,\n manifest_file=manifest_file,\n split=split,\n column_mapping=column_mapping,\n embeddings_transforms=embeddings_transforms,\n target_transforms=target_transforms,\n )\n
"},{"location":"reference/core/data/datasets/#eva.core.data.datasets.MultiEmbeddingsClassificationDataset","title":"eva.core.data.datasets.MultiEmbeddingsClassificationDataset
","text":" Bases: EmbeddingsDataset
Dataset class for where a sample corresponds to multiple embeddings.
Example use case: Slide level dataset where each slide has multiple patch embeddings.
Expects a manifest file listing the paths of .pt
files containing tensor embeddings.
The manifest must have a column_mapping[\"multi_id\"]
column that contains the unique identifier group of embeddings. For oncology datasets, this would be usually the slide id. Each row in the manifest file points to a .pt file that can contain one or multiple embeddings. There can also be multiple rows for the same multi_id
, in which case the embeddings from the different .pt files corresponding to that same multi_id
will be stacked along the first dimension.
Parameters:
Name Type Description Defaultroot
str
Root directory of the dataset.
requiredmanifest_file
str
The path to the manifest file, which is relative to the root
argument.
split
Literal['train', 'val', 'test']
The dataset split to use. The split
column of the manifest file will be splitted based on this value.
column_mapping
Dict[str, str]
Defines the map between the variables and the manifest columns. It will overwrite the default_column_mapping
with the provided values, so that column_mapping
can contain only the values which are altered or missing.
default_column_mapping
embeddings_transforms
Callable | None
A function/transform that transforms the embedding.
None
target_transforms
Callable | None
A function/transform that transforms the target.
None
Source code in src/eva/core/data/datasets/embeddings/classification/multi_embeddings.py
def __init__(\n self,\n root: str,\n manifest_file: str,\n split: Literal[\"train\", \"val\", \"test\"],\n column_mapping: Dict[str, str] = base.default_column_mapping,\n embeddings_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n):\n \"\"\"Initialize dataset.\n\n Expects a manifest file listing the paths of `.pt` files containing tensor embeddings.\n\n The manifest must have a `column_mapping[\"multi_id\"]` column that contains the\n unique identifier group of embeddings. For oncology datasets, this would be usually\n the slide id. Each row in the manifest file points to a .pt file that can contain\n one or multiple embeddings. There can also be multiple rows for the same `multi_id`,\n in which case the embeddings from the different .pt files corresponding to that same\n `multi_id` will be stacked along the first dimension.\n\n Args:\n root: Root directory of the dataset.\n manifest_file: The path to the manifest file, which is relative to\n the `root` argument.\n split: The dataset split to use. The `split` column of the manifest\n file will be splitted based on this value.\n column_mapping: Defines the map between the variables and the manifest\n columns. It will overwrite the `default_column_mapping` with\n the provided values, so that `column_mapping` can contain only the\n values which are altered or missing.\n embeddings_transforms: A function/transform that transforms the embedding.\n target_transforms: A function/transform that transforms the target.\n \"\"\"\n super().__init__(\n manifest_file=manifest_file,\n root=root,\n split=split,\n column_mapping=column_mapping,\n embeddings_transforms=embeddings_transforms,\n target_transforms=target_transforms,\n )\n\n self._multi_ids: List[int]\n
"},{"location":"reference/core/data/transforms/","title":"Transforms","text":""},{"location":"reference/core/data/transforms/#eva.data.transforms.ArrayToTensor","title":"eva.data.transforms.ArrayToTensor
","text":"Converts a numpy array to a torch tensor.
"},{"location":"reference/core/data/transforms/#eva.data.transforms.ArrayToFloatTensor","title":"eva.data.transforms.ArrayToFloatTensor
","text":" Bases: ArrayToTensor
Converts a numpy array to a torch tensor and casts it to float.
"},{"location":"reference/core/data/transforms/#eva.data.transforms.Pad2DTensor","title":"eva.data.transforms.Pad2DTensor
","text":"Pads a 2D tensor to a fixed dimension accross the first dimension.
Parameters:
Name Type Description Defaultpad_size
int
The size to pad the tensor to. If the tensor is larger than this size, no padding will be applied.
requiredpad_value
int | float
The value to use for padding.
float('-inf')
Source code in src/eva/core/data/transforms/padding/pad_2d_tensor.py
def __init__(self, pad_size: int, pad_value: int | float = float(\"-inf\")):\n \"\"\"Initialize the transformation.\n\n Args:\n pad_size: The size to pad the tensor to. If the tensor is larger than this size,\n no padding will be applied.\n pad_value: The value to use for padding.\n \"\"\"\n self._pad_size = pad_size\n self._pad_value = pad_value\n
"},{"location":"reference/core/data/transforms/#eva.data.transforms.SampleFromAxis","title":"eva.data.transforms.SampleFromAxis
","text":"Samples n_samples entries from a tensor along a given axis.
Parameters:
Name Type Description Defaultn_samples
int
The number of samples to draw.
requiredseed
int
The seed to use for sampling.
42
axis
int
The axis along which to sample.
0
Source code in src/eva/core/data/transforms/sampling/sample_from_axis.py
def __init__(self, n_samples: int, seed: int = 42, axis: int = 0):\n \"\"\"Initialize the transformation.\n\n Args:\n n_samples: The number of samples to draw.\n seed: The seed to use for sampling.\n axis: The axis along which to sample.\n \"\"\"\n self._seed = seed\n self._n_samples = n_samples\n self._axis = axis\n self._generator = self._get_generator()\n
"},{"location":"reference/core/loggers/loggers/","title":"Loggers","text":""},{"location":"reference/core/loggers/loggers/#eva.core.loggers.DummyLogger","title":"eva.core.loggers.DummyLogger
","text":" Bases: DummyLogger
Dummy logger class.
This logger is currently used as a placeholder when saving results to remote storage, as common lightning loggers do not work with azure blob storage:
https://github.com/Lightning-AI/pytorch-lightning/issues/18861 https://github.com/Lightning-AI/pytorch-lightning/issues/19736
Simply disabling the loggers when pointing to remote storage doesn't work because callbacks such as LearningRateMonitor or ModelCheckpoint require a logger to be present.
Parameters:
Name Type Description Defaultsave_dir
str
The save directory (this logger does not save anything, but callbacks might use this path to save their outputs).
required Source code insrc/eva/core/loggers/dummy.py
def __init__(self, save_dir: str) -> None:\n \"\"\"Initializes the logger.\n\n Args:\n save_dir: The save directory (this logger does not save anything,\n but callbacks might use this path to save their outputs).\n \"\"\"\n super().__init__()\n self._save_dir = save_dir\n
"},{"location":"reference/core/loggers/loggers/#eva.core.loggers.DummyLogger.save_dir","title":"save_dir: str
property
","text":"Returns the save directory.
"},{"location":"reference/core/metrics/","title":"Metrics","text":"Reference information for the Metrics
classes.
eva.metrics.AverageLoss
","text":" Bases: Metric
Average loss metric tracker.
Source code insrc/eva/core/metrics/average_loss.py
def __init__(self) -> None:\n \"\"\"Initializes the metric.\"\"\"\n super().__init__()\n\n self.add_state(\"value\", default=torch.tensor(0), dist_reduce_fx=\"sum\")\n self.add_state(\"total\", default=torch.tensor(0), dist_reduce_fx=\"sum\")\n
"},{"location":"reference/core/metrics/binary_balanced_accuracy/","title":"Binary Balanced Accuracy","text":""},{"location":"reference/core/metrics/binary_balanced_accuracy/#eva.metrics.BinaryBalancedAccuracy","title":"eva.metrics.BinaryBalancedAccuracy
","text":" Bases: BinaryStatScores
Computes the balanced accuracy for binary classification.
"},{"location":"reference/core/metrics/binary_balanced_accuracy/#eva.metrics.BinaryBalancedAccuracy.compute","title":"compute
","text":"Compute accuracy based on inputs passed in to update
previously.
src/eva/core/metrics/binary_balanced_accuracy.py
def compute(self) -> Tensor:\n \"\"\"Compute accuracy based on inputs passed in to ``update`` previously.\"\"\"\n tp, fp, tn, fn = self._final_state()\n sensitivity = _safe_divide(tp, tp + fn)\n specificity = _safe_divide(tn, tn + fp)\n return 0.5 * (sensitivity + specificity)\n
"},{"location":"reference/core/metrics/core/","title":"Core","text":""},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule","title":"eva.metrics.MetricModule
","text":" Bases: Module
The metrics module.
Allows to store and keep track of train
, val
and test
metrics.
Parameters:
Name Type Description Defaulttrain
MetricCollection | None
The training metric collection.
requiredval
MetricCollection | None
The validation metric collection.
requiredtest
MetricCollection | None
The test metric collection.
required Source code insrc/eva/core/metrics/structs/module.py
def __init__(\n self,\n train: collection.MetricCollection | None,\n val: collection.MetricCollection | None,\n test: collection.MetricCollection | None,\n) -> None:\n \"\"\"Initializes the metrics for the Trainer.\n\n Args:\n train: The training metric collection.\n val: The validation metric collection.\n test: The test metric collection.\n \"\"\"\n super().__init__()\n\n self._train = train or self.default_metric_collection\n self._val = val or self.default_metric_collection\n self._test = test or self.default_metric_collection\n
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.default_metric_collection","title":"default_metric_collection: collection.MetricCollection
property
","text":"Returns the default metric collection.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.training_metrics","title":"training_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the train dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.validation_metrics","title":"validation_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the validation dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.test_metrics","title":"test_metrics: collection.MetricCollection
property
","text":"Returns the metrics of the test dataset.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.from_metrics","title":"from_metrics
classmethod
","text":"Initializes a metric module from a list of metrics.
Parameters:
Name Type Description Defaulttrain
MetricModuleType | None
Metrics for the training stage.
requiredval
MetricModuleType | None
Metrics for the validation stage.
requiredtest
MetricModuleType | None
Metrics for the test stage.
requiredseparator
str
The separator between the group name of the metric and the metric itself.
'/'
Source code in src/eva/core/metrics/structs/module.py
@classmethod\ndef from_metrics(\n cls,\n train: MetricModuleType | None,\n val: MetricModuleType | None,\n test: MetricModuleType | None,\n *,\n separator: str = \"/\",\n) -> MetricModule:\n \"\"\"Initializes a metric module from a list of metrics.\n\n Args:\n train: Metrics for the training stage.\n val: Metrics for the validation stage.\n test: Metrics for the test stage.\n separator: The separator between the group name of the metric\n and the metric itself.\n \"\"\"\n return cls(\n train=_create_collection_from_metrics(train, prefix=\"train\" + separator),\n val=_create_collection_from_metrics(val, prefix=\"val\" + separator),\n test=_create_collection_from_metrics(test, prefix=\"test\" + separator),\n )\n
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricModule.from_schema","title":"from_schema
classmethod
","text":"Initializes a metric module from the metrics schema.
Parameters:
Name Type Description Defaultschema
MetricsSchema
The dataclass metric schema.
requiredseparator
str
The separator between the group name of the metric and the metric itself.
'/'
Source code in src/eva/core/metrics/structs/module.py
@classmethod\ndef from_schema(\n cls,\n schema: schemas.MetricsSchema,\n *,\n separator: str = \"/\",\n) -> MetricModule:\n \"\"\"Initializes a metric module from the metrics schema.\n\n Args:\n schema: The dataclass metric schema.\n separator: The separator between the group name of the metric\n and the metric itself.\n \"\"\"\n return cls.from_metrics(\n train=schema.training_metrics,\n val=schema.evaluation_metrics,\n test=schema.evaluation_metrics,\n separator=separator,\n )\n
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema","title":"eva.metrics.MetricsSchema
dataclass
","text":"Metrics schema.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.common","title":"common: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"Holds the common train and evaluation metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.train","title":"train: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"The exclusive training metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.evaluation","title":"evaluation: MetricModuleType | None = None
class-attribute
instance-attribute
","text":"The exclusive evaluation metrics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.training_metrics","title":"training_metrics: MetricModuleType | None
property
","text":"Returns the training metics.
"},{"location":"reference/core/metrics/core/#eva.metrics.MetricsSchema.evaluation_metrics","title":"evaluation_metrics: MetricModuleType | None
property
","text":"Returns the evaluation metics.
"},{"location":"reference/core/metrics/defaults/","title":"Defaults","text":""},{"location":"reference/core/metrics/defaults/#eva.metrics.BinaryClassificationMetrics","title":"eva.metrics.BinaryClassificationMetrics
","text":" Bases: MetricCollection
Default metrics for binary classification tasks.
The metrics instantiated here are:
Parameters:
Name Type Description Defaultthreshold
float
Threshold for transforming probability to binary (0,1) predictions
0.5
ignore_index
int | None
Specifies a target value that is ignored and does not contribute to the metric calculation.
None
prefix
str | None
A string to append in front of the keys of the output dict.
None
postfix
str | None
A string to append after the keys of the output dict.
None
Source code in src/eva/core/metrics/defaults/classification/binary.py
def __init__(\n self,\n threshold: float = 0.5,\n ignore_index: int | None = None,\n prefix: str | None = None,\n postfix: str | None = None,\n) -> None:\n \"\"\"Initializes the binary classification metrics.\n\n The metrics instantiated here are:\n\n - BinaryAUROC\n - BinaryAccuracy\n - BinaryBalancedAccuracy\n - BinaryF1Score\n - BinaryPrecision\n - BinaryRecall\n\n Args:\n threshold: Threshold for transforming probability to binary (0,1) predictions\n ignore_index: Specifies a target value that is ignored and does not\n contribute to the metric calculation.\n prefix: A string to append in front of the keys of the output dict.\n postfix: A string to append after the keys of the output dict.\n \"\"\"\n super().__init__(\n metrics=[\n classification.BinaryAUROC(\n ignore_index=ignore_index,\n ),\n classification.BinaryAccuracy(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n binary_balanced_accuracy.BinaryBalancedAccuracy(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryF1Score(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryPrecision(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n classification.BinaryRecall(\n threshold=threshold,\n ignore_index=ignore_index,\n ),\n ],\n prefix=prefix,\n postfix=postfix,\n compute_groups=[\n [\n \"BinaryAccuracy\",\n \"BinaryBalancedAccuracy\",\n \"BinaryF1Score\",\n \"BinaryPrecision\",\n \"BinaryRecall\",\n ],\n [\n \"BinaryAUROC\",\n ],\n ],\n )\n
"},{"location":"reference/core/metrics/defaults/#eva.metrics.MulticlassClassificationMetrics","title":"eva.metrics.MulticlassClassificationMetrics
","text":" Bases: MetricCollection
Default metrics for multi-class classification tasks.
The metrics instantiated here are:
Parameters:
Name Type Description Defaultnum_classes
int
Integer specifying the number of classes.
requiredaverage
Literal['macro', 'weighted', 'none']
Defines the reduction that is applied over labels.
'macro'
ignore_index
int | None
Specifies a target value that is ignored and does not contribute to the metric calculation.
None
prefix
str | None
A string to append in front of the keys of the output dict.
None
postfix
str | None
A string to append after the keys of the output dict.
None
Source code in src/eva/core/metrics/defaults/classification/multiclass.py
def __init__(\n self,\n num_classes: int,\n average: Literal[\"macro\", \"weighted\", \"none\"] = \"macro\",\n ignore_index: int | None = None,\n prefix: str | None = None,\n postfix: str | None = None,\n) -> None:\n \"\"\"Initializes the multi-class classification metrics.\n\n The metrics instantiated here are:\n\n - MulticlassAccuracy\n - MulticlassPrecision\n - MulticlassRecall\n - MulticlassF1Score\n - MulticlassAUROC\n\n Args:\n num_classes: Integer specifying the number of classes.\n average: Defines the reduction that is applied over labels.\n ignore_index: Specifies a target value that is ignored and does not\n contribute to the metric calculation.\n prefix: A string to append in front of the keys of the output dict.\n postfix: A string to append after the keys of the output dict.\n \"\"\"\n super().__init__(\n metrics=[\n classification.MulticlassAUROC(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassAccuracy(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassF1Score(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassPrecision(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n classification.MulticlassRecall(\n num_classes=num_classes,\n average=average,\n ignore_index=ignore_index,\n ),\n ],\n prefix=prefix,\n postfix=postfix,\n compute_groups=[\n [\n \"MulticlassAccuracy\",\n \"MulticlassF1Score\",\n \"MulticlassPrecision\",\n \"MulticlassRecall\",\n ],\n [\n \"MulticlassAUROC\",\n ],\n ],\n )\n
"},{"location":"reference/core/models/modules/","title":"Modules","text":"Reference information for the model Modules
API.
eva.models.modules.ModelModule
","text":" Bases: LightningModule
The base model module.
Parameters:
Name Type Description Defaultmetrics
MetricsSchema | None
The metric groups to track.
None
postprocess
BatchPostProcess | None
A list of helper functions to apply after the loss and before the metrics calculation to the model predictions and targets.
None
Source code in src/eva/core/models/modules/module.py
def __init__(\n self,\n metrics: metrics_lib.MetricsSchema | None = None,\n postprocess: batch_postprocess.BatchPostProcess | None = None,\n) -> None:\n \"\"\"Initializes the basic module.\n\n Args:\n metrics: The metric groups to track.\n postprocess: A list of helper functions to apply after the\n loss and before the metrics calculation to the model\n predictions and targets.\n \"\"\"\n super().__init__()\n\n self._metrics = metrics or self.default_metrics\n self._postprocess = postprocess or self.default_postprocess\n\n self.metrics = metrics_lib.MetricModule.from_schema(self._metrics)\n
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.default_metrics","title":"default_metrics: metrics_lib.MetricsSchema
property
","text":"The default metrics.
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.default_postprocess","title":"default_postprocess: batch_postprocess.BatchPostProcess
property
","text":"The default post-processes.
"},{"location":"reference/core/models/modules/#eva.models.modules.ModelModule.metrics_device","title":"metrics_device: torch.device
property
","text":"Returns the device by which the metrics should be calculated.
We allocate the metrics to CPU when operating on single device, as it is much faster, but to GPU when employing multiple ones, as DDP strategy requires the metrics to be allocated to the module's GPU.
"},{"location":"reference/core/models/modules/#eva.models.modules.HeadModule","title":"eva.models.modules.HeadModule
","text":" Bases: ModelModule
Neural Net Head Module for training on features.
It can be used for supervised (mini-batch) stochastic gradient descent downstream tasks such as classification, regression and segmentation.
Parameters:
Name Type Description Defaulthead
MODEL_TYPE
The neural network that would be trained on the features.
requiredcriterion
Callable[..., Tensor]
The loss function to use.
requiredbackbone
MODEL_TYPE | None
The feature extractor. If None
, it will be expected that the input batch returns the features directly.
None
optimizer
OptimizerCallable
The optimizer to use.
Adam
lr_scheduler
LRSchedulerCallable
The learning rate scheduler to use.
ConstantLR
metrics
MetricsSchema | None
The metric groups to track.
None
postprocess
BatchPostProcess | None
A list of helper functions to apply after the loss and before the metrics calculation to the model predictions and targets.
None
Source code in src/eva/core/models/modules/head.py
def __init__(\n self,\n head: MODEL_TYPE,\n criterion: Callable[..., torch.Tensor],\n backbone: MODEL_TYPE | None = None,\n optimizer: OptimizerCallable = optim.Adam,\n lr_scheduler: LRSchedulerCallable = lr_scheduler.ConstantLR,\n metrics: metrics_lib.MetricsSchema | None = None,\n postprocess: batch_postprocess.BatchPostProcess | None = None,\n) -> None:\n \"\"\"Initializes the neural net head module.\n\n Args:\n head: The neural network that would be trained on the features.\n criterion: The loss function to use.\n backbone: The feature extractor. If `None`, it will be expected\n that the input batch returns the features directly.\n optimizer: The optimizer to use.\n lr_scheduler: The learning rate scheduler to use.\n metrics: The metric groups to track.\n postprocess: A list of helper functions to apply after the\n loss and before the metrics calculation to the model\n predictions and targets.\n \"\"\"\n super().__init__(metrics=metrics, postprocess=postprocess)\n\n self.head = head\n self.criterion = criterion\n self.backbone = backbone\n self.optimizer = optimizer\n self.lr_scheduler = lr_scheduler\n
"},{"location":"reference/core/models/modules/#eva.models.modules.InferenceModule","title":"eva.models.modules.InferenceModule
","text":" Bases: ModelModule
An lightweight model module to perform inference.
Parameters:
Name Type Description Defaultbackbone
MODEL_TYPE
The network to be used for inference.
required Source code insrc/eva/core/models/modules/inference.py
def __init__(self, backbone: MODEL_TYPE) -> None:\n \"\"\"Initializes the module.\n\n Args:\n backbone: The network to be used for inference.\n \"\"\"\n super().__init__(metrics=None)\n\n self.backbone = backbone\n
"},{"location":"reference/core/models/networks/","title":"Networks","text":"Reference information for the model Networks
API.
eva.models.networks.MLP
","text":" Bases: Module
A Multi-layer Perceptron (MLP) network.
Parameters:
Name Type Description Defaultinput_size
int
The number of input features.
requiredoutput_size
int
The number of output features.
requiredhidden_layer_sizes
tuple[int, ...] | None
A list specifying the number of units in each hidden layer.
None
dropout
float
Dropout probability for hidden layers.
0.0
hidden_activation_fn
Type[Module] | None
Activation function to use for hidden layers. Default is ReLU.
ReLU
output_activation_fn
Type[Module] | None
Activation function to use for the output layer. Default is None.
None
Source code in src/eva/core/models/networks/mlp.py
def __init__(\n self,\n input_size: int,\n output_size: int,\n hidden_layer_sizes: tuple[int, ...] | None = None,\n hidden_activation_fn: Type[torch.nn.Module] | None = nn.ReLU,\n output_activation_fn: Type[torch.nn.Module] | None = None,\n dropout: float = 0.0,\n) -> None:\n \"\"\"Initializes the MLP.\n\n Args:\n input_size: The number of input features.\n output_size: The number of output features.\n hidden_layer_sizes: A list specifying the number of units in each hidden layer.\n dropout: Dropout probability for hidden layers.\n hidden_activation_fn: Activation function to use for hidden layers. Default is ReLU.\n output_activation_fn: Activation function to use for the output layer. Default is None.\n \"\"\"\n super().__init__()\n\n self.input_size = input_size\n self.output_size = output_size\n self.hidden_layer_sizes = hidden_layer_sizes if hidden_layer_sizes is not None else ()\n self.hidden_activation_fn = hidden_activation_fn\n self.output_activation_fn = output_activation_fn\n self.dropout = dropout\n\n self._network = self._build_network()\n
"},{"location":"reference/core/models/networks/#eva.models.networks.MLP.forward","title":"forward
","text":"Defines the forward pass of the MLP.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredReturns:
Type DescriptionTensor
The output of the network.
Source code insrc/eva/core/models/networks/mlp.py
def forward(self, x: torch.Tensor) -> torch.Tensor:\n \"\"\"Defines the forward pass of the MLP.\n\n Args:\n x: The input tensor.\n\n Returns:\n The output of the network.\n \"\"\"\n return self._network(x)\n
"},{"location":"reference/core/models/networks/#wrappers","title":"Wrappers","text":""},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel","title":"eva.models.networks.wrappers.BaseModel
","text":" Bases: Module
Base class for model wrappers.
Parameters:
Name Type Description Defaulttensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/base.py
def __init__(self, tensor_transforms: Callable | None = None) -> None:\n \"\"\"Initializes the model.\n\n Args:\n tensor_transforms: The transforms to apply to the output\n tensor produced by the model.\n \"\"\"\n super().__init__()\n\n self._output_transforms = tensor_transforms\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel.load_model","title":"load_model
abstractmethod
","text":"Loads the model.
Source code insrc/eva/core/models/networks/wrappers/base.py
@abc.abstractmethod\ndef load_model(self) -> Callable[..., torch.Tensor]:\n \"\"\"Loads the model.\"\"\"\n raise NotImplementedError\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.BaseModel.model_forward","title":"model_forward
abstractmethod
","text":"Implements the forward pass of the model.
Parameters:
Name Type Description Defaulttensor
Tensor
The input tensor to the model.
required Source code insrc/eva/core/models/networks/wrappers/base.py
@abc.abstractmethod\ndef model_forward(self, tensor: torch.Tensor) -> torch.Tensor:\n \"\"\"Implements the forward pass of the model.\n\n Args:\n tensor: The input tensor to the model.\n \"\"\"\n raise NotImplementedError\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.ModelFromFunction","title":"eva.models.networks.wrappers.ModelFromFunction
","text":" Bases: BaseModel
Wrapper class for models which are initialized from functions.
This is helpful for initializing models in a .yaml
configuration file.
Parameters:
Name Type Description Defaultpath
Callable[..., Module]
The path to the callable object (class or function).
requiredarguments
Dict[str, Any] | None
The extra callable function / class arguments.
None
checkpoint_path
str | None
The path to the checkpoint to load the model weights from. This is currently only supported for torch model checkpoints. For other formats, the checkpoint loading should be handled within the provided callable object in . None
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/from_function.py
def __init__(\n self,\n path: Callable[..., nn.Module],\n arguments: Dict[str, Any] | None = None,\n checkpoint_path: str | None = None,\n tensor_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes and constructs the model.\n\n Args:\n path: The path to the callable object (class or function).\n arguments: The extra callable function / class arguments.\n checkpoint_path: The path to the checkpoint to load the model\n weights from. This is currently only supported for torch\n model checkpoints. For other formats, the checkpoint loading\n should be handled within the provided callable object in <path>.\n tensor_transforms: The transforms to apply to the output tensor\n produced by the model.\n \"\"\"\n super().__init__()\n\n self._path = path\n self._arguments = arguments\n self._checkpoint_path = checkpoint_path\n self._tensor_transforms = tensor_transforms\n\n self._model = self.load_model()\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.HuggingFaceModel","title":"eva.models.networks.wrappers.HuggingFaceModel
","text":" Bases: BaseModel
Wrapper class for loading HuggingFace transformers
models.
Parameters:
Name Type Description Defaultmodel_name_or_path
str
The model name or path to load the model from. This can be a local path or a model name from the HuggingFace
model hub.
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/huggingface.py
def __init__(self, model_name_or_path: str, tensor_transforms: Callable | None = None) -> None:\n \"\"\"Initializes the model.\n\n Args:\n model_name_or_path: The model name or path to load the model from.\n This can be a local path or a model name from the `HuggingFace`\n model hub.\n tensor_transforms: The transforms to apply to the output tensor\n produced by the model.\n \"\"\"\n super().__init__(tensor_transforms=tensor_transforms)\n\n self._model_name_or_path = model_name_or_path\n self._model = self.load_model()\n
"},{"location":"reference/core/models/networks/#eva.models.networks.wrappers.ONNXModel","title":"eva.models.networks.wrappers.ONNXModel
","text":" Bases: BaseModel
Wrapper class for loading ONNX models.
Parameters:
Name Type Description Defaultpath
str
The path to the .onnx model file.
requireddevice
Literal['cpu', 'cuda'] | None
The device to run the model on. This can be either \"cpu\" or \"cuda\".
'cpu'
tensor_transforms
Callable | None
The transforms to apply to the output tensor produced by the model.
None
Source code in src/eva/core/models/networks/wrappers/onnx.py
def __init__(\n self,\n path: str,\n device: Literal[\"cpu\", \"cuda\"] | None = \"cpu\",\n tensor_transforms: Callable | None = None,\n):\n \"\"\"Initializes the model.\n\n Args:\n path: The path to the .onnx model file.\n device: The device to run the model on. This can be either \"cpu\" or \"cuda\".\n tensor_transforms: The transforms to apply to the output tensor produced by the model.\n \"\"\"\n super().__init__(tensor_transforms=tensor_transforms)\n\n self._path = path\n self._device = device\n self._model = self.load_model()\n
"},{"location":"reference/core/trainers/functional/","title":"Functional","text":"Reference information for the trainers Functional
API.
eva.core.trainers.functional.run_evaluation_session
","text":"Runs a downstream evaluation session out-of-place.
It performs an evaluation run (fit and evaluate) on the model multiple times. Note that as the input base_trainer
and base_model
would be cloned, the input object would not be modified.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer module to use.
requiredbase_model
ModelModule
The base model module to use.
requireddatamodule
DataModule
The data module.
requiredn_runs
int
The amount of runs (fit and evaluate) to perform.
1
verbose
bool
Whether to verbose the session metrics instead of these of each individual runs and vice-versa.
True
Source code in src/eva/core/trainers/functional.py
def run_evaluation_session(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n n_runs: int = 1,\n verbose: bool = True,\n) -> None:\n \"\"\"Runs a downstream evaluation session out-of-place.\n\n It performs an evaluation run (fit and evaluate) on the model\n multiple times. Note that as the input `base_trainer` and\n `base_model` would be cloned, the input object would not\n be modified.\n\n Args:\n base_trainer: The base trainer module to use.\n base_model: The base model module to use.\n datamodule: The data module.\n n_runs: The amount of runs (fit and evaluate) to perform.\n verbose: Whether to verbose the session metrics instead of\n these of each individual runs and vice-versa.\n \"\"\"\n recorder = _recorder.SessionRecorder(output_dir=base_trainer.default_log_dir, verbose=verbose)\n for run_index in range(n_runs):\n validation_scores, test_scores = run_evaluation(\n base_trainer,\n base_model,\n datamodule,\n run_id=f\"run_{run_index}\",\n verbose=not verbose,\n )\n recorder.update(validation_scores, test_scores)\n recorder.save()\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.run_evaluation","title":"eva.core.trainers.functional.run_evaluation
","text":"Fits and evaluates a model out-of-place.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer to use but not modify.
requiredbase_model
ModelModule
The model module to use but not modify.
requireddatamodule
DataModule
The data module.
requiredrun_id
str | None
The run id to be appended to the output log directory. If None
, it will use the log directory of the trainer as is.
None
verbose
bool
Whether to print the validation and test metrics in the end of the training.
True
Returns:
Type DescriptionTuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]
A tuple with the validation and the test metrics (if a test set exists).
Source code insrc/eva/core/trainers/functional.py
def run_evaluation(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n run_id: str | None = None,\n verbose: bool = True,\n) -> Tuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]:\n \"\"\"Fits and evaluates a model out-of-place.\n\n Args:\n base_trainer: The base trainer to use but not modify.\n base_model: The model module to use but not modify.\n datamodule: The data module.\n run_id: The run id to be appended to the output log directory.\n If `None`, it will use the log directory of the trainer as is.\n verbose: Whether to print the validation and test metrics\n in the end of the training.\n\n Returns:\n A tuple of with the validation and the test metrics (if exists).\n \"\"\"\n trainer, model = _utils.clone(base_trainer, base_model)\n trainer.setup_log_dirs(run_id or \"\")\n return fit_and_validate(trainer, model, datamodule, verbose=verbose)\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.fit_and_validate","title":"eva.core.trainers.functional.fit_and_validate
","text":"Fits and evaluates a model in-place.
If the test set is set in the datamodule, it will evaluate the model on the test set as well.
Parameters:
Name Type Description Defaulttrainer
Trainer
The trainer module to use and update in-place.
requiredmodel
ModelModule
The model module to use and update in-place.
requireddatamodule
DataModule
The data module.
requiredverbose
bool
Whether to print the validation and test metrics at the end of training.
True
Returns:
Type DescriptionTuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]
A tuple with the validation and the test metrics (if a test set exists).
Source code insrc/eva/core/trainers/functional.py
def fit_and_validate(\n trainer: eva_trainer.Trainer,\n model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n verbose: bool = True,\n) -> Tuple[_EVALUATE_OUTPUT, _EVALUATE_OUTPUT | None]:\n \"\"\"Fits and evaluates a model in-place.\n\n If the test set is set in the datamodule, it will evaluate the model\n on the test set as well.\n\n Args:\n trainer: The trainer module to use and update in-place.\n model: The model module to use and update in-place.\n datamodule: The data module.\n verbose: Whether to print the validation and test metrics\n in the end of the training.\n\n Returns:\n A tuple of with the validation and the test metrics (if exists).\n \"\"\"\n trainer.fit(model, datamodule=datamodule)\n validation_scores = trainer.validate(datamodule=datamodule, verbose=verbose)\n test_scores = (\n None\n if datamodule.datasets.test is None\n else trainer.test(datamodule=datamodule, verbose=verbose)\n )\n return validation_scores, test_scores\n
"},{"location":"reference/core/trainers/functional/#eva.core.trainers.functional.infer_model","title":"eva.core.trainers.functional.infer_model
","text":"Performs model inference out-of-place.
Note that the input base_model
and base_trainer
would not be modified.
Parameters:
Name Type Description Defaultbase_trainer
Trainer
The base trainer to use but not modify.
requiredbase_model
ModelModule
The model module to use but not modify.
requireddatamodule
DataModule
The data module.
requiredreturn_predictions
bool
Whether to return the model predictions.
False
Source code in src/eva/core/trainers/functional.py
def infer_model(\n base_trainer: eva_trainer.Trainer,\n base_model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n *,\n return_predictions: bool = False,\n) -> None:\n \"\"\"Performs model inference out-of-place.\n\n Note that the input `base_model` and `base_trainer` would\n not be modified.\n\n Args:\n base_trainer: The base trainer to use but not modify.\n base_model: The model module to use but not modify.\n datamodule: The data module.\n return_predictions: Whether to return the model predictions.\n \"\"\"\n trainer, model = _utils.clone(base_trainer, base_model)\n return trainer.predict(\n model=model,\n datamodule=datamodule,\n return_predictions=return_predictions,\n )\n
"},{"location":"reference/core/trainers/trainer/","title":"Trainers","text":"Reference information for the Trainers
API.
eva.core.trainers.Trainer
","text":" Bases: Trainer
Core trainer class.
This is an extended version of Lightning's core trainer class.
For the input arguments, refer to ::class::lightning.pytorch.Trainer
.
Parameters:
Name Type Description Defaultargs
Any
Positional arguments of ::class::lightning.pytorch.Trainer
.
()
default_root_dir
str
The default root directory to store the output logs. Unlike in ::class::lightning.pytorch.Trainer
, this path takes precedence as the output destination.
'logs'
n_runs
int
The number of runs (fit and evaluate) to perform in an evaluation session.
1
kwargs
Any
Keyword arguments of ::class::lightning.pytorch.Trainer
.
{}
Source code in src/eva/core/trainers/trainer.py
@argparse._defaults_from_env_vars\ndef __init__(\n self,\n *args: Any,\n default_root_dir: str = \"logs\",\n n_runs: int = 1,\n **kwargs: Any,\n) -> None:\n \"\"\"Initializes the trainer.\n\n For the input arguments, refer to ::class::`lightning.pytorch.Trainer`.\n\n Args:\n args: Positional arguments of ::class::`lightning.pytorch.Trainer`.\n default_root_dir: The default root directory to store the output logs.\n Unlike in ::class::`lightning.pytorch.Trainer`, this path would be the\n prioritized destination point.\n n_runs: The amount of runs (fit and evaluate) to perform in an evaluation session.\n kwargs: Kew-word arguments of ::class::`lightning.pytorch.Trainer`.\n \"\"\"\n super().__init__(*args, default_root_dir=default_root_dir, **kwargs)\n\n self._n_runs = n_runs\n\n self._session_id: str = _logging.generate_session_id()\n self._log_dir: str = self.default_log_dir\n\n self.setup_log_dirs()\n
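For illustration, a sketch of instantiating the trainer and launching an evaluation session; model and datamodule stand in for configured ModelModule and DataModule instances, and max_steps is simply forwarded to the underlying lightning.pytorch.Trainer:
from eva.core import trainers\n\ntrainer = trainers.Trainer(default_root_dir='logs', n_runs=5, max_steps=100)\n# model and datamodule are assumed to be configured ModelModule / DataModule instances.\ntrainer.run_evaluation_session(model=model, datamodule=datamodule)\n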
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.default_log_dir","title":"default_log_dir: str
property
","text":"Returns the default log directory.
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.setup_log_dirs","title":"setup_log_dirs
","text":"Setups the logging directory of the trainer and experimental loggers in-place.
Parameters:
Name Type Description Defaultsubdirectory
str
The subdirectory to append to the output log directory.
''
Source code in src/eva/core/trainers/trainer.py
def setup_log_dirs(self, subdirectory: str = \"\") -> None:\n \"\"\"Setups the logging directory of the trainer and experimental loggers in-place.\n\n Args:\n subdirectory: Whether to append a subdirectory to the output log.\n \"\"\"\n self._log_dir = os.path.join(self.default_root_dir, self._session_id, subdirectory)\n\n enabled_loggers = []\n if isinstance(self.loggers, list) and len(self.loggers) > 0:\n for logger in self.loggers:\n if isinstance(logger, (pl_loggers.CSVLogger, pl_loggers.TensorBoardLogger)):\n if not cloud_io._is_local_file_protocol(self.default_root_dir):\n loguru.logger.warning(\n f\"Skipped {type(logger).__name__} as remote storage is not supported.\"\n )\n continue\n else:\n logger._root_dir = self.default_root_dir\n logger._name = self._session_id\n logger._version = subdirectory\n enabled_loggers.append(logger)\n\n self._loggers = enabled_loggers or [eva_loggers.DummyLogger(self._log_dir)]\n
"},{"location":"reference/core/trainers/trainer/#eva.core.trainers.Trainer.run_evaluation_session","title":"run_evaluation_session
","text":"Runs an evaluation session out-of-place.
It performs an evaluation run (fit and evaluate) on the model self._n_runs
times. Note that the input base_model
is not modified, so the weights of the input model remain as they are.
Parameters:
Name Type Description Defaultmodel
ModelModule
The base model module to evaluate.
requireddatamodule
DataModule
The data module.
required Source code insrc/eva/core/trainers/trainer.py
def run_evaluation_session(\n self,\n model: modules.ModelModule,\n datamodule: datamodules.DataModule,\n) -> None:\n \"\"\"Runs an evaluation session out-of-place.\n\n It performs an evaluation run (fit and evaluate) the model\n `self._n_run` times. Note that the input `base_model` would\n not be modified, so the weights of the input model will remain\n as they are.\n\n Args:\n model: The base model module to evaluate.\n datamodule: The data module.\n \"\"\"\n functional.run_evaluation_session(\n base_trainer=self,\n base_model=model,\n datamodule=datamodule,\n n_runs=self._n_runs,\n verbose=self._n_runs > 1,\n )\n
"},{"location":"reference/core/utils/multiprocessing/","title":"Multiprocessing","text":"Reference information for the utils Multiprocessing
API.
eva.core.utils.multiprocessing.Process
","text":" Bases: Process
Multiprocessing wrapper with logic to propagate exceptions to the parent process.
Source: https://stackoverflow.com/a/33599967/4992248
Source code insrc/eva/core/utils/multiprocessing.py
def __init__(self, *args: Any, **kwargs: Any) -> None:\n \"\"\"Initialize the process.\"\"\"\n multiprocessing.Process.__init__(self, *args, **kwargs)\n\n self._parent_conn, self._child_conn = multiprocessing.Pipe()\n self._exception = None\n
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.exception","title":"exception
property
","text":"Property that contains exception information from the process.
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.run","title":"run
","text":"Run the process.
Source code insrc/eva/core/utils/multiprocessing.py
def run(self) -> None:\n \"\"\"Run the process.\"\"\"\n try:\n multiprocessing.Process.run(self)\n self._child_conn.send(None)\n except Exception as e:\n tb = traceback.format_exc()\n self._child_conn.send((e, tb))\n
"},{"location":"reference/core/utils/multiprocessing/#eva.core.utils.multiprocessing.Process.check_exceptions","title":"check_exceptions
","text":"Check for exception propagate it to the parent process.
Source code insrc/eva/core/utils/multiprocessing.py
def check_exceptions(self) -> None:\n \"\"\"Check for exception propagate it to the parent process.\"\"\"\n if not self.is_alive():\n if self.exception:\n error, traceback = self.exception\n sys.stderr.write(traceback + \"\\n\")\n raise error\n
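A brief usage sketch; the target function is hypothetical and only demonstrates how a child exception is re-raised in the parent:
from eva.core.utils.multiprocessing import Process\n\ndef _task() -> None:\n    raise RuntimeError('boom')  # any exception raised here is sent back through the pipe\n\nif __name__ == '__main__':\n    process = Process(target=_task)\n    process.start()\n    process.join()\n    process.check_exceptions()  # re-raises the child's RuntimeError in the parent process\n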
"},{"location":"reference/core/utils/workers/","title":"Workers","text":"Reference information for the utils Workers
API.
eva.core.utils.workers.main_worker_only
","text":"Function decorator which will execute it only on main / worker process.
Source code insrc/eva/core/utils/workers.py
def main_worker_only(func: Callable) -> Any:\n \"\"\"Function decorator which will execute it only on main / worker process.\"\"\"\n\n def wrapper(*args: Any, **kwargs: Any) -> Any:\n \"\"\"Wrapper function for the decorated method.\"\"\"\n if is_main_worker():\n return func(*args, **kwargs)\n\n return wrapper\n
"},{"location":"reference/core/utils/workers/#eva.core.utils.workers.is_main_worker","title":"eva.core.utils.workers.is_main_worker
","text":"Returns whether the main process / worker is currently used.
Source code insrc/eva/core/utils/workers.py
def is_main_worker() -> bool:\n \"\"\"Returns whether the main process / worker is currently used.\"\"\"\n process = multiprocessing.current_process()\n return process.name == \"MainProcess\"\n
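For example, the decorator can be used to restrict logging to the main process (a minimal sketch):
from eva.core.utils import workers\n\n@workers.main_worker_only\ndef log_status(message: str) -> None:\n    print(message)  # executed only when called from the main process\n\nlog_status('running on the main worker')\n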
"},{"location":"reference/vision/","title":"Vision","text":"Reference information for the Vision
API.
If you have not already installed the Vision
-package, install it with:
pip install 'kaiko-eva[vision]'\n
"},{"location":"reference/vision/utils/","title":"Utils","text":""},{"location":"reference/vision/utils/#eva.vision.utils.io.image","title":"eva.vision.utils.io.image
","text":"Image I/O related functions.
"},{"location":"reference/vision/utils/#eva.vision.utils.io.image.read_image","title":"read_image
","text":"Reads and loads the image from a file path as a RGB.
Parameters:
Name Type Description Defaultpath
str
The path of the image file.
requiredReturns:
Type DescriptionNDArray[uint8]
The RGB image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image(path: str) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads the image from a file path as a RGB.\n\n Args:\n path: The path of the image file.\n\n Returns:\n The RGB image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n return read_image_as_array(path, cv2.IMREAD_COLOR)\n
"},{"location":"reference/vision/utils/#eva.vision.utils.io.image.read_image_as_array","title":"read_image_as_array
","text":"Reads and loads an image file as a numpy array.
Parameters:
Name Type Description Defaultpath
str
The path to the image file.
requiredflags
int
Specifies the way in which the image should be read.
IMREAD_UNCHANGED
Returns:
Type DescriptionNDArray[uint8]
The image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image_as_array(path: str, flags: int = cv2.IMREAD_UNCHANGED) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads an image file as a numpy array.\n\n Args:\n path: The path to the image file.\n flags: Specifies the way in which the image should be read.\n\n Returns:\n The image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n _utils.check_file(path)\n image = cv2.imread(path, flags=flags)\n if image is None:\n raise IOError(\n f\"Input '{path}' could not be loaded. \"\n \"Please verify that the path is a valid image file.\"\n )\n\n if image.ndim == 3:\n image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n if image.ndim == 2 and flags == cv2.IMREAD_COLOR:\n image = image[:, :, np.newaxis]\n\n return np.asarray(image).astype(np.uint8)\n
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti","title":"eva.vision.utils.io.nifti
","text":"NIfTI I/O related functions.
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti.read_nifti_slice","title":"read_nifti_slice
","text":"Reads and loads a NIfTI image from a file path as uint8
.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredslice_index
int
The image slice index to return.
requireduse_storage_dtype
bool
Whether to cast the raw image array to the inferred type.
True
Returns:
Type DescriptionNDArray[Any]
The image as a numpy array (height, width, channels).
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def read_nifti_slice(\n path: str, slice_index: int, *, use_storage_dtype: bool = True\n) -> npt.NDArray[Any]:\n \"\"\"Reads and loads a NIfTI image from a file path as `uint8`.\n\n Args:\n path: The path to the NIfTI file.\n slice_index: The image slice index to return.\n use_storage_dtype: Whether to cast the raw image\n array to the inferred type.\n\n Returns:\n The image as a numpy array (height, width, channels).\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image_data = nib.load(path) # type: ignore\n image_slice = image_data.slicer[:, :, slice_index : slice_index + 1] # type: ignore\n image_array = image_slice.get_fdata()\n if use_storage_dtype:\n image_array = image_array.astype(image_data.get_data_dtype()) # type: ignore\n return image_array\n
"},{"location":"reference/vision/utils/#eva.vision.utils.io.nifti.fetch_total_nifti_slices","title":"fetch_total_nifti_slices
","text":"Fetches the total slides of a NIfTI image file.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredReturns:
Type Descriptionint
The total number of available slices.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def fetch_total_nifti_slices(path: str) -> int:\n \"\"\"Fetches the total slides of a NIfTI image file.\n\n Args:\n path: The path to the NIfTI file.\n\n Returns:\n The number of the total available slides.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image = nib.load(path) # type: ignore\n image_shape = image.header.get_data_shape() # type: ignore\n return image_shape[-1]\n
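The two functions are typically combined to iterate over the slices of a scan, as in the sketch below (the file path is a placeholder):
from eva.vision.utils.io import nifti\n\nscan_path = 'path/to/scan.nii.gz'  # hypothetical NIfTI file\nfor index in range(nifti.fetch_total_nifti_slices(scan_path)):\n    image_array = nifti.read_nifti_slice(scan_path, index)  # (height, width, channels)\n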
"},{"location":"reference/vision/data/","title":"Vision Data","text":"Reference information for the Vision Data
API.
eva.vision.data.datasets.VisionDataset
","text":" Bases: Dataset
, ABC
, Generic[DataSample]
Base dataset class for vision tasks.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.VisionDataset.filename","title":"filename
abstractmethod
","text":"Returns the filename of the index
'th data sample.
Note that this is the relative file path to the root.
Parameters:
Name Type Description Defaultindex
int
The index of the data-sample to select.
requiredReturns:
Type Descriptionstr
The filename of the index
'th data sample.
src/eva/vision/data/datasets/vision.py
@abc.abstractmethod\ndef filename(self, index: int) -> str:\n \"\"\"Returns the filename of the `index`'th data sample.\n\n Note that this is the relative file path to the root.\n\n Args:\n index: The index of the data-sample to select.\n\n Returns:\n The filename of the `index`'th data sample.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#classification-datasets","title":"Classification datasets","text":""},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.BACH","title":"eva.vision.data.datasets.BACH
","text":" Bases: ImageClassification
Dataset class for BACH images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
Name Type Description Defaultroot
str
Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.
requiredsplit
Literal['train', 'val'] | None
Dataset split to use. If None
, the entire dataset is used.
None
download
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method and if the data does not yet exist on disk.
False
image_transforms
Callable | None
A function/transform that takes in an image and returns a transformed version.
None
target_transforms
Callable | None
A function/transform that takes in the target and transforms it.
None
Source code in src/eva/vision/data/datasets/classification/bach.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\"] | None = None,\n download: bool = False,\n image_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize the dataset.\n\n The dataset is split into train and validation by taking into account\n the patient IDs to avoid any data leakage.\n\n Args:\n root: Path to the root directory of the dataset. The dataset will\n be downloaded and extracted here, if it does not already exist.\n split: Dataset split to use. If `None`, the entire dataset is used.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method and if the data does\n not yet exist on disk.\n image_transforms: A function/transform that takes in an image\n and returns a transformed version.\n target_transforms: A function/transform that takes in the target\n and transforms it.\n \"\"\"\n super().__init__(\n image_transforms=image_transforms,\n target_transforms=target_transforms,\n )\n\n self._root = root\n self._split = split\n self._download = download\n\n self._samples: List[Tuple[str, int]] = []\n self._indices: List[int] = []\n
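A minimal usage sketch; it assumes downloading is compatible with the dataset license and that samples follow the usual (image, target) layout of the classification datasets:
from eva.vision.data import datasets\n\ndataset = datasets.BACH(root='./data/bach', split='train', download=True)\ndataset.prepare_data()  # downloads and extracts the data if it is not on disk yet\nimage, target = dataset[0]  # assumes the standard (image, target) sample layout\n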
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.PatchCamelyon","title":"eva.vision.data.datasets.PatchCamelyon
","text":" Bases: ImageClassification
Dataset class for PatchCamelyon images and corresponding targets.
Parameters:
Name Type Description Defaultroot
str
The path to the dataset root. This path should contain the uncompressed h5 files and the metadata.
requiredsplit
Literal['train', 'val', 'test']
The dataset split for training, validation, or testing.
requireddownload
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method.
False
image_transforms
Callable | None
A function/transform that takes in an image and returns a transformed version.
None
target_transforms
Callable | None
A function/transform that takes in the target and transforms it.
None
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\", \"test\"],\n download: bool = False,\n image_transforms: Callable | None = None,\n target_transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes the dataset.\n\n Args:\n root: The path to the dataset root. This path should contain\n the uncompressed h5 files and the metadata.\n split: The dataset split for training, validation, or testing.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method.\n image_transforms: A function/transform that takes in an image\n and returns a transformed version.\n target_transforms: A function/transform that takes in the target\n and transforms it.\n \"\"\"\n super().__init__(\n image_transforms=image_transforms,\n target_transforms=target_transforms,\n )\n\n self._root = root\n self._split = split\n self._download = download\n
"},{"location":"reference/vision/data/datasets/#segmentation-datasets","title":"Segmentation datasets","text":""},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation","title":"eva.vision.data.datasets.ImageSegmentation
","text":" Bases: VisionDataset[Tuple[Image, Mask]]
, ABC
Image segmentation abstract dataset.
Parameters:
Name Type Description Defaulttransforms
Callable | None
A function/transforms that takes in an image and a label and returns the transformed versions of both.
None
Source code in src/eva/vision/data/datasets/segmentation/base.py
def __init__(\n self,\n transforms: Callable | None = None,\n) -> None:\n \"\"\"Initializes the image segmentation base class.\n\n Args:\n transforms: A function/transforms that takes in an\n image and a label and returns the transformed versions of both.\n \"\"\"\n super().__init__()\n\n self._transforms = transforms\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.classes","title":"classes: List[str] | None
property
","text":"Returns the list with names of the dataset names.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.class_to_idx","title":"class_to_idx: Dict[str, int] | None
property
","text":"Returns a mapping of the class name to its target index.
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_metadata","title":"load_metadata
","text":"Returns the dataset metadata.
Parameters:
Name Type Description Defaultindex
int | None
The index of the data sample to return the metadata of. If None
, it will return the metadata of the current dataset.
Returns:
Type DescriptionDict[str, Any] | List[Dict[str, Any]] | None
The sample metadata.
Source code insrc/eva/vision/data/datasets/segmentation/base.py
def load_metadata(self, index: int | None) -> Dict[str, Any] | List[Dict[str, Any]] | None:\n \"\"\"Returns the dataset metadata.\n\n Args:\n index: The index of the data sample to return the metadata of.\n If `None`, it will return the metadata of the current dataset.\n\n Returns:\n The sample metadata.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_image","title":"load_image
abstractmethod
","text":"Loads and returns the index
'th image sample.
Parameters:
Name Type Description Defaultindex
int
The index of the data sample to load.
requiredReturns:
Type DescriptionImage
An image torchvision tensor (channels, height, width).
Source code insrc/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod\ndef load_image(self, index: int) -> tv_tensors.Image:\n \"\"\"Loads and returns the `index`'th image sample.\n\n Args:\n index: The index of the data sample to load.\n\n Returns:\n An image torchvision tensor (channels, height, width).\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.ImageSegmentation.load_mask","title":"load_mask
abstractmethod
","text":"Returns the index
'th target masks sample.
Parameters:
Name Type Description Defaultindex
int
The index of the data sample target masks to load.
requiredReturns:
Type DescriptionMask
The semantic mask as a (H x W) shaped tensor with integer
Mask
values which represent the pixel class id.
Source code insrc/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod\ndef load_mask(self, index: int) -> tv_tensors.Mask:\n \"\"\"Returns the `index`'th target masks sample.\n\n Args:\n index: The index of the data sample target masks to load.\n\n Returns:\n The semantic mask as a (H x W) shaped tensor with integer\n values which represent the pixel class id.\n \"\"\"\n
"},{"location":"reference/vision/data/datasets/#eva.vision.data.datasets.TotalSegmentator2D","title":"eva.vision.data.datasets.TotalSegmentator2D
","text":" Bases: ImageSegmentation
TotalSegmentator 2D segmentation dataset.
Parameters:
Name Type Description Defaultroot
str
Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.
requiredsplit
Literal['train', 'val'] | None
Dataset split to use. If None
, the entire dataset is used.
version
Literal['small', 'full'] | None
The version of the dataset to initialize. If None
, it will use the files located at root as-is and won't perform any checks.
'small'
download
bool
Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth:prepare_data
method and if the data does not exist yet on disk.
False
as_uint8
bool
Whether to convert and return the images as 8-bit.
True
transforms
Callable | None
A function/transforms that takes in an image and a target mask and returns the transformed versions of both.
None
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator.py
def __init__(\n self,\n root: str,\n split: Literal[\"train\", \"val\"] | None,\n version: Literal[\"small\", \"full\"] | None = \"small\",\n download: bool = False,\n as_uint8: bool = True,\n transforms: Callable | None = None,\n) -> None:\n \"\"\"Initialize dataset.\n\n Args:\n root: Path to the root directory of the dataset. The dataset will\n be downloaded and extracted here, if it does not already exist.\n split: Dataset split to use. If `None`, the entire dataset is used.\n version: The version of the dataset to initialize. If `None`, it will\n use the files located at root as is and wont perform any checks.\n download: Whether to download the data for the specified split.\n Note that the download will be executed only by additionally\n calling the :meth:`prepare_data` method and if the data does not\n exist yet on disk.\n as_uint8: Whether to convert and return the images as a 8-bit.\n transforms: A function/transforms that takes in an image and a target\n mask and returns the transformed versions of both.\n \"\"\"\n super().__init__(transforms=transforms)\n\n self._root = root\n self._split = split\n self._version = version\n self._download = download\n self._as_uint8 = as_uint8\n\n self._samples_dirs: List[str] = []\n self._indices: List[Tuple[int, int]] = []\n
"},{"location":"reference/vision/data/transforms/","title":"Transforms","text":""},{"location":"reference/vision/data/transforms/#eva.core.data.transforms.dtype.ArrayToTensor","title":"eva.core.data.transforms.dtype.ArrayToTensor
","text":"Converts a numpy array to a torch tensor.
"},{"location":"reference/vision/data/transforms/#eva.core.data.transforms.dtype.ArrayToFloatTensor","title":"eva.core.data.transforms.dtype.ArrayToFloatTensor
","text":" Bases: ArrayToTensor
Converts a numpy array to a torch tensor and casts it to float.
"},{"location":"reference/vision/data/transforms/#eva.vision.data.transforms.ResizeAndCrop","title":"eva.vision.data.transforms.ResizeAndCrop
","text":" Bases: Compose
Resizes, crops and normalizes an input image while preserving its aspect ratio.
Parameters:
Name Type Description Defaultsize
int | Sequence[int]
Desired output size of the crop. If size is an int
instead of sequence like (h, w), a square crop (size, size) is made.
224
mean
Sequence[float]
Sequence of means for each image channel.
(0.5, 0.5, 0.5)
std
Sequence[float]
Sequence of standard deviations for each image channel.
(0.5, 0.5, 0.5)
Source code in src/eva/vision/data/transforms/common/resize_and_crop.py
def __init__(\n self,\n size: int | Sequence[int] = 224,\n mean: Sequence[float] = (0.5, 0.5, 0.5),\n std: Sequence[float] = (0.5, 0.5, 0.5),\n) -> None:\n \"\"\"Initializes the transform object.\n\n Args:\n size: Desired output size of the crop. If size is an `int` instead\n of sequence like (h, w), a square crop (size, size) is made.\n mean: Sequence of means for each image channel.\n std: Sequence of standard deviations for each image channel.\n \"\"\"\n self._size = size\n self._mean = mean\n self._std = std\n\n super().__init__(transforms=self._build_transforms())\n
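For illustration, a sketch applying the transform to an input image; it assumes the composed transforms accept a PIL image, and the file path is a placeholder:
from PIL import Image\nfrom eva.vision.data import transforms\n\npreprocess = transforms.ResizeAndCrop(size=224, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))\nimage = Image.open('path/to/patch.png').convert('RGB')  # hypothetical input image\ntensor = preprocess(image)  # resized, cropped and normalized output\n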
"},{"location":"reference/vision/models/networks/","title":"Networks","text":""},{"location":"reference/vision/models/networks/#eva.vision.models.networks.ABMIL","title":"eva.vision.models.networks.ABMIL
","text":" Bases: Module
ABMIL network for multiple instance learning classification tasks.
Takes an array of patch level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size
, n_instances
, input_size
). For slides with less than n_instances
patches, you can apply padding and provide a mask tensor to the forward pass.
The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py
Notes[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, \"Attention-based Deep Multiple Instance Learning\", 2018 https://arxiv.org/abs/1802.04712
Parameters:
Name Type Description Defaultinput_size
int
input embedding dimension
requiredoutput_size
int
number of classes
requiredprojected_input_size
int | None
size of the projected input. if None
, no projection is performed.
hidden_size_attention
int
hidden dimension in attention network
128
hidden_sizes_mlp
tuple
dimensions for hidden layers in last mlp
(128, 64)
use_bias
bool
whether to use bias in the attention network
True
dropout_input_embeddings
float
dropout rate for the input embeddings
0.0
dropout_attention
float
dropout rate for the attention network and classifier
0.0
dropout_mlp
float
dropout rate for the final MLP network
0.0
pad_value
int | float | None
Value indicating padding in the input tensor. If specified, entries with this value in the input will be masked. If set to None
, no masking is applied.
float('-inf')
Source code in src/eva/vision/models/networks/abmil.py
def __init__(\n self,\n input_size: int,\n output_size: int,\n projected_input_size: int | None,\n hidden_size_attention: int = 128,\n hidden_sizes_mlp: tuple = (128, 64),\n use_bias: bool = True,\n dropout_input_embeddings: float = 0.0,\n dropout_attention: float = 0.0,\n dropout_mlp: float = 0.0,\n pad_value: int | float | None = float(\"-inf\"),\n) -> None:\n \"\"\"Initializes the ABMIL network.\n\n Args:\n input_size: input embedding dimension\n output_size: number of classes\n projected_input_size: size of the projected input. if `None`, no projection is\n performed.\n hidden_size_attention: hidden dimension in attention network\n hidden_sizes_mlp: dimensions for hidden layers in last mlp\n use_bias: whether to use bias in the attention network\n dropout_input_embeddings: dropout rate for the input embeddings\n dropout_attention: dropout rate for the attention network and classifier\n dropout_mlp: dropout rate for the final MLP network\n pad_value: Value indicating padding in the input tensor. If specified, entries with\n this value in the will be masked. If set to `None`, no masking is applied.\n \"\"\"\n super().__init__()\n\n self._pad_value = pad_value\n\n if projected_input_size:\n self.projector = nn.Sequential(\n nn.Linear(input_size, projected_input_size, bias=True),\n nn.Dropout(p=dropout_input_embeddings),\n )\n input_size = projected_input_size\n else:\n self.projector = nn.Dropout(p=dropout_input_embeddings)\n\n self.gated_attention = GatedAttention(\n input_dim=input_size,\n hidden_dim=hidden_size_attention,\n dropout=dropout_attention,\n n_classes=1,\n use_bias=use_bias,\n )\n\n self.classifier = MLP(\n input_size=input_size,\n output_size=output_size,\n hidden_layer_sizes=hidden_sizes_mlp,\n dropout=dropout_mlp,\n hidden_activation_fn=nn.ReLU,\n )\n
"},{"location":"reference/vision/models/networks/#eva.vision.models.networks.ABMIL.forward","title":"forward
","text":"Forward pass.
Parameters:
Name Type Description Defaultinput_tensor
Tensor
Tensor with expected shape of (batch_size, n_instances, input_size).
required Source code insrc/eva/vision/models/networks/abmil.py
def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:\n \"\"\"Forward pass.\n\n Args:\n input_tensor: Tensor with expected shape of (batch_size, n_instances, input_size).\n \"\"\"\n input_tensor, mask = self._mask_values(input_tensor, self._pad_value)\n\n # (batch_size, n_instances, input_size) -> (batch_size, n_instances, projected_input_size)\n input_tensor = self.projector(input_tensor)\n\n attention_logits = self.gated_attention(input_tensor) # (batch_size, n_instances, 1)\n if mask is not None:\n # fill masked values with -inf, which will yield 0s after softmax\n attention_logits = attention_logits.masked_fill(mask, float(\"-inf\"))\n\n attention_weights = nn.functional.softmax(attention_logits, dim=1)\n # (batch_size, n_instances, 1)\n\n attention_result = torch.matmul(torch.transpose(attention_weights, 1, 2), input_tensor)\n # (batch_size, 1, hidden_size_attention)\n\n attention_result = torch.squeeze(attention_result, 1) # (batch_size, hidden_size_attention)\n\n return self.classifier(attention_result) # (batch_size, output_size)\n
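A usage sketch with a padded batch; the sizes are arbitrary and the import path follows the module name used above:
import torch\nfrom eva.vision.models import networks\n\nmodel = networks.ABMIL(input_size=384, output_size=2, projected_input_size=128)\nembeddings = torch.randn(4, 1000, 384)  # (batch_size, n_instances, input_size)\nembeddings[0, 500:] = float('-inf')  # pad a slide that has fewer than 1000 patches (default pad_value)\nlogits = model(embeddings)  # (batch_size, output_size)\n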
"},{"location":"reference/vision/utils/io/","title":"IO","text":""},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image","title":"eva.vision.utils.io.image
","text":"Image I/O related functions.
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image.read_image","title":"read_image
","text":"Reads and loads the image from a file path as a RGB.
Parameters:
Name Type Description Defaultpath
str
The path of the image file.
requiredReturns:
Type DescriptionNDArray[uint8]
The RGB image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image(path: str) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads the image from a file path as a RGB.\n\n Args:\n path: The path of the image file.\n\n Returns:\n The RGB image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n return read_image_as_array(path, cv2.IMREAD_COLOR)\n
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.image.read_image_as_array","title":"read_image_as_array
","text":"Reads and loads an image file as a numpy array.
Parameters:
Name Type Description Defaultpath
str
The path to the image file.
requiredflags
int
Specifies the way in which the image should be read.
IMREAD_UNCHANGED
Returns:
Type DescriptionNDArray[uint8]
The image as a numpy array.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
IOError
If the image could not be loaded.
Source code insrc/eva/vision/utils/io/image.py
def read_image_as_array(path: str, flags: int = cv2.IMREAD_UNCHANGED) -> npt.NDArray[np.uint8]:\n \"\"\"Reads and loads an image file as a numpy array.\n\n Args:\n path: The path to the image file.\n flags: Specifies the way in which the image should be read.\n\n Returns:\n The image as a numpy array.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n IOError: If the image could not be loaded.\n \"\"\"\n _utils.check_file(path)\n image = cv2.imread(path, flags=flags)\n if image is None:\n raise IOError(\n f\"Input '{path}' could not be loaded. \"\n \"Please verify that the path is a valid image file.\"\n )\n\n if image.ndim == 3:\n image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n if image.ndim == 2 and flags == cv2.IMREAD_COLOR:\n image = image[:, :, np.newaxis]\n\n return np.asarray(image).astype(np.uint8)\n
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti","title":"eva.vision.utils.io.nifti
","text":"NIfTI I/O related functions.
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti.read_nifti_slice","title":"read_nifti_slice
","text":"Reads and loads a NIfTI image from a file path as uint8
.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredslice_index
int
The image slice index to return.
requireduse_storage_dtype
bool
Whether to cast the raw image array to the inferred type.
True
Returns:
Type DescriptionNDArray[Any]
The image as a numpy array (height, width, channels).
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def read_nifti_slice(\n path: str, slice_index: int, *, use_storage_dtype: bool = True\n) -> npt.NDArray[Any]:\n \"\"\"Reads and loads a NIfTI image from a file path as `uint8`.\n\n Args:\n path: The path to the NIfTI file.\n slice_index: The image slice index to return.\n use_storage_dtype: Whether to cast the raw image\n array to the inferred type.\n\n Returns:\n The image as a numpy array (height, width, channels).\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image_data = nib.load(path) # type: ignore\n image_slice = image_data.slicer[:, :, slice_index : slice_index + 1] # type: ignore\n image_array = image_slice.get_fdata()\n if use_storage_dtype:\n image_array = image_array.astype(image_data.get_data_dtype()) # type: ignore\n return image_array\n
"},{"location":"reference/vision/utils/io/#eva.vision.utils.io.nifti.fetch_total_nifti_slices","title":"fetch_total_nifti_slices
","text":"Fetches the total slides of a NIfTI image file.
Parameters:
Name Type Description Defaultpath
str
The path to the NIfTI file.
requiredReturns:
Type Descriptionint
The total number of available slices.
Raises:
Type DescriptionFileExistsError
If the path does not exist or it is unreachable.
ValueError
If the input channel is invalid for the image.
Source code insrc/eva/vision/utils/io/nifti.py
def fetch_total_nifti_slices(path: str) -> int:\n \"\"\"Fetches the total slides of a NIfTI image file.\n\n Args:\n path: The path to the NIfTI file.\n\n Returns:\n The number of the total available slides.\n\n Raises:\n FileExistsError: If the path does not exist or it is unreachable.\n ValueError: If the input channel is invalid for the image.\n \"\"\"\n _utils.check_file(path)\n image = nib.load(path) # type: ignore\n image_shape = image.header.get_data_shape() # type: ignore\n return image_shape[-1]\n
"},{"location":"user-guide/","title":"User Guide","text":"Here you can find everything you need to install, understand and interact with eva.
"},{"location":"user-guide/#getting-started","title":"Getting started","text":"Install eva on your machine and learn how to use eva.
"},{"location":"user-guide/#tutorials","title":"Tutorials","text":"To familiarize yourself with eva, try out some of our tutorials.
Get to know eva in more depth by studying our advanced user guides.
This document shows how to use eva's Model Wrapper API (eva.models.networks.wrappers
) to load different model formats from a series of sources such as PyTorch Hub, HuggingFace Model Hub and ONNX.
The eva framework is built on top of PyTorch Lightning and thus naturally supports loading PyTorch models. You just need to specify the class path of your model in the backbone section of the .yaml
config file.
backbone:\n class_path: path.to.your.ModelClass\n init_args:\n arg_1: ...\n arg_2: ...\n
Note that your ModelClass
should subclass torch.nn.Module
and implement the forward()
method to return embedding tensors of shape [embedding_dim]
.
To load models from PyTorch Hub or other torch model providers, the easiest way is to use the ModelFromFunction
wrapper class:
backbone:\n class_path: eva.models.networks.wrappers.ModelFromFunction\n init_args:\n path: torch.hub.load\n arguments:\n repo_or_dir: facebookresearch/dino:main\n model: dino_vits16\n pretrained: false\n checkpoint_path: path/to/your/checkpoint.torch\n
Note that if a checkpoint_path
is provided, ModelFromFunction
will automatically initialize the specified model using the provided weights from that checkpoint file.
Similar to the above example, we can easily load models using the common vision library timm
:
backbone:\n class_path: eva.models.networks.wrappers.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: resnet18\n pretrained: true\n
"},{"location":"user-guide/advanced/model_wrappers/#loading-models-from-huggingface-hub","title":"Loading models from HuggingFace Hub","text":"For loading models from HuggingFace Hub, eva provides a custom wrapper class HuggingFaceModel
which can be used as follows:
backbone:\n class_path: eva.models.networks.wrappers.HuggingFaceModel\n init_args:\n model_name_or_path: owkin/phikon\n tensor_transforms: \n class_path: eva.models.networks.transforms.ExtractCLSFeatures\n
In the above example, the forward pass implemented by the owkin/phikon
model returns an output tensor containing the hidden states of all input tokens. In order to extract the state corresponding to the CLS token only, we can specify a transformation via the tensor_transforms
argument which will be applied to the model output.
.onnx
model checkpoints can be loaded using the ONNXModel
wrapper class as follows:
class_path: eva.models.networks.wrappers.ONNXModel\ninit_args:\n path: path/to/model.onnx\n device: cuda\n
"},{"location":"user-guide/advanced/model_wrappers/#implementing-custom-model-wrappers","title":"Implementing custom model wrappers","text":"You can also implement your own model wrapper classes, in case your model format is not supported by the wrapper classes that eva already provides. To do so, you need to subclass eva.models.networks.wrappers.BaseModel
and implement the following abstract methods:
load_model
: Returns an instantiated model object & loads pre-trained model weights from a checkpoint if available. model_forward
: Implements the forward pass of the model and returns the output as a torch.Tensor
of shape [embedding_dim]
You can take the implementations of ModelFromFunction
, HuggingFaceModel
and ONNXModel
wrappers as a reference.
To produce the evaluation results presented here, you can run eva with the settings below.
Make sure to replace <task>
in the commands below with bach
, crc
, mhist
or patch_camelyon
.
Note that to run the commands below you will need to first download the data. BACH, CRC and PatchCamelyon provide automatic download by setting the argument download: true
in their respective config-files. In the case of MHIST you will need to download the data manually by following the instructions provided here.
Evaluating the backbone with randomly initialized weights serves as a baseline to compare the pretrained FMs to an FM that produces embeddings without any prior learning on image tasks. To evaluate, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_random\" \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dino-vit-s16-imagenet","title":"DINO ViT-S16 (ImageNet)","text":"The next baseline model, uses a pretrained ViT-S16 backbone with ImageNet weights. To evaluate, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_imagenet\" \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dino-vit-b8-imagenet","title":"DINO ViT-B8 (ImageNet)","text":"To evaluate performance on the larger ViT-B8 backbone pretrained on ImageNet, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb8_imagenet\" \\\nDINO_BACKBONE=dino_vitb8 \\\nIN_FEATURES=768 \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#dinov2-vit-l14-imagenet","title":"DINOv2 ViT-L14 (ImageNet)","text":"To evaluate performance on Dino v2 ViT-L14 backbone pretrained on ImageNet, run:
PRETRAINED=true \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl14_kaiko\" \\\nREPO_OR_DIR=facebookresearch/dinov2:main \\\nDINO_BACKBONE=dinov2_vitl14_reg \\\nFORCE_RELOAD=true \\\nIN_FEATURES=1024 \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#lunit-dino-vit-s16-tcga","title":"Lunit - DINO ViT-S16 (TCGA)","text":"Lunit, released the weights for a DINO ViT-S16 backbone, pretrained on TCGA data on GitHub. To evaluate, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_lunit\" \\\nCHECKPOINT_PATH=\"https://github.com/lunit-io/benchmark-ssl-pathology/releases/download/pretrained-weights/dino_vit_small_patch16_ep200.torch\" \\\nNORMALIZE_MEAN=[0.70322989,0.53606487,0.66096631] \\\nNORMALIZE_STD=[0.21716536,0.26081574,0.20723464] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#owkin-ibot-vit-b16-tcga","title":"Owkin - iBOT ViT-B16 (TCGA)","text":"Owkin released the weights for \"Phikon\", an FM trained with iBOT on TCGA data, via HuggingFace. To evaluate, run:
EMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb16_owkin\" \\\neva predict_fit --config configs/vision/owkin/phikon/offline/<task>.yaml\n
Note: since eva provides the config files to evaluate tasks with the Phikon FM in \"configs/vision/owkin/phikon/offline\", it is not necessary to set the environment variables needed for the runs above.
"},{"location":"user-guide/advanced/replicate_evaluations/#uni-dinov2-vit-l16-mass-100k","title":"UNI - DINOv2 ViT-L16 (Mass-100k)","text":"The UNI FM, introduced in [1] is available on HuggingFace. Note that access needs to be requested.
Unlike the other FMs evaluated for our leaderboard, the UNI model uses the vision library timm
to load the model. To accomodate this, you will need to modify the config files (see also Model Wrappers).
Make a copy of the task-config you'd like to run, and replace the backbone
section with:
backbone:\n class_path: eva.models.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: vit_large_patch16_224\n patch_size: 16\n init_values: 1e-5\n num_classes: 0\n dynamic_img_size: true\n checkpoint_path: <path/to/pytorch_model.bin>\n
Now evaluate the model by running:
EMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl16_uni\" \\\nIN_FEATURES=1024 \\\neva predict_fit --config path/to/<task>.yaml\n
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-s16-tcga","title":"kaiko.ai - DINO ViT-S16 (TCGA)","text":"To evaluate kaiko.ai's FM with DINO ViT-S16 backbone, pretrained on TCGA data on GitHub, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits16_kaiko\" \\\nCHECKPOINT_PATH=[TBD*] \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-s8-tcga","title":"kaiko.ai - DINO ViT-S8 (TCGA)","text":"To evaluate kaiko.ai's FM with DINO ViT-S8 backbone, pretrained on TCGA data on GitHub, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vits8_kaiko\" \\\nDINO_BACKBONE=dino_vits8 \\\nCHECKPOINT_PATH=[TBD*] \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-b16-tcga","title":"kaiko.ai - DINO ViT-B16 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINO ViT-B16 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb16_kaiko\" \\\nDINO_BACKBONE=dino_vitb16 \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=768 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dino-vit-b8-tcga","title":"kaiko.ai - DINO ViT-B8 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINO ViT-B8 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dino_vitb8_kaiko\" \\\nDINO_BACKBONE=dino_vitb8 \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=768 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#kaikoai-dinov2-vit-l14-tcga","title":"kaiko.ai - DINOv2 ViT-L14 (TCGA)","text":"To evaluate kaiko.ai's FM with the larger DINOv2 ViT-L14 backbone, pretrained on TCGA data, run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=\"./data/embeddings/dinov2_vitl14_kaiko\" \\\nREPO_OR_DIR=facebookresearch/dinov2:main \\\nDINO_BACKBONE=dinov2_vitl14_reg \\\nFORCE_RELOAD=true \\\nCHECKPOINT_PATH=[TBD*] \\\nIN_FEATURES=1024 \\\nNORMALIZE_MEAN=[0.5,0.5,0.5] \\\nNORMALIZE_STD=[0.5,0.5,0.5] \\\neva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml\n
* path to public checkpoint will be added when available.
"},{"location":"user-guide/advanced/replicate_evaluations/#references","title":"References","text":"[1]: Chen: A General-Purpose Self-Supervised Model for Computational Pathology, 2023 (arxiv)
"},{"location":"user-guide/getting-started/how_to_use/","title":"How to use eva","text":"Before starting to use eva, it's important to get familiar with the different workflows, subcommands and configurations.
"},{"location":"user-guide/getting-started/how_to_use/#eva-subcommands","title":"eva subcommands","text":"To run an evaluation, we call:
eva <subcommand> --config <path-to-config-file>\n
The eva interface supports the subcommands: predict
, fit
and predict_fit
.
fit
: is used to train a decoder for a specific task and subsequently evaluate the performance. This can be done online or offline *predict
: is used to compute embeddings for input images with a provided FM-checkpoint. This is the first step of the offline workflowpredict_fit
: runs predict
and fit
sequentially. Like the fit
-online run, it runs a complete evaluation with images as input.We distinguish between the online and offline workflow:
The online workflow can be used to quickly run a complete evaluation without saving and tracking embeddings. The offline workflow runs faster (only one FM-backbone forward pass) and is ideal to experiment with different decoders on the same FM-backbone.
"},{"location":"user-guide/getting-started/how_to_use/#run-configurations","title":"Run configurations","text":""},{"location":"user-guide/getting-started/how_to_use/#config-files","title":"Config files","text":"The setup for an eva run is provided in a .yaml
config file which is defined with the --config
flag.
A config file specifies the setup for the trainer (including callback for the model backbone), the model (setup of the trainable decoder) and data module.
The config files for the datasets and models that eva supports out of the box, you can find on GitHub. We recommend that you inspect some of them to get a better understanding of their structure and content.
"},{"location":"user-guide/getting-started/how_to_use/#environment-variables","title":"Environment variables","text":"To customize runs, without the need of creating custom config-files, you can overwrite the config-parameters listed below by setting them as environment variables.
Type DescriptionOUTPUT_ROOT
str The directory to store logging outputs and evaluation results EMBEDDINGS_ROOT
str The directory to store the computed embeddings CHECKPOINT_PATH
str Path to the FM-checkpoint to be evaluated IN_FEATURES
int The input feature dimension (embedding) NUM_CLASSES
int Number of classes for classification tasks N_RUNS
int Number of fit
runs to perform in a session, defaults to 5 MAX_STEPS
int Maximum number of training steps (if early stopping is not triggered) BATCH_SIZE
int Batch size for a training step PREDICT_BATCH_SIZE
int Batch size for a predict step LR_VALUE
float Learning rate for training the decoder MONITOR_METRIC
str The metric to monitor for early stopping and final model checkpoint loading MONITOR_METRIC_MODE
str \"min\" or \"max\", depending on the MONITOR_METRIC
used REPO_OR_DIR
str GitHub repo with format containing model implementation, e.g. \"facebookresearch/dino:main\" DINO_BACKBONE
str Backbone model architecture if a facebookresearch/dino FM is evaluated FORCE_RELOAD
bool Whether to force a fresh download of the github repo unconditionally PRETRAINED
bool Whether to load FM-backbone weights from a pretrained model"},{"location":"user-guide/getting-started/installation/","title":"Installation","text":"Create and activate a virtual environment with Python 3.10+
Install eva and the eva-vision package with:
pip install \"kaiko-eva[vision]\"\n
"},{"location":"user-guide/getting-started/installation/#run-eva","title":"Run eva","text":"Now you are all set and you can start running eva with:
eva <subcommand> --config <path-to-config-file>\n
To learn how the subcommands and configs work, we recommend you familiarize yourself with How to use eva and then proceed to running eva with the Tutorials."},{"location":"user-guide/tutorials/evaluate_resnet/","title":"Train and evaluate a ResNet","text":"If you read How to use eva and followed the Tutorials to this point, you might ask yourself why you would not always use the offline workflow to run a complete evaluation. An offline-run stores the computed embeddings and runs faster than the online-workflow which computes a backbone-forward pass in every epoch.
One use case for the online-workflow is the evaluation of a supervised ML model that does not rely on a backbone/head architecture. To demonstrate this, let's train a ResNet 18 from PyTorch Image Models (timm).
To do this we need to create a new config-file:
configs/vision/resnet18
configs/vision/dino_vit/online/bach.yaml
and move it to the new folder.Now let's adapt the new bach.yaml
-config to the new model:
backbone
-key from the config. If no backbone is specified, the backbone will be skipped during inference. head:\n class_path: eva.models.ModelFromFunction\n init_args:\n path: timm.create_model\n arguments:\n model_name: resnet18\n num_classes: &NUM_CLASSES 4\n drop_rate: 0.0\n pretrained: false\n
To reduce training time, let's overwrite some of the default parameters. Run the training & evaluation with: OUTPUT_ROOT=logs/resnet/bach \\\nMAX_STEPS=50 \\\nLR_VALUE=0.01 \\\neva fit --config configs/vision/resnet18/bach.yaml\n
Once the run is complete, take a look at the results in logs/resnet/bach/<session-id>/results.json
and check out the tensorboard with tensorboard --logdir logs/resnet/bach
. How does the performance compare to the results observed in the previous tutorials?"},{"location":"user-guide/tutorials/offline_vs_online/","title":"Offline vs. online evaluations","text":"In this tutorial we run eva with the three subcommands predict
, fit
and predict_fit
, and take a look at the difference between offline and online workflows.
If you haven't downloaded the config files yet, please download them from GitHub.
For this tutorial we use the BACH classification task, which is available on Zenodo and distributed under the Attribution-NonCommercial-ShareAlike 4.0 International license.
To let eva automatically handle the dataset download, you can open configs/vision/dino_vit/offline/bach.yaml
and set download: true
. Before doing so, please make sure that your use case is compliant with the dataset license.
First, let's use the predict
-command to download the data and compute embeddings. In this example we use a randomly initialized dino_vits16
as backbone.
Open a terminal in the folder where you installed eva and run:
PRETRAINED=false \\\nEMBEDDINGS_ROOT=./data/embeddings/dino_vits16_random \\\neva predict --config configs/vision/dino_vit/offline/bach.yaml\n
Executing this command will:
1. Download the BACH dataset to ./data/bach (if it has not already been downloaded to this location). This will take a few minutes.
2. Compute the embeddings for all images and store them in EMBEDDINGS_ROOT along with a manifest.csv file.
Once the session is complete, verify that:
- the BACH images have been downloaded to ./data/bach/ICIAR2018_BACH_Challenge
- the computed embeddings have been stored in ./data/embeddings/dino_vits16_random/bach
- a manifest.csv file that maps the filename to the embedding, target and split has been created in ./data/embeddings/dino_vits16_random/bach.
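To get a feel for what the predict step produced, you could, for example, peek at the embeddings folder and the first rows of the manifest (a sketch; the exact column names may differ):
ls ./data/embeddings/dino_vits16_random/bach | head  # list a few files in the embeddings folder\nhead -n 5 ./data/embeddings/dino_vits16_random/bach/manifest.csv  # first rows of the manifest\n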
Now we can use the fit-command to evaluate the FM on the precomputed embeddings.
To ensure a quick run for the purpose of this exercise, we overwrite some of the default parameters. Run eva to fit the decoder classifier with:
N_RUNS=2 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\neva fit --config configs/vision/dino_vit/offline/bach.yaml\n
Executing this command will fit and evaluate the decoder classifier on the precomputed embeddings, repeating the fit 2 times (N_RUNS=2).
Once the session is complete:
- Check the evaluation results in logs/dino_vits16/offline/bach/<session-id>/results.json. (The <session-id> consists of a timestamp and a hash that is based on the run configuration.)
- Inspect the training curves with tensorboard --logdir logs/dino_vits16/offline/bach\n
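To inspect the metrics from the command line, something like the following could work (the session directory name will differ on your machine):
ls logs/dino_vits16/offline/bach/  # find the <session-id>\ncat logs/dino_vits16/offline/bach/<session-id>/results.json  # print the stored metrics\n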
With the predict_fit
-command, the two steps above can be executed in a single run. Let's do this, but this time let's use an FM pretrained on ImageNet.
Go back to the terminal and execute:
N_RUNS=1 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\nPRETRAINED=true \\\nEMBEDDINGS_ROOT=./data/embeddings/dino_vits16_pretrained \\\neva predict_fit --config configs/vision/dino_vit/offline/bach.yaml\n
Once the session is complete, inspect the evaluation results as you did in Step 2. Compare the performance metrics and training curves. Can you observe better performance with the ImageNet pretrained encoder?
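One way to put the two sessions side by side is to print every results file under the log directory (a sketch, assuming both runs were written to the default OUTPUT_ROOT shown in Step 2):
cat logs/dino_vits16/offline/bach/*/results.json\n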
"},{"location":"user-guide/tutorials/offline_vs_online/#online-evaluations","title":"Online evaluations","text":"Alternatively to the offline workflow from Step 3, a complete evaluation can also be computed online. In this case we don't save and track embeddings and instead fit the ML model (encoder with frozen layers + trainable decoder) directly on the given task.
As in Step 3 above, we again use a dino_vits16
pretrained on ImageNet.
Run a complete online workflow with the following command:
N_RUNS=1 \\\nMAX_STEPS=20 \\\nLR_VALUE=0.1 \\\nPRETRAINED=true \\\neva fit --config configs/vision/dino_vit/online/bach.yaml\n
Executing this command will fit the complete model (the frozen FM-encoder plus the trainable decoder) directly on the BACH images and evaluate it. Once the run is complete, check the evaluation results in logs/dino_vits16/offline/bach/<session-id>/results.json and compare them to the results of Step 3. Do they match? The setup for an eva run is provided in a .yaml
config file which is defined with the --config
flag.
A config file specifies the setup for the trainer (including callback for the model backbone), the model (setup of the trainable decoder) and data module.
-The config files for the datasets and models that eva supports out of the box, you can find on GitHub (scroll to the bottom of the page). We recommend that you inspect some of them to get a better understanding of their structure and content.
+You can find the config files for the datasets and models that eva supports out of the box on GitHub. We recommend that you inspect some of them to get a better understanding of their structure and content.
To customize runs, without the need of creating custom config-files, you can overwrite the config-parameters listed below by setting them as environment variables.