Project 4

Project 4: Predict molecular alterations from medical image data

Background

The goal of the project is to predict clinically relevant molecular alterations from histopathological whole-slide images (WSI) in malignant pleural mesothelioma (MPM).

The MESOMICS paper published by our team described several molecular features which, together with previously known morphological features, pave the way toward a morpho-molecular classification of MPM. Being able to identify these features from WSI would accelerate the translation of these findings into clinical practice.

In our study, we have validated our findings using two additional cohorts: the "Bueno" cohort and the TCGA cohort.

Data

The integration of the molecular data from the three sources is described in detail in our data note and on the corresponding GitHub page. In particular, the harmonised supplementary table S2, which contains clinical, epidemiologic, morphologic, and molecular data, will be very useful.
Diagnostic whole-slide image (H&E) files are available only for TCGA cases, from the TCGA data portal, in svs (Aperio) format.
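
To link slides to the rows of table S2, you will need a case identifier for each svs file. The sketch below assumes the usual TCGA slide barcode layout (project, tissue source site, participant, sample and vial, portion, slide), in which the first three fields form the case ID and diagnostic slides carry a slide code starting with DX. It also assumes that table S2 identifies TCGA cases by that case ID, which you should check. The file names in the usage note are made up.

```python
import re

# Assumed TCGA slide barcode layout:
#   TCGA-<site>-<participant>-<sample+vial>-<portion>-<slide>.<uuid>.svs
SLIDE_RE = re.compile(
    r"^(?P<case>TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})"
    r"-(?P<sample>\d{2}[A-Z])"
    r"-(?P<portion>\d{2})"
    r"-(?P<slide>[A-Z]{2}[A-Z0-9])"
)


def parse_slide_name(filename):
    """Return case ID and slide type for a TCGA slide file name, or None."""
    match = SLIDE_RE.match(filename)
    if match is None:
        return None
    return {
        "case_id": match.group("case"),
        "sample": match.group("sample"),
        # Diagnostic (FFPE, H&E) slides use a slide code starting with "DX".
        "diagnostic": match.group("slide").startswith("DX"),
    }


def diagnostic_slides_by_case(filenames):
    """Group diagnostic slide files by case ID, skipping everything else."""
    by_case = {}
    for name in filenames:
        info = parse_slide_name(name)
        if info is not None and info["diagnostic"]:
            by_case.setdefault(info["case_id"], []).append(name)
    return by_case
```

For example, given the made-up names `TCGA-ZZ-A000-01Z-00-DX1.abc.svs` and `TCGA-ZZ-A000-01A-01-TS1.def.svs`, only the first is kept, under the key `TCGA-ZZ-A000`. A case can have several diagnostic slides, so decide early whether to treat them as separate samples or to aggregate them per case.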

Requirements

Python/bash scripting.
Understanding of histopathology deep-learning concepts covered in the lecture.
Access to a GPU (optional but could be interesting).

Steps

1. Get familiar with the problem (read the papers).
2. Identify the labels to predict (e.g. loss of BAP1, acinar morphology).
3. Download and extract the data needed.
4. Embed the WSI using a pathology foundation model, or download already embedded vectors.
5. Train a model to predict the labels from step 2.
6. Evaluate the performance of the algorithm.
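
For the evaluation step, a minimal sketch of two metrics that behave reasonably when a label is rare (which is likely for some alterations) is given below. It assumes binary labels coded as 0 and 1 and is written with the standard library only; the function names are illustrative. If you use cross-validation with very small test folds, compute the metrics once on the pooled out-of-fold predictions rather than per fold, since a fold containing a single class has no defined AUC.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties count as half a win. Quadratic in the number of cases, which is
    fine for a cohort of a few dozen to a few hundred slides.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for hard 0/1 predictions."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present in y_true")
    sensitivity = sum(1 for p in pos if p == 1) / len(pos)
    specificity = sum(1 for p in neg if p == 0) / len(neg)
    return (sensitivity + specificity) / 2
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, because three of the four positive/negative pairs are ranked correctly.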

Expected difficulties

Setting up all dependencies and references for running the software.
Coding.
Limited sample size in the TCGA cohort; cross-validation will be needed (e.g. leave-one-out, k-fold).
WSI encoding could require CPU or GPU time.
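
Because the cohort is small and some labels will be unbalanced, it helps to keep the class proportions similar in every fold. The following is a minimal standard-library sketch of a stratified k-fold splitter, assuming one label per case; the function name is illustrative. Setting k to the number of cases gives leave-one-out. In practice, scikit-learn's StratifiedKFold and LeaveOneOut classes cover the same ground.

```python
import random


def stratified_kfold(labels, k, seed=0):
    """Yield (train_indices, test_indices) for k folds.

    Cases of each class are shuffled and dealt round-robin over the folds,
    so every fold gets a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, index in enumerate(indices):
            folds[(j + offset) % k].append(index)
        offset += len(indices)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

If a case has several slides, split on cases rather than on slides, so that slides from the same patient never appear in both the training and the test set.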

Resources

For step 4, you will need to choose a pathology foundation model. Fine-tuning a model for each task could be difficult and time-consuming. A simpler approach is to obtain WSI embeddings and use them for prediction in step 5. We suggest you have a quick look at the Giga-SSL paper and its associated GitHub repository. Luckily, the entire TCGA cohort has already been encoded as described in this paper, and the embeddings can be downloaded here. Encoding additional images using the same approach is also quite simple and is described in the associated GitHub repository.
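
To illustrate what "use embeddings for prediction" can look like, here is a minimal sketch of a linear probe: an L2-regularised logistic regression trained by gradient descent on one fixed-length vector per slide. It is written with the standard library so that it runs anywhere. The toy two-dimensional vectors stand in for real embeddings, whose file format and dimension are not assumed here, and all names and hyperparameters are illustrative. For real work, scikit-learn's LogisticRegression is the more usual choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability that embedding x has the label."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logreg(X, y, lr=0.1, epochs=200, l2=0.01):
    """Fit weights w and bias b by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty shrinks the weights, which matters when the
        # embedding dimension exceeds the number of cases.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy stand-in for slide embeddings: the label depends on the first coordinate.
X = [[1.0, 0.2], [0.8, -0.1], [1.2, 0.0], [-1.0, 0.1], [-0.9, -0.2], [-1.1, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
```

Starting with a simple linear model like this is a reasonable baseline: with far more embedding dimensions than cases, a more flexible model is likely to overfit, and any gain should be confirmed by cross-validation.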

[email protected] (Nicolas Alcala)
