Project 3
This project aims to run a Multi-Omics Factor Analysis (MOFA) that combines molecular data with the outputs of a self-supervised deep learning pipeline applied to whole slide images (WSIs). The objective is to uncover potential associations between the molecular and morphological profiles of lung neuroendocrine neoplasms (LNENs).
The script `Scripts/Script_ExpMethAltCNV_lungNENomicsCombined.R` (up to line 111) shows how the data must be formatted to create input matrices for MOFA. It generates the R object `Data/MOFA_molecular/MOFAobjectB.RData`, which contains the formatted matrices for several omics layers: RNA sequencing, copy number variants, methylation, and mutation data. To see exactly how `MOFAobjectB` was constructed, all raw molecular data are located in the directory `Data/Molecular`.
Use `MOFAobjectB` from `Data/MOFA_molecular/MOFAobjectB.RData` as input for the MOFA analysis with the script `Scripts/MOFA_script.R`. The script `slurm_script/demo_MOFA.sh` provides an example of how `MOFA_script.R` can be run on a SLURM cluster.
The `MOFA_script.R` script generates the following outputs:
- `Data/MOFA_molecular/MOFAobject_trained.hdf5`
- `Data/MOFA_molecular/MOFAobject.trained.RData`
For exploratory analysis of these outputs, refer to `Scripts/Script_ExpMethAltCNV_lungNENomicsCombined.R` (after line 111). Additional tutorials for MOFA analysis can be found here:
WSIs are large microscopic images of tumours. To process them using deep learning and GPUs, WSIs are divided into smaller patches, referred to as tiles. These tiles are processed using a self-supervised deep learning method called Barlow Twins. This process generates encoded vectors for each tile, which are then clustered to derive morphological partitions.
The file `/Data/WSI_DL_outputs/LeidenComK_75_res3_r1_repartition_by_samples_filtered_projectMG.csv` contains the proportion of tiles in each morphological partition for each sample. These distributions are interpreted as the morphological profiles of the patients, summarising information extracted from approximately 10,000 tiles per WSI.
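To illustrate how such a profile can be derived, the minimal sketch below counts the tiles assigned to each partition for every sample and normalises the counts to proportions. It is written in Python for brevity, whereas the project scripts are in R. The function name `partition_proportions`, the toy sample identifiers, and the partition labels are hypothetical and are not taken from the project data.

```python
from collections import Counter, defaultdict


def partition_proportions(tile_assignments):
    """Return {sample_id: {partition: proportion_of_tiles}}.

    tile_assignments is an iterable of (sample_id, partition) pairs,
    one pair per tile. All names and values here are illustrative.
    """
    counts = defaultdict(Counter)
    for sample_id, partition in tile_assignments:
        counts[sample_id][partition] += 1

    profiles = {}
    for sample_id, partition_counts in counts.items():
        total = sum(partition_counts.values())
        # Normalise tile counts so that each profile sums to 1.
        profiles[sample_id] = {p: n / total for p, n in partition_counts.items()}
    return profiles


# Toy input: 4 tiles for one WSI and 2 tiles for another (hypothetical IDs).
tiles = [("WSI_A", 0), ("WSI_A", 0), ("WSI_A", 1), ("WSI_A", 2),
         ("WSI_B", 1), ("WSI_B", 1)]
profiles = partition_proportions(tiles)
print(profiles["WSI_A"])
```

Because each profile is a set of proportions, samples with different numbers of tiles remain comparable.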
Create a MOFA input object that combines the molecular and morphological features. To achieve this:
- Match the identifiers of each WSI (column `WSI_id` in `/Data/WSI_DL_outputs/LeidenComK_75_res3_r1_repartition_by_samples_filtered_projectMG.csv`) with the `sample_id` of the molecular MOFA. Use the table `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`.
- Adapt `Scripts/Script_ExpMethAltCNV_lungNENomicsCombined.R` to create a new MOFA object.
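The identifier matching in the first step could look like the sketch below, which uses only the Python standard library (the project scripts themselves are in R). The column names `WSI_id` and `sample_id` come from this README. The function name `attach_sample_ids`, the partition column names, and the in-memory toy tables are assumptions made for illustration; in practice the two CSV files named above would be read instead.

```python
import csv
import io


def attach_sample_ids(profile_rows, key_rows):
    """Add sample_id to each morphological profile row via its WSI_id.

    WSI_ids absent from the key table are returned separately so they
    can be inspected rather than silently dropped.
    """
    wsi_to_sample = {row["WSI_id"]: row["sample_id"] for row in key_rows}
    matched, unmatched = [], []
    for row in profile_rows:
        sample_id = wsi_to_sample.get(row["WSI_id"])
        if sample_id is None:
            unmatched.append(row["WSI_id"])
        else:
            matched.append({**row, "sample_id": sample_id})
    return matched, unmatched


# Toy stand-ins for the partition table and the key table (hypothetical values).
profiles_csv = io.StringIO(
    "WSI_id,partition_0,partition_1\nW1,0.7,0.3\nW2,0.1,0.9\nW3,0.5,0.5\n"
)
key_csv = io.StringIO("sample_id,WSI_id\nS1,W1\nS2,W2\n")

matched, unmatched = attach_sample_ids(
    list(csv.DictReader(profiles_csv)), list(csv.DictReader(key_csv))
)
print([row["sample_id"] for row in matched], unmatched)
```

Returning the unmatched identifiers makes it easy to check that every expected WSI ends up in the combined object.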
Adapt and reuse the following scripts to run MOFA on the combined molecular and morphological features:
- `Scripts/MOFA_script.R`
- `slurm_script/demo_MOFA.sh`
Use the output objects from the MOFA analysis to explore the resulting latent space:
- Check associations between latent factors and data types (see examples in `Scripts/Script_ExpMethAltCNV_lungNENomicsCombined.R`, after line 111).
- Explore associations between the latent space and key clinical variables. Clinical data can be found in:
  - `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv` (restricted to the 192 patients with both molecular and morphological features).
  - `Data/TechnicalData/key_clinical_data_patients_all_patients.csv` (all patients).
- Here is a list of the most important variables:
  - `archtype_label_combined` in `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`, corresponding to the molecular groups. In `Data/TechnicalData/key_clinical_data_patients_all_patients.csv`, this is named `archetype_k4_LF3_label`.
  - `consensus_pathology` in `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`, representing the histological type. In `Data/TechnicalData/key_clinical_data_patients_all_patients.csv`, it is named `type`.
  - `age_cat` in `Data/TechnicalData/key_clinical_data_patients_all_patients.csv` (age category) or `age_corrected` in `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`.
  - `sex` (male or female).
  - `localisation_corrected` in `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv` (referred to as `location` in `Data/TechnicalData/key_clinical_data_patients_all_patients.csv`), indicating the tumour's position relative to the trachea.

Feel free to explore additional variables based on your analytical goals.
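
Because the two clinical tables name the same variables differently, it can help to harmonise the column names before comparing them with the latent factors. The sketch below, in Python for brevity, renames columns from the all-patients table to the names used in the WSI table, following the list above. The function name `harmonise_columns` and the example row values are hypothetical. `age_cat` and `age_corrected` are deliberately left unmapped, since the README does not say that they encode the same quantity.

```python
# Column names in key_clinical_data_patients_all_patients.csv mapped to the
# names used in key_clinical_data_patients_with_WSI.csv (from the list above).
ALL_PATIENTS_TO_WSI_NAMES = {
    "archetype_k4_LF3_label": "archtype_label_combined",
    "type": "consensus_pathology",
    "location": "localisation_corrected",
}


def harmonise_columns(row, mapping=ALL_PATIENTS_TO_WSI_NAMES):
    """Rename the keys of one clinical record; unmapped keys are kept as-is."""
    return {mapping.get(name, name): value for name, value in row.items()}


# Hypothetical record from the all-patients table (values are made up).
row = {"sample_id": "S1", "type": "Typical", "location": "Proximal", "sex": "F"}
harmonised = harmonise_columns(row)
print(harmonised)
```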
- 319 patients have at least one type of omics data. The molecular MOFA includes these 319 patients. Their sample IDs can be found in:
  - `Data/MOFA_molecular/MOFAobjectB.RData` (`MOFAobjectB@samples_metadata`)
  - `Data/MOFA_molecular/MOFAobject.trained.RData` (`MOFAobject.trained@samples_metadata`)
- For technical information about these samples, use the file `Data/TechnicalData/combined_public_lungNENomics_technical_data.RData` (column `sample_id`).
- 192 patients are associated with a WSI. Each WSI has a unique identifier in the column `WSI_id` of `/Data/WSI_DL_outputs/LeidenComK_75_res3_r1_repartition_by_samples_filtered_projectMG.csv`.
- The correspondence between `sample_id` (molecular data) and `WSI_id` is in `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`.
- Clinical data are available in:
  - `Data/TechnicalData/key_clinical_data_patients_with_WSI.csv`
  - `Data/TechnicalData/key_clinical_data_patients_all_patients.csv`
- MOFA: Argelaguet, Ricard, Velten, Britta, Arnol, Damien, et al. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets. Molecular Systems Biology, 2018, vol. 14, no. 6, p. e8124.
- Deep Learning Pipeline: https://www.nature.com/articles/s41467-024-48666-7
If you encounter any issues, please contact:
[email protected] (Nicolas Alcala)