This repository contains the modules and Jupyter Notebooks that were used to perform the morphological analyses and generate the bases of the respective plots for
Morphological Subprofile Analysis for Bioactivity Annotation of Small Molecules
Axel Pahl, Beate Schölermann, Sonja Sievers, Herbert Waldmann and Slava Ziegler
It also contains the full processed Cell Painting data set used in this analysis (input/ds_refs.tsv).
Update 14-Nov-2023: Added data for the MitoStress cluster (Preprint on bioRxiv and submitted to Cell Chem. Biol.)
- added 16 measurements for the MitoStress cluster to cluster_cpds.tsv
- added 2 reference compounds for MitoStress that were not part of the original data set (ds_refs.tsv)
- added the MitoStress cluster median subprofile
- re-created several plots, now including the MitoStress cluster

input/: input files used in this analysis
- cluster_cpds.tsv: list of cluster-defining measurements
- ds_refs.tsv: 3560 processed Cell Painting profiles at different concentrations (3547 reference compound measurements (1883 different reference compounds) and 13 internal compound measurements (3 compounds), used in figures 3-5) for the data set of active and non-toxic measurements used in this analysis. Description of columns:
  - Well_Id [string]: the unique identifier of a measurement in the data set. It has the following composition: <Compound_Id>:<Batch_No>:<Container_No>_<Conc [µM]>
  - Compound_Id [int]: compound identifier
  - Conc_uM [float]: concentration [µM] of the given measurement
  - Is_Ref [bool]: whether the compound is a reference compound or an internal compound
  - Induction [float]: the number of significantly changed morphological features in the Cell Painting profile, compared to controls, expressed in percent
  - Rel_Cell_Count [int]: cell count of the measurement, relative to DMSO controls, expressed in percent; a value below 50% is considered toxic, and such measurements have been excluded from the present analysis
  - Chiral [bool]: chiral flag of the structure
  - Smiles [string]: compound structure
  - Trivial_Name [string]: the trivial name of a reference compound
  - Known_Act [string]: the annotated known activity of a reference compound. These were sourced from the different vendors.
  - Median_* [float]: the Z-score values for the 579 features of the processed Cell Painting profiles. A value around 0 indicates no significant change compared to DMSO controls.

jupy_tools/: helper modules
- cpa.py: module for analysing the results from the morphological Cell Painting assay
- utils.py: Notebook helper utilities

cluster_subprofiles/: code for calculating / generating...
- 01_calc_cluster_subprofiles.ipynb: the individual cluster subprofiles
- 02_add_cluster_sims_to_dataset.ipynb: the similarities of the measurements to the individual clusters and the cross-similarity pair plot (figure S2)
- 03_cluster_cross_correlation_matrix.ipynb: the cluster correlation matrix (figure 2C)
- 04_cluster_biosim_matrix.ipynb: the cluster biosimilarity matrices and profile heat maps for figures 3-5
- 05_hc_dna_synth.ipynb: the hierarchical clustering example in figure 6

umap/01_umap.ipynb: code for generating the UMAP plot (figure 2C)

tubulin_cluster/01_different_defining_cpds.ipynb: code for generating figure 1

output/: output generated by the Notebooks of this repository
- parms_*_0.85.tsv: the features of the individual clusters
- med_prof_*.tsv: the median profiles of the individual clusters
- ds_refs_sim_to_clusters_*: data set with added biological cluster similarities (multiple formats)

*/plots/: different sub-folders containing the plots generated by the notebooks in the respective folder

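
The Well_Id composition and the 50% cell-count threshold from the column description of ds_refs.tsv can be illustrated with a small standard-library sketch. It is not taken from jupy_tools. The helper names (`parse_well_id`, `read_non_toxic`, `feature_columns`), the example identifiers and the two `Median_Feature_*` column names are hypothetical, and Batch_No and Container_No are kept as strings because their types are not documented. Since ds_refs.tsv already contains only non-toxic measurements, the filter is shown only to make the threshold concrete.

```python
import csv
import io
from typing import NamedTuple


class WellId(NamedTuple):
    compound_id: int
    batch_no: str       # type not documented; kept as string (assumption)
    container_no: str   # type not documented; kept as string (assumption)
    conc_um: float


def parse_well_id(well_id: str) -> WellId:
    """Split '<Compound_Id>:<Batch_No>:<Container_No>_<Conc [uM]>' into parts."""
    compound_id, batch_no, rest = well_id.split(":", 2)
    # The concentration follows the last underscore.
    container_no, conc = rest.rsplit("_", 1)
    return WellId(int(compound_id), batch_no, container_no, float(conc))


def read_non_toxic(handle, min_rel_cell_count=50.0):
    """Yield rows whose Rel_Cell_Count (percent of DMSO) is at least the threshold."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["Rel_Cell_Count"]) >= min_rel_cell_count:
            yield row


def feature_columns(fieldnames):
    """Return the names of the Z-score feature columns (prefix 'Median_')."""
    return [name for name in fieldnames if name.startswith("Median_")]


# Hypothetical two-row excerpt; the real file has 579 Median_* columns.
EXAMPLE_TSV = (
    "Well_Id\tCompound_Id\tConc_uM\tRel_Cell_Count\t"
    "Median_Feature_A\tMedian_Feature_B\n"
    "100001:01:05_10.0\t100001\t10.0\t92\t0.1\t-3.2\n"
    "100002:01:07_30.0\t100002\t30.0\t41\t5.4\t2.0\n"
)

rows = list(read_non_toxic(io.StringIO(EXAMPLE_TSV)))
features = feature_columns(rows[0].keys())
parsed = parse_well_id(rows[0]["Well_Id"])
print(parsed, features)
```

To try this on the actual data, pass `open("input/ds_refs.tsv", newline="")` instead of the in-memory example.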
Python3, Matplotlib and Seaborn were used for the visualizations. See the environment.yml file for the respective versions of all used libraries. Please note that the plots generated by the code in this repository were in most cases edited for the manuscript versions to adapt sizes, legends, labels or titles, but not the data itself.
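
To compare a local installation against the versions pinned in environment.yml, the installed versions can be read with the standard library. This is a minimal sketch and not part of the repository: the helper name `installed_versions` is hypothetical, and the two distribution names are assumed to match the ones used in the environment file.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["matplotlib", "seaborn"]).items():
    print(f"{name}: {version or 'not installed'}")
```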
The installation has only been tested on Linux (Ubuntu 22.04).
- Clone this repo and change into the directory:

      $ git clone https://github.com/mpimp-comas/2022_pahl_ziegler_subprofiles
      $ cd 2022_pahl_ziegler_subprofiles

- Create a new conda environment, install the dependencies and activate the environment:

      $ conda env create -f environment.yml
      $ conda activate subprof  # if the name from the environment file was used

- Link the folder jupy_tools to one of Python's import paths (e.g. one of the folders that is printed by python3 -c "import sys; print(sys.path)"):

      # e.g.
      $ ln -s <path-to-this-repo>/jupy_tools <path-to-conda-env>/lib/python3.9/site-packages/
      # concrete example:
      $ ln -s $HOME/dev/github/2022_pahl_ziegler_subprofiles/jupy_tools $HOME/anaconda3/envs/subprof/lib/python3.9/site-packages/

  (I actually prefer this to using setuptools, because a simple git pull will get the newest version).
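
After linking, it can help to confirm that Python actually resolves jupy_tools from the linked location. The following is a minimal sketch using only the standard library. The helper name `find_module_location` is hypothetical, and the code makes no assumption about whether jupy_tools ships an `__init__.py`.

```python
import importlib.util


def find_module_location(name):
    """Return where Python would import `name` from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages (folders without __init__.py) have no origin.
    locations = spec.submodule_search_locations
    return list(locations)[0] if locations else None


location = find_module_location("jupy_tools")
print(location or "jupy_tools was not found on sys.path")
```

Run this with the subprof environment activated. If the printed path does not point into the cloned repository, the symbolic link probably went into a folder that is not on that environment's import path.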