

2022_pahl_ziegler_subprofiles

This repository contains the modules and Jupyter Notebooks that were used to perform the morphological analyses and to generate the initial versions of the plots for

Morphological Subprofile Analysis for Bioactivity Annotation of Small Molecules
Axel Pahl, Beate Schölermann, Sonja Sievers, Herbert Waldmann and Slava Ziegler

It also contains the full processed Cell Painting data set used in this analysis (input/ds_refs.tsv).
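A minimal sketch for loading this file with pandas, assuming pandas is available in the conda environment (check environment.yml). The helper names `load_profiles` and `parse_well_id` are hypothetical and not part of jupy_tools. The column names follow the column description further down in this README, and `parse_well_id` assumes the `Well_Id` layout `<Compound_Id>:<Batch_No>:<Container_No>_<Conc [µM]>` given there.

```python
import pandas as pd


def load_profiles(path="input/ds_refs.tsv", min_rel_cell_count=50.0):
    """Hypothetical helper: load ds_refs.tsv and return (data frame, feature columns)."""
    df = pd.read_csv(path, sep="\t")
    # Rel_Cell_Count below 50 % is considered toxic. Such measurements are
    # stated to be excluded already, so this filter is only a safety check.
    df = df[df["Rel_Cell_Count"] >= min_rel_cell_count].copy()
    # The Z-scored morphological features are the columns prefixed with "Median_"
    feature_cols = [c for c in df.columns if c.startswith("Median_")]
    return df, feature_cols


def parse_well_id(well_id):
    """Hypothetical helper: split <Compound_Id>:<Batch_No>:<Container_No>_<Conc [µM]>."""
    # The concentration is the part after the last underscore; it is kept as a
    # string here (use the Conc_uM column for numeric concentrations).
    head, conc = well_id.rsplit("_", 1)
    compound_id, batch_no, container_no = head.split(":", 2)
    return {"Compound_Id": compound_id, "Batch_No": batch_no,
            "Container_No": container_no, "Conc": conc}
```

Run from the repository root, `df, feature_cols = load_profiles()` should yield 579 feature columns, matching the column description.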

Update 14-Nov-2023: Added data for the MitoStress cluster (preprint on bioRxiv; submitted to Cell Chem. Biol.)

  • added 16 measurements for the MitoStress cluster to cluster_cpds.tsv
  • added 2 reference compounds for the MitoStress cluster that were not part of the original data set (ds_refs.tsv)
  • added the MitoStress cluster median subprofile
  • re-created several plots, now including the MitoStress cluster

Repository Content (Excerpt)

  • input/: input files used in this analysis
    • cluster_cpds.tsv: list of cluster-defining measurements

    • ds_refs.tsv: the data set of active and non-toxic measurements used in this analysis, comprising 3560 processed Cell Painting profiles at different concentrations: 3547 measurements of 1883 different reference compounds and 13 measurements of 3 internal compounds (used in figures 3-5).

      Description of columns:
      • Well_Id [string]: the unique identifier of a measurement in the data set. It has the following composition: <Compound_Id>:<Batch_No>:<Container_No>_<Conc [µM]>
      • Compound_Id [int]: compound identifier
      • Conc_uM [float]: concentration [µM] of the given measurement
      • Is_Ref [bool]: whether the compound is a reference compound (True) or an internal compound (False)
      • Induction [float]: the percentage of morphological features in the Cell Painting profile that are significantly changed compared to controls
      • Rel_Cell_Count [int]: cell count of the measurement relative to DMSO controls, expressed in percent; a value below 50% is considered toxic, and such measurements were excluded from the present analysis.
      • Chiral [bool]: chiral flag of the structure
      • Smiles [string]: compound structure as a SMILES string
      • Trivial_Name [string]: the trivial name of a reference compound
      • Known_Act [string]: the annotated known activity of a reference compound. The annotations were sourced from the respective vendors.
      • Median_* [float]: the Z-score values for the 579 features of the processed Cell Painting profiles. A value around 0 indicates no significant change compared to DMSO controls.
  • jupy_tools/: helper modules
    • cpa.py: module for analysing the results from the morphological Cell Painting assay
    • utils.py: Notebook helper utilities
  • cluster_subprofiles/: code for calculating / generating...
    • 01_calc_cluster_subprofiles.ipynb: the individual cluster subprofiles
    • 02_add_cluster_sims_to_dataset.ipynb: the similarities of the measurements to the individual clusters and the cross-similarity pair plot (figure S2)
    • 03_cluster_cross_correlation_matrix.ipynb: the cluster correlation matrix (figure 2C)
    • 04_cluster_biosim_matrix.ipynb: the cluster biosimilarity matrices and profile heat maps for figures 3-5
    • 05_hc_dna_synth.ipynb: the hierarchical clustering example in figure 6
  • umap/01_umap.ipynb: code for generating the UMAP plot (figure 2C)
  • tubulin_cluster/01_different_defining_cpds.ipynb: code for generating figure 1
  • output/: output generated by the Notebooks of this repository
    • parms_*_0.85.tsv: the features of the individual clusters
    • med_prof_*.tsv: the median profiles of the individual clusters
    • ds_refs_sim_to_clusters_*: data set with added biological cluster similarities (multiple formats)
  • */plots/: different sub-folders containing the plots generated by the notebooks in the respective folder
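The notebooks in cluster_subprofiles/ compute how similar each measurement is to each cluster subprofile, but this README does not spell out the similarity metric. The following is therefore only an illustrative sketch: it assumes Pearson correlation, computed over a cluster's features only (as listed in parms_*_0.85.tsv), between a measurement's Median_* values and the cluster's median profile (med_prof_*.tsv), expressed in percent. The function names are hypothetical, and the actual calculation in 02_add_cluster_sims_to_dataset.ipynb may differ.

```python
import numpy as np
import pandas as pd


def subprofile_similarity(profile: pd.Series, median_profile: pd.Series,
                          cluster_features: list) -> float:
    """Hypothetical helper: similarity (in percent) of one profile to a cluster subprofile.

    Assumption: Pearson correlation over the cluster's features only; the
    metric used in the repository's notebooks may differ.
    """
    x = profile[cluster_features].to_numpy(dtype=float)
    y = median_profile[cluster_features].to_numpy(dtype=float)
    # Pearson correlation is undefined if either vector is constant
    if np.std(x) == 0 or np.std(y) == 0:
        return float("nan")
    r = np.corrcoef(x, y)[0, 1]
    return float(r * 100)  # -100 (anti-correlated) ... 100 (same shape)


def add_cluster_similarity(df: pd.DataFrame, median_profile: pd.Series,
                           cluster_features: list, col_name: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with one similarity column for one cluster."""
    out = df.copy()
    out[col_name] = out.apply(
        lambda row: subprofile_similarity(row, median_profile, cluster_features), axis=1
    )
    return out
```

Because only the cluster features enter the calculation, changes in features outside the subprofile do not affect the score.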

Python3, Matplotlib and Seaborn were used for the visualizations. See the environment.yml file for the versions of all libraries used. Please note that in most cases the plots generated by the code in this repository were edited for the manuscript to adapt sizes, legends, labels or titles; the data itself was not changed.

Installation

The installation has only been tested on Linux (Ubuntu 22.04).

  1. Clone this repo and change into the directory

    $ git clone https://github.com/mpimp-comas/2022_pahl_ziegler_subprofiles
    $ cd 2022_pahl_ziegler_subprofiles
    
  2. Create a new conda environment, install the dependencies and activate the environment:

    $ conda env create -f environment.yml
    $ conda activate subprof  # if the name from the environment file was used
    
  3. Link the folder jupy_tools into one of Python's import paths (e.g. one of the folders printed by python3 -c "import sys; print(sys.path)"):

    # e.g.
    $ ln -s <path-to-this-repo>/jupy_tools <path-to-conda-env>/lib/python3.9/site-packages/
    
    # concrete example:
    $ ln -s $HOME/dev/github/2022_pahl_ziegler_subprofiles/jupy_tools $HOME/anaconda3/envs/subprof/lib/python3.9/site-packages/
    
    

(I prefer this to installing with setuptools, because a simple git pull then gets the newest version.)
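The linking step can also be done from Python. This is a minimal sketch, not part of the repository: the function name `link_jupy_tools` is hypothetical, and it assumes the script is run with the interpreter of the activated conda environment, so that `sysconfig.get_paths()["purelib"]` points to that environment's site-packages (the same target as in the ln -s example).

```python
import sysconfig
from pathlib import Path
from typing import Optional


def link_jupy_tools(repo_dir: str, target_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: symlink <repo_dir>/jupy_tools into an import path."""
    src = Path(repo_dir).expanduser().resolve() / "jupy_tools"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist")
    # Default target: site-packages ("purelib") of the running interpreter
    dest_dir = Path(target_dir) if target_dir else Path(sysconfig.get_paths()["purelib"])
    link = dest_dir / "jupy_tools"
    if link.is_symlink() or link.exists():
        # Re-running is harmless if the link already points to this repo;
        # anything else is left untouched.
        if link.resolve() == src:
            return link
        raise FileExistsError(f"{link} already exists")
    link.symlink_to(src, target_is_directory=True)
    return link
```

For example, `link_jupy_tools("~/dev/github/2022_pahl_ziegler_subprofiles")`, followed by `python3 -c "import jupy_tools"` from another directory to check that the folder is found.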