Data Analysis Pipeline

An R pipeline for technical and biological data assessment.

About the Project

Overall Project Description

Depression is a pressing global issue with a prevalence of 3.8 % worldwide, leading to severe social and economic consequences. Due to the complexity of the disease, research into prevention, diagnosis, and treatment options faces significant challenges. Understanding the intricate pathways by which depression develops and progresses is critical to expanding our knowledge base and developing effective intervention strategies. One area of interest is the possible role of hormonal contraception as a risk factor. While some studies suggest a link between hormonal contraceptives and depression, others find no association, leading to controversy in the medical literature.

Our project takes a biochemical approach, focusing on the effects of steroid hormones on neuronal cells in vitro to address potential biases in previous research. By treating neural progenitor cells with various hormonal contraceptives and analyzing resultant protein changes using quantitative proteomics, we aim to uncover insights into the pharmacological effects of steroids on the neuronal proteome.

We hope to provide a significant contribution to clarifying the possible influence of steroids on the development of depression. This knowledge is of utmost importance due to the widespread and long-term use of oral contraceptives by healthy people and could lead to better risk-benefit assessments.

(back to top)

About the Present Experiment

The following descriptions are only intended to give an overall impression of what this code is used for. For further information, please refer to the published article. (Note: The article will be submitted shortly. This file will be updated with a link to the article once it is published.)

(back to top)

Cell culture

In this experiment, the neural progenitor cell line ReNcell VM was cultured in a 2D cell culture and treated with various substances: ethynyl estradiol, levonorgestrel, the combination of ethynyl estradiol and levonorgestrel commonly found in oral contraceptive pills, and S-23, a drug candidate for the male contraceptive pill. The drugs were dissolved in DMSO and added to the cell culture medium at a final concentration of 100 ng/mL. The final DMSO concentration in the cell culture medium was 100 ppm.
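As a plausibility check on these two numbers, the stock concentration they imply can be worked out. This is a sketch under two assumptions that are not stated above: that ppm is meant as a volume fraction, and that all DMSO in the treated medium comes from the drug stock solution.

$$
c_{\text{stock}} = \frac{c_{\text{final}}}{\varphi_{\text{DMSO}}} = \frac{100\ \text{ng/mL}}{100 \times 10^{-6}} = 1 \times 10^{6}\ \text{ng/mL} = 1\ \text{mg/mL}
$$

Under these assumptions, the stock would be diluted 1:10,000 into the medium.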

One batch of cells was incubated with cell culture medium with added 100 ppm DMSO and another batch was incubated with only cell culture medium. These cell cultures served as controls in the data analysis.

The cell culture medium was exchanged every other day, and the cells were split at approximately 90 % confluency. After 14 days, the cells were harvested, split into aliquots of $1 \times 10^6$ cells, and stored at -80 °C.

(back to top)

Sample Preparation

One aliquot of each sample was processed for label-based quantification (LBQ) using isobaric mass tags (TMTpro 16plex), whereas another aliquot of each sample was processed for label-free quantification (LFQ).

In either case, the proteins were digested, reduced, and alkylated for bottom-up proteomics.

In the case of label-based quantification, the pooled sample was pre-fractionated at high pH into 8 fractions.

(back to top)

nano-LC-MS Analysis

For LC-MS analysis, the samples were separated by nano-flow liquid chromatography and injected into a quadrupole-Orbitrap mass spectrometer operated in positive mode with a Top-15 data-dependent acquisition (DDA) method.

The LBQ samples were analyzed using a Thermo Scientific Q Exactive mass spectrometer at the Institute of Biochemistry at the German Sport University Cologne, Cologne, Germany. The LFQ samples were analyzed using a Thermo Scientific Q Exactive HF mass spectrometer at the Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Dortmund, Germany.

(back to top)

Data Analysis

The raw mass spectrometry data were processed with Thermo Scientific Proteome Discoverer 3.0 (LBQ) and 3.1 (LFQ), using the Sequest HT and CHIMERYS search engines. The "Proteins", "Peptide Groups", "PSMs", and "MS/MS Spectrum Info" tables were exported as text files and then analyzed and visualized by the pipeline in this repository. The data analysis involved multiple stages:

A technical evaluation was conducted for the LFQ and LBQ data to assess their quality.
The regulated proteins were biologically evaluated, including the search for gene-to-disease associations and enrichment analyses for GO (Gene Ontology) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways.
The regulated proteins from all data sets were combined to obtain a more comprehensive view, and this combined data set was also subjected to biological analysis.
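To illustrate what an enrichment analysis for a single GO term or KEGG pathway computes, here is a minimal over-representation sketch in base R using the hypergeometric test. It is not taken from this repository, which may rely on dedicated packages and different statistics. The function name `enrichment_p` and all protein identifiers and set sizes are made up for illustration.

```r
# Over-representation test for one annotation term (e.g., a GO term).
# Hypothetical helper; not part of this repository.
enrichment_p <- function(regulated, term_members, background) {
  # Only proteins that were quantified at all can contribute.
  regulated    <- intersect(regulated, background)
  term_members <- intersect(term_members, background)
  overlap      <- length(intersect(regulated, term_members))

  # P(X >= overlap) when drawing length(regulated) proteins
  # without replacement from the background.
  phyper(overlap - 1,
         m = length(term_members),
         n = length(background) - length(term_members),
         k = length(regulated),
         lower.tail = FALSE)
}

# Made-up example data
background   <- paste0("P", 1:100)          # all quantified proteins
term_members <- paste0("P", 1:10)           # proteins annotated with the term
regulated    <- paste0("P", c(1:5, 50:54))  # proteins flagged as regulated

p_value <- enrichment_p(regulated, term_members, background)
print(p_value)
```

When many terms are tested, the resulting p-values would normally be corrected for multiple testing, for example with `p.adjust(p_values, method = "BH")`.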

(back to top)

Run the code

General Remarks on the Repository

The code follows object-oriented programming principles. The classes/ directory contains separate R files, each defining an S4 class for one specific assessment, e.g., the assessment of the AGC's fill percentage (agc_fill.R). To keep these classes readable, they contain no data-manipulation code themselves and only call the functions responsible for those tasks. These functions are located in the utils/ directory in files with the _helpers suffix, e.g., agc_fill_helpers.R.
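The separation between classes and helpers can be sketched as follows. This is an illustration of the pattern only, not the code in this repository: the class name `AgcFillAssessment`, the generic `assess`, and the helper `summarise_agc_fill` are hypothetical, as are the example values.

```r
library(methods)

# Hypothetical helper, as it might appear in a *_helpers.R file in utils/:
# all data manipulation happens here, not in the class.
summarise_agc_fill <- function(fill_percent) {
  c(median = median(fill_percent), max = max(fill_percent))
}

# Hypothetical S4 class, as it might appear in a file in classes/:
# it only stores the data and delegates the work to the helper.
setClass("AgcFillAssessment", representation(fill_percent = "numeric"))

setGeneric("assess", function(object) standardGeneric("assess"))

setMethod("assess", "AgcFillAssessment", function(object) {
  summarise_agc_fill(object@fill_percent)
})

# Made-up AGC fill percentages for four scans
assessment <- new("AgcFillAssessment", fill_percent = c(12, 48, 100, 75))
result <- assess(assessment)
print(result)
```

Keeping the method body to a single helper call means the helper can be tested and reused without instantiating the class.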

Finally, the notebooks/ directory contains three Jupyter Notebooks that combine all the necessary data analysis and visualization steps. The resulting files and images are saved in the results/ directory.

(back to top)

Execute the Code Yourself

To execute the pipeline yourself, follow these steps:

Setup: Ensure you have installed JupyterLab with an R kernel. For simplicity and to avoid compatibility issues, you can use my Docker container.

Prepare the Code: Download the code from this repository. Rename the folder (e.g., to thilmany-etal) and move it into the notebooks/ directory of your JupyterLab Docker environment.

Download Data: Obtain the raw data from ProteomeXchange and place it into the appropriate folders within the data/ directory. Ensure the file names match those specified in utils/data.R to avoid data loading errors.

DISGENET API: The gene-to-disease association analysis uses the DISGENET gene-disease association network. Subscribe to a plan that provides access to the REST API and the R package (we used the Academic license). Copy your API key into the .Renviron.example file and rename it to .Renviron.

Run the Pipeline: Start the Docker container and access JupyterLab. Navigate to notebooks/thilmany-etal/notebooks/, open a Jupyter Notebook (e.g., LBQ-analysis.ipynb), and select "Restart Kernel and Run All Cells..." from the "Run" menu.

Depending on Docker's resource allocation, the process may take several minutes.
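Before running a notebook, it can help to confirm that the API key and the data files from the steps above are in place. The following is a minimal sketch, assuming R picks up the .Renviron file at startup. The function `check_setup`, the variable name `DISGENET_API_KEY`, and the file name are placeholders, not the names used in this repository; the actual names are given in .Renviron.example and utils/data.R.

```r
# Hypothetical pre-flight check; returns a named logical vector in which
# every entry should be TRUE before the pipeline is run.
check_setup <- function(key_name, data_files) {
  c(
    api_key    = nzchar(Sys.getenv(key_name)),  # TRUE if the variable is set and non-empty
    data_files = all(file.exists(data_files))   # TRUE if every expected file is present
  )
}

# Placeholder names: replace them with the variable name from
# .Renviron.example and the file names from utils/data.R.
status <- check_setup(
  key_name   = "DISGENET_API_KEY",
  data_files = file.path("data", "example_Proteins.txt")
)
print(status)
```

If `api_key` is FALSE even though .Renviron exists, calling `readRenviron(".Renviron")` from the directory that contains the file loads it into the running session.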

(back to top)

SamThilmany/ILLUMINE-202401_Analysis-Pipeline
