Repository for scripts used in Gene- and transcript-level analyses reveal sex-specific trancriptional alterations in prefrontal cortex in Major Depressive Disorder.
We use Miniconda (or Anaconda) for environment management (See environment.yml
file). To create the environment:
conda env create -f environment.yml
We use R version 4.1.2 and Bioconductor version 3.14.
All R packages are installed through conda.
Clone this repository to your local machine and open the project through mdd.Rproj
.
This project is organized in the following directories:
-
data/
: Holds the RNA-seq processed data, study metadata, and genome references;genome/
: auxiliary files for kallisto quantification.Homo_sapiens.GRCh38.97.gtf.gz
: GTF file from human gencode/Ensembl version 97.Homo_sapiens.GRCh38.cdna.all.fa.gz
: Human transcriptome gencode/Ensembl version 97.
kallisto/
: output from kallisto quantification mode (abundance.tsv
). Quantification files are separated by brain region.aINS/
:Cg25/
:dlPFC/
:OFC/
:Nac/
:Sub/
:
meta/
: metadata for BioProject SRP115956.SraRunTable.txt
: study metadata;Metadata.csv
: Information on individual samples. This information was previously available here: http://neuroscience.mssm.edu/nestler/contecenter/listchromatingenedatabase.html.
vcfs
: VCF files filtered by the variants showed in Table 1 from the paper.
-
scripts/
: Holds all the scripts used in the analyses; -
results/
: Holds results from each step analysis.
- Quantification
Script for download, processing and quantification by kallisto;
-
Exploratory analysis:
- Metadata:
metadata
: Organizes metadata for further steps.
- Transcript and gene estimates:
tx_gene
: Usestximport
to summarise gene counts and prepare data for further steps.tx_tx
: Usestximport
to prepare data to further steps.
- Outlier identification:
robust_pca
: Performs robust PCA analysis of samples usingrrcov
;remove_outliers_samples
: Removes the outliers samples chosen by Robust PCA analysis.
- Covariates selection:
impute_meta
: Imputes data for missing values found in some metadata covariates.rank_variables
: Performs covariate analysis.
- Metadata:
-
TAG:
- Feature-wise outlier detection:
outliers_edge_ppcseq_gene
: Performs identification of outlier genes byppcseq
. Outlier genes were removed from DGE analysis.outliers_edge_ppcseq_tx
: Performs identification of outlier transcripts byppcseq
. Outlier transcripts were removed from DTE and DTU analyses.
- Differential gene expression:
edger_diff_gene
: Differential gene expression withedgeR
.
- Differential transcript expression:
edger_diff_tx
: Differential transcript expression withedgeR
.diff_tx_correct
: Performs multiple hypothesis correction withstageR
.
- Differential transcript usage:
ISA/
: scripts for differential transcript usage usingIsoformSwitchAnalyzeR
are stored in this directory.
- Gather results from three methods:
organize_dge_dte_after_filtering
: Filters the outlier genes and transcripts identified byppcseq
from DGE and DTE results.summarise_results_dge_dte_dtu
: Removes outlier transcripts from DTU analysis and gathers results from three methods.
- Feature-wise outlier detection:
-
Functional analyses:
network
: Network inference using stringDB. Visualization byRedeR
andggraph
.enrichment
: Enrichment analysis of transcriptionally altered genes usingclusterProfiler
.gwas_intersections
: Get genes with genomic regions related to depression usinggwasrapidd
.intersection_analysis
: intersection analysis by sex, brain region, and method used.
-
Additional scripts:
plots
andplot_dtu
: Description of figures produced to the paper.supp_fig_variants_by_donors
: Description of Supplementary Figures 9 and 10, which represent the presence of depression-associated SNPs on the samples considered.