Repository for the analysis of human gametolog sex-dependent and sex-chromosome-dependent co-expression fingerprint divergence (CFD)
This repository contains scripts used in the analysis of human gametolog sex-dependent and sex-chromosome-dependent co-expression fingerprint divergence (CFD).
Note that we ran most steps on the NIH (Biowulf) high-performance computing cluster. We have aimed to generalize the code here by removing system-specific references to installed software and modules. Instead, we document required software and version numbers below (excluding standard Unix programs and R). For HPC systems, the required scripts and binaries must be in the PATH. The easiest way to do this is to use an existing module or to install your own. In these cases, the modules should be loaded prior to running the appropriate code below.
As Biowulf uses the Slurm scheduler, most code below should run on Slurm systems with little or no modification. For non-Slurm HPC systems, the Slurm scripts and environment variables will need to be adjusted, though hopefully without too much hassle.
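Before submitting jobs, it can help to confirm that the required binaries really are on the PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the tool names are the ones listed as required software further down in this README.

```shell
#!/usr/bin/env bash
# Print every program from the argument list that cannot be found on the PATH.
# Returns non-zero if at least one program is missing.
# (Hypothetical helper, not one of the repository scripts.)
missing_tools() {
  local tool
  local status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example call; adjust the list to the software your steps need.
if missing_tools kallisto samtools; then
  echo "all required tools found"
else
  echo "load the matching modules or extend PATH before continuing" >&2
fi
```

This only checks that a program with the given name is available. It does not verify version numbers, so those should still be compared by hand against the versions documented here.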
We ran most analysis steps using R (v4.1). We recommend the following utility or visualization packages to extend base R's functionality.
The following files are expected:
- GTEx v8 fastq files should be placed in the fastq/ folder with the naming convention ${sample}/${sample}_1.fastq.gz (read 1) and ${sample}/${sample}_2.fastq.gz (read 2).
- The GTEx phenotype file should be placed in data/gtex_meta_edit.csv
- The GTEx attributes file should be placed in data/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
- A .csv file including gametolog information should be placed in data/gametologs_in_genome.csv
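The expected layout above can be verified with a short pre-flight check. This is a sketch under the assumption that it is run from the repository root; the function names `check_inputs` and `check_fastq_pairs` are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Print every expected input (relative to the directory given as $1) that
# does not exist. Returns non-zero if anything is missing.
check_inputs() {
  local root=$1
  local path
  local status=0
  for path in \
    fastq \
    data/gtex_meta_edit.csv \
    data/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt \
    data/gametologs_in_genome.csv
  do
    if [ ! -e "$root/$path" ]; then
      printf 'missing: %s\n' "$path"
      status=1
    fi
  done
  return "$status"
}

# Print every read file that is absent from a sample directory under
# $1/fastq, following the ${sample}/${sample}_N.fastq.gz naming convention.
check_fastq_pairs() {
  local root=$1
  local dir sample mate
  for dir in "$root"/fastq/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    sample=$(basename "$dir")
    for mate in 1 2; do
      [ -e "$dir${sample}_${mate}.fastq.gz" ] ||
        printf 'missing: fastq/%s/%s_%s.fastq.gz\n' "$sample" "$sample" "$mate"
    done
  done
}

# Example call from the repository root.
check_inputs . || echo "some expected inputs are missing" >&2
check_fastq_pairs .
```

The check only tests for existence. It does not validate file contents or confirm that the fastq files are complete.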
- Key libraries: stringr
# Import metadata and combine
scripts/combine_meta.R
- Required software: kallisto (v0.46.2), samtools (v1.13)
# Create sex specific transcriptomes and index
scripts/kallisto_transcriptomes.sh
# Count transcripts
sbatch --array=1-$(wc -l checkpoints/samplesM.txt | cut -d ' ' -f 1) scripts/run_kallisto_M.sh
sbatch --array=1-$(wc -l checkpoints/samplesF.txt | cut -d ' ' -f 1) scripts/run_kallisto_F.sh
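The two sbatch calls above size the job array from the number of lines in each sample list, so that each array task handles one sample. The sketch below illustrates that mapping and offers a serial fallback for machines without a scheduler. It assumes, without having confirmed it against the scripts, that each run_kallisto script selects its sample from the list using `SLURM_ARRAY_TASK_ID`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Number of samples = number of lines in the list. Reading from stdin makes
# wc print only the count, so no cut step is needed.
count_samples() {
  wc -l < "$1" | tr -d ' '
}

# Sample ID found on line N of the list ($1 = list file, $2 = N).
# N plays the role of SLURM_ARRAY_TASK_ID (assumed, see the text above).
sample_for_task() {
  sed -n "${2}p" "$1"
}

# Serial fallback: run the given command once per array index, with
# SLURM_ARRAY_TASK_ID set the way an array job would set it.
run_serially() {
  local list=$1
  local n i
  shift
  n=$(count_samples "$list")
  i=1
  while [ "$i" -le "$n" ]; do
    SLURM_ARRAY_TASK_ID=$i "$@"
    i=$((i + 1))
  done
}

# Example (not executed here):
#   run_serially checkpoints/samplesM.txt bash scripts/run_kallisto_M.sh
```

Running samples one after another will be much slower than an array job, so the fallback is mainly useful for testing the pipeline on a handful of samples.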
- Key libraries: tximport, rhdf5, biomaRt
# Import male and female kallisto results into R and combine
scripts/import_kallisto.R
- Key libraries: edgeR, limma
# Apply filters to gene expression dataset
# Normalize data
# Adjust expression for age + technical effects
scripts/normalize_adjust.R
- Key libraries: spqn
# calculate co-expression (in males & females)
# apply spatial quantile normalization
# calculate sex-dependent CFD
# Create R files
scripts/sex_dependent_CFD_MXY_FXX.R
scripts/sex_dependent_CFD_MX_FXX.R
# Create swarm files
scripts/sex_dependent_CFD_swarm.R
# Submit jobs
swarm -f sex_dependent_CFD_MXY_FXX.swarm -g 200 -t 4 --module R/4.1.0
swarm -f sex_dependent_CFD_MX_FXX.swarm -g 200 -t 4 --module R/4.1.0
- Key libraries: spqn
# calculate co-expression (in males)
# apply spatial quantile normalization
# calculate differential X-Y coupling
# calculate sex-chromosome-dependent CFD
# Create R files
scripts/sex_chr_dependent_CFD.R
# Create swarm files
scripts/sex_chr_dependent_CFD_swarm.R
# Submit job
swarm -f sex_chr_dependent_CFD.swarm -g 200 -t 4 --module R/4.1.0
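swarm is a Biowulf-specific wrapper that reads a file with one command per line and submits each line as a job. On systems without it, the generated .swarm files can in principle be executed line by line. The following is a minimal sketch of such a runner, assuming the swarm files contain plain shell commands and that R is already on the PATH (the job of `--module R/4.1.0` in the swarm calls). The function name `run_swarm_serially` is hypothetical.

```shell
#!/usr/bin/env bash
# Run each non-empty, non-comment line of a swarm-style command file in turn.
# Stops at the first failing command and returns its exit status.
run_swarm_serially() {
  local file=$1
  local line
  # Read the file on descriptor 3 so the commands cannot consume it via stdin.
  while IFS= read -r line <&3 || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comment/directive lines
    esac
    bash -c "$line" || return $?
  done 3< "$file"
}

# Example (not executed here):
#   run_swarm_serially sex_chr_dependent_CFD.swarm
```

The swarm calls in this README request large allocations (`-g 200 -t 4`), so running these commands serially is only realistic on a machine with comparable memory. On other schedulers, each line of the swarm file would more sensibly become its own job.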
- Key libraries: ggplot2
# Load and analyze
scripts/visualize_sex_dependent_CFD.R
scripts/visualize_sex_chr_dependent_CFD.R
scripts/compare_sex_dep_vs_sex_chr_dependent_CFD.R
- Key libraries: biomaRt
# Load sequences and estimate similarity/divergence measures
scripts/evolutionary_divergence.R
- Key libraries: biomaRt
# ANOVA, variance partitioning, dimensionality reduction
# clustering
# GO annotation
# Sex chromosome enrichment
scripts/asymmetric_coupling.R
# estimate expression-weighted asymmetric coupling
scripts/exp_weighted_asym_coupling.R
# estimate expression-weighted asymmetric coupling
scripts/CLIP.R
- Key libraries: limma, mashr
# estimate sex effects
scripts/calc_sex_biased_expression.R
# compare measures and visualize
scripts/asymmetric_versus_sex_bias.R
# estimate and visualize
scripts/sex_diff_Xcoupled_versus_Ycoupled.R
- Key libraries: ggplot2
# GO and DO analyses
scripts/ASD_enrichments.R