ardecasien/gametologs

Evolutionary divergence between homologous X-Y chromosome genes shapes sex-biased biology

Repository for the analysis of human gametolog sex-dependent and sex-chromosome-dependent co-expression fingerprint divergence (CFD)

This repository contains scripts used in the analysis of human gametolog sex-dependent and sex-chromosome-dependent co-expression fingerprint divergence (CFD).

Note that we ran most steps on the NIH (Biowulf) high-performance computing cluster. We have aimed to generalize the code here by removing system-specific references to installed software and modules. Instead, we document required software and version numbers below (excluding standard Unix programs and R). For HPC systems, the required scripts and binaries must be in the PATH. The easiest way to do this is to use an existing module or to install your own. In these cases, the modules should be loaded prior to running the appropriate code below.
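Before submitting jobs, it can help to confirm that the required programs really are on the PATH once the relevant modules are loaded. The following is a minimal sketch, not part of the repository; the function name check_tools is hypothetical, and the list of tools to pass should be adapted to the steps being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Reports every program given as an argument that cannot be found on PATH.
# Returns 0 if all programs are found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the software named in this README:
#   check_tools kallisto samtools Rscript
```

The check only confirms that each program is present; it does not verify version numbers, which should still be compared against those documented below.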

As Biowulf uses the Slurm scheduler, most code below should run on Slurm systems with little or no modification. For non-Slurm HPC systems, the Slurm scripts and environment variables will need to be adjusted, though hopefully without too much hassle.

We ran most analysis steps using R (v4.1). We recommend the following utility or visualization packages to extend base R's functionality.

Inputs

The following files are expected:

  • GTEx v8 fastq files should be placed in the fastq/ folder with the naming convention ${sample}/${sample}_1.fastq.gz (read 1) and ${sample}/${sample}_2.fastq.gz (read 2).

  • The GTEx phenotype file should be placed in data/gtex_meta_edit.csv

  • The GTEx attributes file should be placed in data/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt

  • A .csv file including gametolog information should be placed in data/gametologs_in_genome.csv
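The layout described above can be verified before starting the pipeline. The sketch below is illustrative and not part of the repository: check_inputs is a hypothetical function that takes the repository root as its argument, checks the three files under data/, and checks that every sample folder under fastq/ contains both read files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Usage: check_inputs [repo_root]   (defaults to the current directory)
# Returns 0 if all expected inputs exist and are non-empty, 1 otherwise.
check_inputs() {
    local root="${1:-.}" status=0 f dir sample
    # Metadata and gametolog files listed under "Inputs".
    for f in data/gtex_meta_edit.csv \
             data/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt \
             data/gametologs_in_genome.csv; do
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    # Each sample folder must hold ${sample}_1.fastq.gz and ${sample}_2.fastq.gz.
    for dir in "$root"/fastq/*/; do
        [ -d "$dir" ] || continue   # glob matched nothing
        sample=$(basename "$dir")
        for f in "${sample}_1.fastq.gz" "${sample}_2.fastq.gz"; do
            if [ ! -s "$dir$f" ]; then
                echo "missing or empty: fastq/$sample/$f" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

Running check_inputs . from the repository root prints one line per missing file. Note that an empty fastq/ folder passes this check, since there are no sample folders to inspect.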

Pipeline

Combine metadata

  • Key libraries: stringr
# Import metadata and combine
scripts/combine_meta.R

Map reads with pseudoalignment software

  • Required software: kallisto (v0.46.2), samtools (v1.13)
# Create sex specific transcriptomes and index
scripts/kallisto_transcriptomes.sh

# Count transcripts
sbatch --array=1-$(wc -l checkpoints/samplesM.txt | cut -d ' ' -f 1) scripts/run_kallisto_M.sh
sbatch --array=1-$(wc -l checkpoints/samplesF.txt | cut -d ' ' -f 1) scripts/run_kallisto_F.sh
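The --array range above spans one task per line of the sample list, so each array task typically looks up its own sample by line number. The sketch below shows one common way to do that and is an assumption about the pattern, not a copy of scripts/run_kallisto_M.sh; the function name sample_for_task is hypothetical. SLURM_ARRAY_TASK_ID is the environment variable Slurm sets to the task index inside an array job.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Prints the sample ID found on line <task_id> (1-based) of <sample_list>.
# Returns 1 if that line does not exist or is empty.
sample_for_task() {
    local list="$1" task_id="$2" sample
    sample=$(sed -n "${task_id}p" "$list")
    if [ -z "$sample" ]; then
        echo "no sample on line $task_id of $list" >&2
        return 1
    fi
    printf '%s\n' "$sample"
}

# Inside an array job, the lookup would read:
#   sample=$(sample_for_task checkpoints/samplesM.txt "$SLURM_ARRAY_TASK_ID")
```

Because the array starts at 1, the task index maps directly onto the line number with no offset.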

Import and combine expression data

  • Key libraries: tximport, rhdf5, biomaRt
# Import male and female kallisto results into R and combine
scripts/import_kallisto.R

Normalize, filter, and adjust gene expression

  • Key libraries: edgeR, limma
# Apply filters to gene expression dataset
# Normalize data
# Adjust expression for age + technical effects
scripts/normalize_adjust.R

Sex-dependent co-expression fingerprint divergence (CFD)

  • Key libraries: spqn
# calculate co-expression (in males & females)
# apply spatial quantile normalization
# calculate sex-dependent CFD 

# Create R files
scripts/sex_dependent_CFD_MXY_FXX.R
scripts/sex_dependent_CFD_MX_FXX.R

# Create swarm files
scripts/sex_dependent_CFD_swarm.R

# Submit jobs
swarm -f sex_dependent_CFD_MXY_FXX.swarm -g 200 -t 4 --module R/4.1.0
swarm -f sex_dependent_CFD_MX_FXX.swarm -g 200 -t 4 --module R/4.1.0
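swarm is a Biowulf-specific wrapper around Slurm, so the commands above will not run unchanged on other clusters. The sketch below shows one possible workaround, assuming the generated .swarm files follow the usual swarm convention of one independent command per line (the generated files were not inspected, and line continuations are not handled). The function names swarm_command and run_swarm_task are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# Print the Nth command (1-based) of a swarm-style file, skipping blank
# lines and lines that start with '#'.
swarm_command() {
    local file="$1" n="$2"
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$file" | sed -n "${n}p"
}

# Run the Nth command in a fresh shell; fail if the index is out of range.
run_swarm_task() {
    local cmd
    cmd=$(swarm_command "$1" "$2")
    if [ -z "$cmd" ]; then
        echo "no command $2 in $1" >&2
        return 1
    fi
    bash -c "$cmd"
}

# Inside a Slurm array job script, each task would run one line:
#   run_swarm_task sex_dependent_CFD_MXY_FXX.swarm "$SLURM_ARRAY_TASK_ID"
```

The swarm flags -g 200 -t 4 request 200 GB of memory and 4 threads per command, so an equivalent array job would need comparable per-task resources, with R 4.1 loaded or otherwise available on the PATH.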

Sex-chromosome-dependent co-expression fingerprint divergence (CFD) in males

  • Key libraries: spqn
# calculate co-expression (in males)
# apply spatial quantile normalization
# calculate differential X-Y coupling
# calculate sex-chromosome-dependent CFD 

# Create R files
scripts/sex_chr_dependent_CFD.R

# Create swarm files
scripts/sex_chr_dependent_CFD_swarm.R

# Submit job
swarm -f sex_chr_dependent_CFD.swarm -g 200 -t 4 --module R/4.1.0

Load & visualize sex-dependent and sex-chromosome-dependent CFD

  • Key libraries: ggplot2
# Load and analyze
scripts/visualize_sex_dependent_CFD.R
scripts/visualize_sex_chr_dependent_CFD.R
scripts/compare_sex_dep_vs_sex_chr_dependent_CFD.R

Estimate regulatory and sequence divergence for X-Y gametologs

  • Key libraries: biomaRt
# Load sequences and estimate similarity/divergence measures
scripts/evolutionary_divergence.R

Investigate the patterns and distributions of asymmetric coupling

  • Key libraries: biomaRt
# ANOVA, variance partitioning, dimensionality reduction
# clustering
# GO annotation
# Sex chromosome enrichment
scripts/asymmetric_coupling.R

Estimate expression-weighted asymmetric coupling

# estimate expression-weighted asymmetric coupling
scripts/exp_weighted_asym_coupling.R

CLIP (significance of asymmetric coupling)

# estimate significance of asymmetric coupling
scripts/CLIP.R

Compare asymmetric X-Y coupling to sex-biased gene expression and co-expression

  • Key libraries: limma, mashr
# estimate sex effects
scripts/calc_sex_biased_expression.R
# compare measures and visualize
scripts/asymmetric_versus_sex_bias.R

Estimate sex differences in co-expression between X-coupled & Y-coupled genes

# estimate and visualize
scripts/sex_diff_Xcoupled_versus_Ycoupled.R

ASD risk gene enrichment analyses

  • Key libraries: ggplot2
# GO and DO analyses
scripts/ASD_enrichments.R
