Systems Biology-Based Analysis Indicates Global Transcriptional Impairment in Lead-Treated Human Neural Progenitor Cells
This repository contains the files necessary to reproduce the results reported in our analysis of lead-treated human neural progenitor cells. The bin folder contains the code used to generate the analyses and figures. The code runs on R (version 3.5.1) and uses the libraries biomaRt, affy, data.table, dplyr, edgeR, factoextra, FactoMineR, ggplot2, ggrepel, grid, gridExtra, hgu95av2.db, igraph, limma, purrr, RColorBrewer, RCy3, readxl, RedeR, scales, stats, stringr, tibble, tidyr, topGO, and transcriptogramer (version 1.3.4).
The Data folder contains intermediate data files generated by the pipeline. You can therefore either run the pipeline from scratch, downloading the raw data files from the experiment, or run any individual phase using these intermediate files. Be sure to correct the data paths in the R scripts.
To run this pipeline, you will need to download the raw sequence reads (Sequence Read Archive accession SRP079342; Gene Expression Omnibus accession GSE84712), get the Ensembl GRCh38 human genome reference and annotation (release 91), and have HISAT2, samtools, and featureCounts installed.
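Before starting, it can help to confirm that the command-line tools used in the steps of this pipeline are on your PATH. The following is a minimal sketch, not part of the original pipeline; the helper name `missing_tools` is hypothetical, and the tool names are taken from the commands in this README.

```shell
# Print the name of every listed tool that cannot be found on the PATH.
# (Hypothetical helper, not part of the repository.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names as they appear in the commands of this README.
missing=$(missing_tools hisat2-build hisat2 extract_splice_sites.py samtools featureCounts)
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All required tools were found."
fi
```

If anything is reported as missing, install it or adjust your PATH before running the alignment steps.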
hisat2-build -p 8 ${genRefDir}"/Homo_sapiens.GRCh38.dna.primary_assembly.fa" ${genIndexDir}
extract_splice_sites.py ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" > ${genRefDir}"/Homo_sapiens.GRCh38.94.txt"
For each sample, run:
hisat2 -x ${genIndexDir} --known-splicesite-infile ${genRefDir}"/Homo_sapiens.GRCh38.94.txt" -p 8 -1 ${sampleDir}/${file}_1".fastq.gz" -2 ${sampleDir}/${file}_2".fastq.gz"| samtools view -bS - > ${resultDir}/${file}".hisat.bam";
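The per-sample alignment above can be wrapped in a loop. This is a minimal sketch, assuming the paired files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` as in the command above, that sample names contain no spaces, and that `genIndexDir`, `genRefDir`, `sampleDir`, and `resultDir` are already set. The helper names `list_samples` and `align_sample` are hypothetical.

```shell
# List sample prefixes found in a directory, based on the *_1.fastq.gz files.
list_samples() {
  local dir=$1 f
  for f in "$dir"/*_1.fastq.gz; do
    [ -e "$f" ] || continue        # no match: the glob is left as a literal string
    f=${f##*/}                     # drop the directory part
    printf '%s\n' "${f%_1.fastq.gz}"
  done
}

# Align one sample with the same options as the command above.
align_sample() {
  local file=$1
  hisat2 -x "$genIndexDir" \
    --known-splicesite-infile "$genRefDir/Homo_sapiens.GRCh38.94.txt" -p 8 \
    -1 "$sampleDir/${file}_1.fastq.gz" -2 "$sampleDir/${file}_2.fastq.gz" \
    | samtools view -bS - > "$resultDir/${file}.hisat.bam"
}

# Usage, once the four directory variables are set:
#   for file in $(list_samples "$sampleDir"); do align_sample "$file"; done
```

Deriving the sample list from the `_1` files keeps each pair aligned exactly once.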
First, generate the list of BAM files on a single line:
ls *.hisat.bam | tr '\n' ' '> bamList.txt
Then generate the counts file:
featureCounts -T 32 -t gene -g gene_id -a ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" -o ./allCountsHisat.txt $(cat bamList.txt)
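As a quick sanity check, you can list the sample columns that ended up in the counts file and compare them with bamList.txt. This sketch is not part of the original pipeline. It assumes the usual featureCounts output layout: a leading `#` comment line, then a tab-separated header whose first six columns are Geneid, Chr, Start, End, Strand, and Length, followed by one column per BAM file. The helper name `count_columns` is hypothetical.

```shell
# Print the per-sample column names of a featureCounts table, one per line.
# Skips '#' comment lines and reads only the header row.
count_columns() {
  awk -F '\t' '!/^#/ { for (i = 7; i <= NF; i++) print $i; exit }' "$1"
}

# Usage, after featureCounts has finished:
#   count_columns ./allCountsHisat.txt
```

The number of names printed should match the number of BAM files listed in bamList.txt.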
Now you can use the R scripts below.
We recommend running 00base.R and 02analisePCA.R before trying to run any other script. These scripts set up the R environment and create some necessary files.
Objects generated here will be needed later. If you are using a version of transcriptogramer more recent than 1.3.4, the results may differ slightly, because the newer versions use data from STRINGdb release 11.
This is not a completely automatic process. You will need to use Cytoscape manually.
This is not a completely automatic process. The complete figure was composed by hand using Inkscape.
This is not a completely automatic process. You will need to use RedeR and Cytoscape manually. For more information on how to connect Cytoscape and R, see Cytoscape and RCy3 documentation.
An additional auxiliary analysis was performed using several scripts located in the bin folder.