LeadNPC

Systems Biology-Based Analysis Indicates Global Transcriptional Impairment in Lead-Treated Human Neural Progenitor Cells

This repository contains the files needed to reproduce the results reported in our analysis of lead-treated human neural progenitor cells. The bin folder contains the code used to generate the analyses and figures. The code runs on R (version 3.5.1) and uses the libraries biomaRt, affy, data.table, dplyr, edgeR, factoextra, FactoMineR, ggplot2, ggrepel, grid, gridExtra, hgu95av2.db, igraph, limma, purrr, RColorBrewer, RCy3, readxl, RedeR, scales, stats, stringr, tibble, tidyr, topGO, and transcriptogramer (version 1.3.4).

The Data folder contains the intermediate data files generated by the pipeline. You can therefore either run the pipeline from scratch, downloading the raw data files from the experiment, or run any individual phase starting from these files. Be sure to correct the data paths in the R scripts.

Running the entire pipeline from scratch

To run this pipeline from scratch, you will need to download the raw sequence reads (Sequence Read Archive accession SRP079342; Gene Expression Omnibus accession GSE84712), get the Ensembl GRCh38 human genome reference and annotation (release 91), and have HISAT2, featureCounts, and samtools installed.
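Before starting, it can help to confirm that the command-line tools used in the steps below are on your PATH. The following is a minimal sketch, not part of the original pipeline; the helper name missing_tools is hypothetical, and the tool names are taken from the commands shown in this README.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names as they appear in the commands of this README.
missing=$(missing_tools hisat2-build hisat2 extract_splice_sites.py samtools featureCounts)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are printed on one line.
    echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all five commands were found.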

Create hisat2 index

hisat2-build -p 8 ${genRefDir}"/Homo_sapiens.GRCh38.dna.primary_assembly.fa" ${genIndexDir}
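The commands in this section rely on shell variables such as genRefDir and genIndexDir. Note that hisat2-build takes an index base name (a file prefix) as its second argument, so genIndexDir is used as a prefix rather than a plain directory. The sketch below shows one way the variables might be set and how to check whether an index already exists; the default paths and the index_exists helper are assumptions for illustration only.

```shell
# Hypothetical locations; adjust them to your own layout.
genRefDir=${genRefDir:-"$PWD/reference"}        # holds the FASTA and GTF files
genIndexDir=${genIndexDir:-"$PWD/index/GRCh38"} # HISAT2 index base name (prefix)

# Succeed if an index with the given base name is already present.
# HISAT2 index files end in .ht2 (or .ht2l for large indexes).
index_exists() {
    [ -e "$1.1.ht2" ] || [ -e "$1.1.ht2l" ]
}

if index_exists "$genIndexDir"; then
    echo "HISAT2 index found at ${genIndexDir}"
else
    echo "No HISAT2 index at ${genIndexDir}; run hisat2-build first"
fi
```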

Extract the splice sites

extract_splice_sites.py ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" > ${genRefDir}"/Homo_sapiens.GRCh38.94.txt"

Generate the alignments

For each sample file do:

hisat2 -x ${genIndexDir} --known-splicesite-infile ${genRefDir}"/Homo_sapiens.GRCh38.94.txt" -p 8 -1 ${sampleDir}/${file}_1".fastq.gz" -2 ${sampleDir}/${file}_2".fastq.gz"| samtools view -bS - > ${resultDir}/${file}".hisat.bam";
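The "for each sample file" step can be sketched as a loop. This assumes, as in the command above, that paired reads are named ${file}_1.fastq.gz and ${file}_2.fastq.gz inside sampleDir; the sample_prefix helper and the fallback to the current directory are illustrative assumptions, not part of the original pipeline.

```shell
# Derive the sample prefix from a forward-read file name,
# e.g. /data/SRR0000001_1.fastq.gz -> SRR0000001 (hypothetical name).
sample_prefix() {
    basename "$1" _1.fastq.gz
}

# Fall back to the current directory if the variables are unset (assumption).
sampleDir=${sampleDir:-.}
resultDir=${resultDir:-.}

for fq1 in "${sampleDir}"/*_1.fastq.gz; do
    [ -e "$fq1" ] || continue   # no matching files: skip the literal pattern
    file=$(sample_prefix "$fq1")
    # Same alignment command as above, applied to one sample at a time.
    hisat2 -x "${genIndexDir}" \
        --known-splicesite-infile "${genRefDir}/Homo_sapiens.GRCh38.94.txt" \
        -p 8 \
        -1 "${sampleDir}/${file}_1.fastq.gz" \
        -2 "${sampleDir}/${file}_2.fastq.gz" \
        | samtools view -bS - > "${resultDir}/${file}.hisat.bam"
done
```

The loop does nothing if no *_1.fastq.gz files are found in sampleDir.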

Count aligned reads

First, list the BAM files on a single line

ls *.hisat.bam | tr '\n' ' '> bamList.txt

Then generate the counts file

featureCounts -T 32 -t gene -g gene_id -a ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" -o ./allCountsHisat.txt $(cat bamList.txt)
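To inspect the result, note that featureCounts writes a comment line starting with "#", then a header row, with the columns Geneid, Chr, Start, End, Strand, and Length followed by one count column per BAM file. The sketch below reduces that table to gene IDs and counts for a quick look. The counts_matrix helper and the output file name are hypothetical, and the R scripts may well expect the unmodified file, so treat this only as an inspection aid.

```shell
# Keep the gene ID column and the per-sample count columns (7 onwards),
# dropping the leading '#' comment line written by featureCounts.
counts_matrix() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Hypothetical output name; the original file is left untouched.
if [ -f ./allCountsHisat.txt ]; then
    counts_matrix ./allCountsHisat.txt > ./allCountsHisat.matrix.txt
fi
```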

Now you can use the R scripts below.

Or take a shortcut, and start here...

We recommend running 00base.R and 02analisePCA.R before trying to run any other script. These scripts set up the R environment and create some required files.
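One possible way to run these two scripts from a terminal is sketched below. It assumes the scripts can be run non-interactively with Rscript from inside the bin folder, which this README does not confirm; several later steps are explicitly manual. The run_scripts helper and the RSCRIPT variable are hypothetical.

```shell
# Interpreter used to run each script; override RSCRIPT if Rscript
# lives elsewhere (hypothetical variable, not used by the repository).
RSCRIPT=${RSCRIPT:-Rscript}

# Run the given scripts in order, stopping at the first failure.
run_scripts() {
    for script in "$@"; do
        echo "Running ${script}"
        "$RSCRIPT" "$script" || return 1
    done
}

# Example, from inside the bin folder:
# run_scripts 00base.R 02analisePCA.R
```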

Set up the environment and install the R packages

00base.R

Create logCPM file

01ProcessCounts.R

PCA analysis and create transcriptogramer objects

The objects generated here are needed later. If you are using a version of transcriptogramer more recent than 1.3.4, the results may differ slightly, because newer versions use data from STRINGdb release 11.

02analisePCA.R

Plot transcriptogramer graphics

03plotTrancript.R

Plot circlize graphics

04circosPlot.R

Cluster superposition analysis

05intersecClusters.R

Create cluster graphs

This is not a completely automatic process. You will need to use Cytoscape manually.

06graphTemposManual.R

Generate Figure 2 components

This is not a completely automatic process. The complete figure was composed by hand using Inkscape.

Nodes

07BarNodes.R

Connectivity

08BarConect.R

Generate Figures 3 and 4 graphs

This is not a completely automatic process. You will need to use RedeR and Cytoscape manually. For more information on how to connect Cytoscape and R, see Cytoscape and RCy3 documentation.

09GODendo.R

Generate Figures 3 and 4 dendrograms

This is not a completely automatic process. You will need to use RedeR and Cytoscape manually. For more information on how to connect Cytoscape and R, see Cytoscape and RCy3 documentation.

10graphoClustersRCy3.R

Generate marker figures

11Marquers.R

Other analyses

Additional auxiliary analyses were performed using several other scripts in the bin folder.