LeadNPC

Systems Biology-Based Analysis Indicates Global Transcriptional Impairment in Lead-Treated Human Neural Progenitor Cells

This repository contains the files needed to reproduce the results reported in our analysis of lead-treated human neural progenitor cells. The bin folder contains the code used to generate the analyses and figures. The code is written in R (version 3.5.1) and uses the libraries biomaRt, affy, data.table, dplyr, edgeR, factoextra, FactoMineR, ggplot2, ggrepel, grid, gridExtra, hgu95av2.db, igraph, limma, purrr, RColorBrewer, RCy3, readxl, RedeR, scales, stats, stringr, tibble, tidyr, topGO, and transcriptogramer (version 1.3.4).

The Data folder contains intermediate data files generated by the pipeline. You can therefore either run the pipeline from scratch, starting from the raw data files of the experiment, or run any single phase using these intermediate files. Be sure to correct the data paths in the R scripts.

Running the entire pipeline from scratch

To run this pipeline from scratch, you will need to download the raw sequence reads from the Sequence Read Archive (accession SRP079342; Gene Expression Omnibus accession GSE84712), get the Ensembl GRCh38 human genome reference and annotation (release 91), and have HISAT2 and featureCounts installed. Note that the commands in this section reference annotation files named for release 94; adjust the file names to match the release you download.
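The commands in this section rely on shell variables (genRefDir, genIndexDir, sampleDir, resultDir) that are not defined anywhere in this README. The following is a minimal sketch of one way to set them, assuming a hypothetical directory layout under a base folder; baseDir, LEADNPC_BASE, every path, and the helper missing_tools are illustrative names, not part of the original pipeline. Note that hisat2-build takes an index basename (a file prefix), so genIndexDir is used here as a prefix rather than as a directory.

```shell
# Hypothetical layout: adjust every path to your own system.
baseDir="${LEADNPC_BASE:-$PWD/leadnpc}"
genRefDir="${baseDir}/reference"       # Ensembl FASTA and GTF files
genIndexDir="${baseDir}/index/GRCh38"  # basename (prefix) of the hisat2 index files
sampleDir="${baseDir}/fastq"           # paired reads: <sample>_1.fastq.gz and <sample>_2.fastq.gz
resultDir="${baseDir}/bam"             # BAM files written by the alignment step

# Print the name of every requested tool that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools hisat2 hisat2-build extract_splice_sites.py samtools featureCounts` prints nothing when all the tools used in this section are available.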

Create hisat2 index

hisat2-build -p 8 "${genRefDir}/Homo_sapiens.GRCh38.dna.primary_assembly.fa" "${genIndexDir}"

Extract the splice sites

extract_splice_sites.py "${genRefDir}/Homo_sapiens.GRCh38.94.gtf" > "${genRefDir}/Homo_sapiens.GRCh38.94.txt"

Generate the alignments

For each sample, run:

hisat2 -x "${genIndexDir}" --known-splicesite-infile "${genRefDir}/Homo_sapiens.GRCh38.94.txt" -p 8 -1 "${sampleDir}/${file}_1.fastq.gz" -2 "${sampleDir}/${file}_2.fastq.gz" | samtools view -bS - > "${resultDir}/${file}.hisat.bam"
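The per-sample step above can be wrapped in a loop. The sketch below assumes that the paired files follow the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming implied by the command; the function name align_samples and the DRY_RUN switch are hypothetical additions for illustration and are not part of the original pipeline.

```shell
# Align every read pair found in a directory.
# Arguments: sample directory, hisat2 index basename, splice-site file, result directory.
# Set DRY_RUN=1 (hypothetical switch) to list the samples without running hisat2.
align_samples() {
    local sampleDir="$1" genIndexDir="$2" spliceFile="$3" resultDir="$4"
    local r1 file
    for r1 in "${sampleDir}"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                 # the glob matched nothing
        file="$(basename "$r1" _1.fastq.gz)"     # sample prefix without the read suffix
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would align ${file}"
        else
            hisat2 -x "$genIndexDir" --known-splicesite-infile "$spliceFile" -p 8 \
                -1 "${sampleDir}/${file}_1.fastq.gz" \
                -2 "${sampleDir}/${file}_2.fastq.gz" \
                | samtools view -bS - > "${resultDir}/${file}.hisat.bam"
        fi
    done
}
```

A possible call is `align_samples "$sampleDir" "$genIndexDir" "${genRefDir}/Homo_sapiens.GRCh38.94.txt" "$resultDir"`; running it first with `DRY_RUN=1` shows which sample prefixes were detected.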

Count aligned reads

First, list the BAM files on a single line:

ls *.hisat.bam | tr '\n' ' '> bamList.txt

Then generate the counts file:

featureCounts -T 32 -t gene -g gene_id -a "${genRefDir}/Homo_sapiens.GRCh38.94.gtf" -o ./allCountsHisat.txt $(cat bamList.txt)
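To inspect the result before moving on to R, it can help to reduce the featureCounts table to gene identifiers and counts. featureCounts writes a leading comment line starting with `#`, then a header whose first six columns are Geneid, Chr, Start, End, Strand, and Length, followed by one count column per BAM file. The helper below is only an inspection sketch: the name counts_matrix is hypothetical, and the R scripts may well expect the unmodified allCountsHisat.txt.

```shell
# Keep the Geneid column and the per-sample count columns (column 7 onward),
# dropping the '#' comment line that featureCounts writes at the top.
counts_matrix() {
    grep -v '^#' "$1" | cut -f1,7-
}
```

For example, `counts_matrix ./allCountsHisat.txt | head` shows the first genes with one count per sample.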

Now you can use the R scripts below.

Or take a shortcut, and start here...

We recommend running 00base.R and 02analisePCA.R before trying to run any other script. These scripts set up the R environment and create some necessary files.
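One way to run the two setup scripts non-interactively is sketched below. It assumes that Rscript is on PATH, that the scripts are run from the bin folder, and that they work outside an interactive R session, none of which this README confirms. The function name run_in_order and the RSCRIPT_BIN override are hypothetical.

```shell
# Run the given R scripts in order and stop at the first one that fails.
# RSCRIPT_BIN (hypothetical override) selects the interpreter; it defaults to Rscript.
run_in_order() {
    local script
    for script in "$@"; do
        echo "running ${script}"
        "${RSCRIPT_BIN:-Rscript}" "$script" || { echo "failed: ${script}" >&2; return 1; }
    done
}
```

A possible call from inside the bin folder is `run_in_order 00base.R 02analisePCA.R`. The steps marked as not completely automatic still need to be run by hand.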

Set up the environment and install the R packages

00base.R

Create logCPM file

01ProcessCounts.R

PCA analysis and create transcriptogramer objects

Objects generated here are needed by later steps. If you use a version of transcriptogramer more recent than 1.3.4, the results may differ slightly, because newer versions use data from STRINGdb release 11.

02analisePCA.R

Plot transcriptogramer graphics

03plotTrancript.R

Plot circlize graphics

04circosPlot.R

Cluster superposition analysis

05intersecClusters.R

Create cluster graphs

This is not a completely automatic process. You will need to use Cytoscape manually.

06graphTemposManual.R

Generate Figure 2 components

This is not a completely automatic process. The complete figure was composed by hand using Inkscape.

Nodes

07BarNodes.R

Connectivity

08BarConect.R

Generate Figures 3 and 4 graphs

This is not a completely automatic process. You will need to use RedeR and Cytoscape manually. For more information on how to connect Cytoscape and R, see Cytoscape and RCy3 documentation.

09GODendo.R

Generate Figures 3 and 4 dendrograms

This is not a completely automatic process. You will need to use RedeR and Cytoscape manually. For more information on how to connect Cytoscape and R, see Cytoscape and RCy3 documentation.

10graphoClustersRCy3.R

Generate marker figures

11Marquers.R

Other analyses

Additional auxiliary analyses were performed using several scripts in the bin folder.
