VirusGraphs3

Solidifying the VirusGraphs Infrastructure for Deployment

insert Charley's figure

Linear reference genomes

Linear references are the gold standard for genomics applications, including capturing viral genome information and viral sequence recovery. Examples include HIV sequence detection and HIV genome assembly. HIV genome assembly can be loosely classified into whole (reference) genome assembly and HIV genotyping (partial assembly).

Graph Reference genomes

Nucleobase graph reference genome

Approximate k-mer graph reference genome

Scope

Implementation

Reference graphs

For nucleobase graphs: NovoGraph, VG.

For approximate k-mer graphs: original SWIGG, implemented SWIGG.

Extension of SWIGG

Without any filtering parameters, the original graph created by SWIGG usually has so many nodes that visualization and other downstream analyses become inefficient. We therefore implemented an algorithm that contracts it into a compact de Bruijn graph. Contraction starts with a depth-first search from the source node of the graph. As the algorithm walks through the graph, it keeps adding nodes to the current supernode. It starts a new supernode whenever it encounters a node with more than one neighbor. Contraction substantially reduces the size of the graph (see the table of node and edge counts under Results).
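The contraction described above can be sketched roughly as follows. This is a minimal illustration, not the code in swigg_ext.py: the function name `contract`, the adjacency-dictionary input, and the toy graph are all hypothetical. It also assumes that "more than one neighbor" means more than one successor or more than one predecessor, and it drops edges that fall inside a single supernode.

```python
from collections import defaultdict

def contract(succ, source):
    """Group nodes of a directed graph into supernodes along unbranched paths.

    succ maps each node to a list of its successors (hypothetical input format).
    Returns (members, edges): members[i] lists the nodes in supernode i,
    edges is a set of (supernode, supernode) pairs.
    """
    # Build predecessor lists so that merge points can be detected as well.
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    def is_branch(n):
        # Assumption: a branch point has >1 successor or >1 predecessor.
        return len(succ.get(n, [])) > 1 or len(pred[n]) > 1

    supernode_of = {source: 0}
    members = [[source]]
    stack = [source]  # iterative depth-first search
    while stack:
        u = stack.pop()
        for v in succ.get(u, []):
            if v in supernode_of:
                continue
            if is_branch(u) or is_branch(v):
                # Start a new supernode at (or right after) a branch point.
                supernode_of[v] = len(members)
                members.append([v])
            else:
                # Unbranched step: v joins the supernode of u.
                supernode_of[v] = supernode_of[u]
                members[supernode_of[u]].append(v)
            stack.append(v)

    # Keep only edges that connect two different supernodes.
    edges = {
        (supernode_of[u], supernode_of[v])
        for u, vs in succ.items() if u in supernode_of
        for v in vs if supernode_of[u] != supernode_of[v]
    }
    return members, edges

# Toy graph: a chain, a bubble (d -> e/f -> g), then another chain.
toy_graph = {
    "a": ["b"], "b": ["c"], "c": ["d"],
    "d": ["e", "f"], "e": ["g"], "f": ["g"],
    "g": ["h"], "h": ["i"],
}
toy_members, toy_edges = contract(toy_graph, "a")
print(len(toy_graph) + 1, "nodes ->", len(toy_members), "supernodes")
```

On the toy graph the two chains collapse into one supernode each, while the branch and merge nodes of the bubble stay separate. Nodes unreachable from the source are ignored in this sketch.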

Transcript modeling

For split-read mapping, HISAT2 was used, and the output .bam file was piped into StringTie, both in Galaxy.

Visualization

Gephi is used to visualize the output from SWIGG. SnapGene, Graphviz.

Results

for SWIGG

Graphs built with different k-mer lengths:

The figure above shows five graphs built with different k-mer lengths: (a) k=16, (b) k=20, (c) k=32, (d) k=50, (e) k=90. Longer k-mers span more of each repetitive region, so fewer k-mers are repeated and the resulting graphs are simpler. Red rectangles highlight the large loop topology in graphs built from short k-mers. These loops result from repetitive k-mers shared by regions that are far apart.
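The effect of k on repeats can be illustrated with a small sketch. It is not part of SWIGG: the function `repeated_kmers` and the toy sequence are hypothetical and only show that the number of k-mers occurring at more than one position shrinks as k grows.

```python
from collections import Counter

def repeated_kmers(seq, k):
    """Return the set of k-mers that occur at more than one position in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}

# Toy sequence (hypothetical) with "ACGT" occurring twice, far apart.
toy_seq = "ACGTTTGCAACGTAC"

for k in (3, 4, 5):
    print(k, sorted(repeated_kmers(toy_seq, k)))
```

Each repeated k-mer is a node that two distant regions share, which is what closes a loop in the graph. Once k exceeds the length of the repeat, the shared node disappears.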

Table of number of nodes and edges before and after contraction

k-mer length	Nodes (before contraction)	Edges (before contraction)	Nodes (after contraction)	Edges (after contraction)
16	38772	55265	3781	4788
20	42442	55241	2867	3641
32	49041	55169	1350	1703
50	52794	55061	442	554
90	54449	54821	68	83
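To make the size reduction easier to compare across k-mer lengths, the counts in the table can be turned into fractions removed. The snippet below is only a convenience sketch: the numbers are copied from the table above, and the helper name `reduction` is hypothetical.

```python
# (k, nodes_before, edges_before, nodes_after, edges_after), copied from the table
counts = [
    (16, 38772, 55265, 3781, 4788),
    (20, 42442, 55241, 2867, 3641),
    (32, 49041, 55169, 1350, 1703),
    (50, 52794, 55061, 442, 554),
    (90, 54449, 54821, 68, 83),
]

def reduction(before, after):
    """Fraction of items removed by contraction."""
    return 1 - after / before

for k, n0, e0, n1, e1 in counts:
    print(f"k={k}: nodes -{reduction(n0, n1):.1%}, edges -{reduction(e0, e1):.1%}")
```

The fraction of nodes removed increases with k, from roughly 90% at k=16 to more than 99% at k=90.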

for NovoGraph

Current NovoGraph script.. (TBD).

for VG

HIV-1 transcript model

Figure: cDNA+PCR DNAseq ("classic RNAseq"). A. Coverage summary of reads mapped to HXB2 (K03455) with HISAT2 on usegalaxy.eu. B. RmDup-processed reads, controlling for PCR duplicates after initial alignment. Search strategy: SRA. Search terms: "HIV-1 and RNAseq and virus". BioProject: PRJNA320293, specifically SRR3472915. Viewed in IGV. Source: https://github.com/NCBI-Codeathons/Virus_Graphs/edit/master/README.md.

Predicted isoform models will be added soon.

Team & contact info

Alejandro Gener (Lead/Corresponding Author)
[email protected]; [email protected]
Baylor College of Medicine, Houston, TX, USA
MD Anderson Cancer Center, Houston, TX, USA
Universidad Central del Caribe, Bayamón, PR, USA

Nicolas Cooley
[email protected]
Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA

Charles Scott Kirby
[email protected]
Johns Hopkins University School of Medicine, Baltimore, MD, USA

Zhao Liu
[email protected]
Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA

Rahil Sethi

Yutong Qiu
[email protected]
Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA

