| title | author | institute | date |
|---|---|---|---|
| Pangenome visualisation | Alexander Leonard | ETH Zürich | Day 2 |

We have learned about:
- how accurate long reads have reshaped genome assembly
- pangenomes are a natural step towards encapsulating all the new genomes
- there is no "one true pangenome"; each approach has its own strengths
By the end of the lecture, we should be able to:
- create figures of pangenomic bubbles
- interactively explore pangenomes with different layers of data
- identify large-scale homology between pangenome assemblies
- local-scale visualisations
- global-scale visualisations
- layering visualisations with additional data
What types of genomic data do we normally try to visualise?
IGV (Integrative Genomics Viewer, https://igv.org/doc/desktop/) is a useful tool for visualising different formats of genomic data:
- read alignments
- bed files
- gene annotations
. . .
Seeing the data can often influence later analyses:
- more or fewer reads than we expect in a region
- overlap of variants and complex annotations
There are many other ways to visualise genomic data, such as:
- JBrowse
- Ribbon
- UCSC Genome Browser
. . .
Is there a pangenomic equivalent?
Sadly, not really...
Everything is more complicated in the pangenomic world.
. . .
But it depends on what we are interested in:
- viewing relationships between many assemblies?
- viewing alignments/annotations on pangenome graphs?
How do we visualise the GFA output of pangenome construction?
One of the most common tools is Bandage (https://github.com/asl/BandageNG).
It has several advantages:
- easy to install
- quick to load small to moderate graphs
- lots of extra functionality
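Before opening a GFA in a viewer, it can help to know roughly how big the graph is. A minimal sketch, assuming a GFA v1 file with tab-separated `S` (segment), `L` (link) and `P`/`W` (path/walk) records; the function name and the toy bubble are hypothetical and only for illustration:

```python
from collections import Counter


def summarise_gfa(lines):
    """Count GFA record types and sum the segment sequence lengths."""
    counts = Counter()
    total_bp = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        record = fields[0]
        counts[record] += 1
        # S records: S <name> <sequence>; "*" means the sequence is not stored
        if record == "S" and len(fields) > 2 and fields[2] != "*":
            total_bp += len(fields[2])
    return {
        "segments": counts["S"],
        "links": counts["L"],
        "paths": counts["P"] + counts["W"],
        "bp": total_bp,
    }


# Hypothetical toy bubble: s1 -> (s2 | s3) -> s4
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tG",
    "S\ts3\tTTT",
    "S\ts4\tCA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
    "P\tsampleA\ts1+,s2+,s4+\t*",
    "P\tsampleB\ts1+,s3+,s4+\t*",
]
summary = summarise_gfa(toy_gfa)
print(summary)
```

For a real file, the same function can be fed an open file handle instead of the toy list.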
A relatively easy example of a minigraph bubble.
A not-so-easy example of that region in cactus.
A hard example of that region in pggb.
Beyond viewing graphs, we can also use Bandage for:
- searching for sequence hits (blastn, minimap2, etc.)
- annotating paths
- loading BED files
Let's explore Bandage a bit.
Bandage is a powerful tool for working on a local scale.
How can we look at pangenomes (and the relationships between assemblies) on a global scale?
Synteny Circos plots can be an informative way to compare assemblies.
. . .
We can easily construct "multi-assembly" synteny plots.
. . .
. . .
But are they helpful?
Many one-to-one alignments are not the same as many-to-many alignments.
. . .
Very easy to misinterpret or even miss key relationships.
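One way to see the gap, as a rough sketch: a stacked synteny plot only draws alignments between neighbouring assemblies, while a many-to-many view relates every pair. The assembly names are hypothetical, and the assumption that the plot chains neighbours is ours.

```python
from itertools import combinations

# Hypothetical assembly names, stacked in this order in a synteny plot
assemblies = ["asm1", "asm2", "asm3", "asm4", "asm5", "asm6"]

# Neighbour-to-neighbour comparisons actually drawn in a stacked plot
chained = list(zip(assemblies, assemblies[1:]))

# Every pairwise relationship a many-to-many view would need to capture
all_pairs = list(combinations(assemblies, 2))

# Relationships we never see directly (e.g. asm1 vs asm3)
missing = [pair for pair in all_pairs if pair not in chained]

print(len(chained), len(all_pairs), len(missing))
```

With six assemblies, most pairwise relationships have to be inferred by eye through intermediate assemblies.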
But, this can be a helpful stepping stone to transition to pangenomic concepts.
Viewing too much information can be just as unhelpful as viewing too little.
Even the variation you see is hard to relate amongst all assemblies.
And then some odgi!
Another critical pangenome tool is odgi (https://github.com/pangenome/odgi).
odgi is a play on the Italian word "oggi" (/ˈɔd.dʒi/), which means "today".
As of 2019, a standard refrain in genomics is that genome graphs will be useful in x years.
But, if we make them efficient and scalable, they will be useful today.
We can use odgi viz to get something in between 1D linear (easy) and pangenome (informative) views.
This bins the pangenome and produces a linear, static visualisation of the graph.
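To make the binning idea concrete, here is a minimal sketch. It is not odgi's implementation: it assumes the nodes are laid end to end in their sort order, cuts that line into fixed-width bins, and reports what fraction of each bin a path covers. All names and the toy graph are hypothetical.

```python
def bin_path_coverage(node_lengths, path_nodes, bin_width):
    """Fraction of each pangenome bin covered by one path.

    node_lengths: list of (node_id, length) in the graph's sort order.
    path_nodes:   set of node ids visited by the path.
    """
    total = sum(length for _, length in node_lengths)
    n_bins = -(-total // bin_width)  # ceiling division
    # The last bin may be shorter than bin_width
    sizes = [min(bin_width, total - i * bin_width) for i in range(n_bins)]
    covered = [0] * n_bins
    offset = 0
    for node_id, length in node_lengths:
        if node_id in path_nodes:
            # Per-base loop: fine for a toy example, slow for real graphs
            for pos in range(offset, offset + length):
                covered[pos // bin_width] += 1
        offset += length
    return [c / s for c, s in zip(covered, sizes)]


# Hypothetical toy bubble: s1 -> (s2 | s3) -> s4
nodes = [("s1", 4), ("s2", 1), ("s3", 3), ("s4", 2)]
row_a = bin_path_coverage(nodes, {"s1", "s2", "s4"}, bin_width=5)
row_b = bin_path_coverage(nodes, {"s1", "s3", "s4"}, bin_width=5)
print(row_a, row_b)
```

Each returned list corresponds to one row of the static image: one path, one value per bin.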
. . .
Nodes are "ordered" left to right, but what does that mean?
. . .
THEY ARE NOT NECESSARILY SEQUENTIAL
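A small sketch of what "not sequential" means in practice: given the left-to-right node order used for plotting, list the steps where a path moves backwards (or stays put) in that order. The function name, node names and paths are hypothetical.

```python
def backward_steps(node_order, path):
    """Return consecutive path steps that do not move rightwards in the node order."""
    rank = {node: i for i, node in enumerate(node_order)}
    return [(a, b) for a, b in zip(path, path[1:]) if rank[b] <= rank[a]]


# Hypothetical left-to-right order of nodes in the plot
node_order = ["s1", "s2", "s3", "s4"]

# This path follows the plotted order
forward_path = ["s1", "s2", "s4"]
# This path visits s3 before s2, so it jumps backwards in the plot
rearranged_path = ["s1", "s3", "s2", "s4"]

print(backward_steps(node_order, forward_path))
print(backward_steps(node_order, rearranged_path))
```

Any step reported here would show up as a link drawn against the left-to-right direction of the plot.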
. . .
How do we interpret the links (graph topology)?
We can also plot a "compressed" mode, and see which regions are variable.
There are many additional layers of information we can use:
- inversions
- traversal depth
- any annotations from BED files?
We can "inject" an annotation from any assembly into the pangenome.
. . .
. . .
Let the pangenome do the hard work for dealing with pangenome coordinates!
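The idea behind "injecting" can be sketched as a coordinate lookup: a BED interval on one assembly's path is translated into the nodes it overlaps, and any other path that walks through those nodes picks up the annotation. This is an illustration under that assumption, not odgi's implementation; names and coordinates are hypothetical.

```python
def nodes_for_interval(path_steps, start, end):
    """Node ids overlapped by a half-open BED interval [start, end) on a path.

    path_steps: list of (node_id, length) in the order the path visits them.
    """
    hits = []
    offset = 0
    for node_id, length in path_steps:
        node_start, node_end = offset, offset + length
        if node_start < end and start < node_end:
            hits.append(node_id)
        offset = node_end
    return hits


# Hypothetical paths through a toy bubble: s1 -> (s2 | s3) -> s4
sample_a = [("s1", 4), ("s2", 1), ("s4", 2)]
sample_b = [("s1", 4), ("s3", 3), ("s4", 2)]

# Hypothetical BED feature on sampleA: positions 4 to 6
annotated = nodes_for_interval(sample_a, 4, 6)

# Nodes of sampleB that inherit the annotation through the graph
shared = [node for node, _ in sample_b if node in annotated]
print(annotated, shared)
```

No alignment between sampleA and sampleB is needed at this point; the shared nodes already encode it.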
We can also "untangle" a graph bubble locally, and "linearise" it.
Easier to see copy number variation (like VNTRs) or gene structure.
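Why linearising helps with copy number can be sketched by counting how often each path traverses a repeat node. The node names and paths below are hypothetical, and the sketch assumes the repeat unit is collapsed into a single node `r`.

```python
from collections import Counter

# Hypothetical paths through a collapsed tandem repeat node "r"
paths = {
    "sampleA": ["s1", "r", "r", "s2"],
    "sampleB": ["s1", "r", "r", "r", "r", "s2"],
}

# Number of times each path traverses each node
traversals = {name: Counter(steps) for name, steps in paths.items()}

# Copy number of the repeat unit per assembly
copies = {name: counts["r"] for name, counts in traversals.items()}
print(copies)
```

In the graph view both samples loop through the same node; laid out per path, the different repeat counts are visible directly.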
. . .
Pangenomes are efficient data structures.
. . .
Pangenomes can be efficient visualisation structures.
. . .
Linear sequence context will always make more sense to us.
. . .
These plots still use pangenomics before the final image.
These tools are also likely best used in combination, so we can understand the graph at different scales.
. . .
"Publication figures" might focus on different "graphness" levels.
There are many other tools:
- Waragraph (and earlier gfaestus; depends on odgi layout)
- sequenceTubeMap from vg
- Panache
- Panagram
. . .
And even more "Pan" puns at https://github.com/colindaven/awesome-pangenomes.
. . .
:::incremental
- Visualising your data is critical, even more so for pangenomes!
- Bandage is powerful for interactively exploring "local" regions.
- Increasingly complex graphs become impossible to display responsively.
- odgi is powerful for statically visualising entire graphs.
:::
Goals of this afternoon.
Part 1:
- assemble an entire chromosome from long reads
- build a chromosome pangenome from six assemblies
Part 2:
- visually explore minigraph and pggb pangenomes
- examine sequences and annotated features in pangenomes
And then coffee