Extending existing graph with long reads #518

BaxW · 2023-07-12T16:56:56Z

Hello!

I've been using odgi to explore graphs I've built from long read assemblies with both Minigraph/Cactus and pggb, and it's been incredibly useful (thank you!).
I have some HiFi data for some additional samples (individuals not already represented in the graph) that I'd like to use to extend an existing graph, but the HiFi data is not at sufficient depth for de novo assembly (10-15x coverage).

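I noticed that there is a section about this in the odgi FAQ (https://odgi.readthedocs.io/en/latest/rst/faqs.html#graph-constructed-from-long-read-or-sequence-data-extension-with-long-reads-or-sequences), which says: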

"Here, our recommendation is to actually rebuild the graph with PGGB. One could use Graphaligner to align the long sequences against the graph and then use vg augment to extend the already existing graph, but that would be comparatively inexact and the resolutions of complex regions might drop dramatically. A reference-biased method would be Minigraph followed by Cactus."

So, if I want to extend an existing graph with lowish coverage long reads for additional samples I should:

1. assemble the reads as best I can (despite low coverage) and re-build the graph with pggb or Minigraph/Cactus
...or...
2. align the reads to the existing graph and use vg augment (less ideal)

Am I understanding this correctly?

ekg · 2023-07-12T17:32:16Z

Apologies that the documentation on this isn't very good!

In principle, you can simply include nanopore reads in your input sequences, alongside the other reference genomes you want to include, which could be a subset of a pangenome or the whole thing. You may want to do this one region or chromosome at a time, using reference alignment or mapping to collect reads and contigs by locus. In principle you can use many scaffolded references for this, but the most we have tested is two (chm13 + grch38). There is also reference-free partitioning, but it is hard to apply to large numbers of sequences. We can link docs if you don't find them immediately.

Once you've collected a set of nanopore sequences and pangenome assemblies, you can give them to pggb as inputs. The resulting graph will have some paths that correspond to nanopore reads and some that correspond to assemblies.

Downstream, this does get harder to work with. There isn't a strong pipeline for using the nanopore sequences this way and then making variant calls from the aligned sample. Tools in vg should work, but I'm not sure whether they can handle the diversity between the nanopore reads once these are included. If this isn't making sense, please let me know what needs more clarification.
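
For readers following along, the "collect reads by locus" step can be sketched roughly as below. This is only an illustration under stated assumptions, not the pggb authors' pipeline: it assumes the reads were already mapped to a reference and the mappings are available as PAF lines (the tab-separated format written by mappers such as minimap2), and it bins each read under the reference sequence where it has the most matching bases. The function name `assign_reads_to_targets` and the `min_mapq` threshold are hypothetical choices made for this example.

```python
from collections import defaultdict


def assign_reads_to_targets(paf_lines, min_mapq=20):
    """Bin read names by the target sequence they map to best.

    paf_lines: iterable of PAF-formatted strings (12 mandatory tab-separated
    columns; optional tags after column 12 are ignored).
    min_mapq: hypothetical cutoff; mappings below it are skipped.
    Returns {target_name: sorted list of read names}.
    """
    # matched[read][target] = matching bases summed over that read's mappings
    matched = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read, target = fields[0], fields[5]
        n_match, mapq = int(fields[9]), int(fields[11])
        if mapq < min_mapq:
            continue
        matched[read][target] += n_match

    bins = defaultdict(list)
    for read, per_target in matched.items():
        # Pick the target with the most matching bases; ties fall back to
        # target name so the result is deterministic.
        best = max(per_target.items(), key=lambda kv: (kv[1], kv[0]))[0]
        bins[best].append(read)
    return {target: sorted(reads) for target, reads in bins.items()}


# Made-up records for illustration only.
example = [
    "read1\t10000\t0\t10000\t+\tchr1\t1000000\t500\t10500\t9800\t10000\t60",
    "read1\t10000\t0\t2000\t+\tchr2\t900000\t100\t2100\t1500\t2000\t30",
    "read2\t8000\t0\t8000\t-\tchr2\t900000\t4000\t12000\t7900\t8000\t60",
    "read3\t5000\t0\t5000\t+\tchr1\t1000000\t0\t5000\t4000\t5000\t3",
]
bins = assign_reads_to_targets(example)
print(bins)  # {'chr1': ['read1'], 'chr2': ['read2']}
```

Each resulting bin could then be used to pull the matching reads out of the FASTA/FASTQ and combine them with the assembly contigs for the same chromosome before running pggb on that chromosome. Reads that only have low-quality mappings (like `read3` here) are left unassigned.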


BaxW · 2023-07-13T17:10:10Z

Okay yes that makes sense, thanks! Out of curiosity, what advantage would this approach have over using vg augment?
