Extending existing graph with long reads #518
Comments
Apologies that the documentation on this isn't very good!
In principle you can simply include nanopore reads in your input sequences,
alongside the other reference genomes you want to include. That could be a
subset or a whole pangenome.
You may want to do this one region or chromosome at a time, using alignment
or mapping against a reference to collect reads and contigs by locus. In
principle you can use many scaffolded references for this, but the most we
have tested is two (CHM13 + GRCh38).
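To make the "collect reads and contigs by locus" step concrete, here is a minimal sketch, not a tested pipeline. It assumes you have already mapped the reads and contigs to a reference with a mapper that writes PAF (column 1 is the query name, column 6 the target name, column 10 the number of matching bases). The function names and the example sequence names are hypothetical.

```python
from collections import defaultdict


def best_target_per_query(paf_lines):
    """Map each query (read or contig) to the target it matches best.

    "Best" here means the target with the largest total number of
    matching bases (PAF column 10) summed over all mappings of that query.
    This is an illustrative rule, not the only reasonable one.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query, target = fields[0], fields[5]
        totals[query][target] += int(fields[9])
    return {
        query: max(per_target.items(), key=lambda kv: kv[1])[0]
        for query, per_target in totals.items()
    }


def partition_by_target(assignment):
    """Invert {query: target} into {target: sorted list of queries}."""
    bins = defaultdict(list)
    for query, target in assignment.items():
        bins[target].append(query)
    return {target: sorted(queries) for target, queries in bins.items()}


if __name__ == "__main__":
    # Hypothetical PAF records: two mappings for read1, one for read2.
    example = [
        "read1\t10000\t0\t9500\t+\tchm13#chr1\t248000000\t100\t9600\t9000\t9500\t60",
        "read1\t10000\t9500\t10000\t+\tchm13#chr2\t242000000\t5\t505\t450\t500\t10",
        "read2\t8500\t0\t8500\t-\tchm13#chr2\t242000000\t900\t9400\t8000\t8500\t60",
    ]
    for target, queries in partition_by_target(best_target_per_query(example)).items():
        print(target, queries)
```

Each resulting bin is a list of sequence names that could then be extracted into a per-chromosome FASTA and built separately.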
There is also reference-free partitioning, but it is worth noting that it's
hard to do for large numbers of sequences. We can link docs if you don't find
them immediately.
Once you've collected a set of nanopore sequences and pangenome assemblies,
you can give them to pggb together as inputs. The resulting graph will then
contain some paths that correspond to nanopore reads and some that
correspond to assemblies.
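One way to keep track of which paths come from reads and which from assemblies is to encode that in the sequence names before building. The sketch below assumes PanSN-style names (`sample#haplotype#contig`) and a hypothetical convention of giving the reads their own sample label; nothing here is prescribed by pggb itself, and the helper names are made up for illustration.

```python
def pansn_name(sample, haplotype, contig, delim="#"):
    """Build a PanSN-style name: sample#haplotype#contig."""
    return f"{sample}{delim}{haplotype}{delim}{contig}"


def rename_fasta_records(lines, sample, haplotype):
    """Yield FASTA lines with each header rewritten to a PanSN-style name.

    Only the first whitespace-delimited token of the original header is
    kept as the contig/read identifier; sequence lines pass through.
    """
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            yield ">" + pansn_name(sample, haplotype, seq_id)
        else:
            yield line.rstrip("\n")


def split_paths(path_names, read_samples, delim="#"):
    """Separate path names into (read paths, assembly paths).

    A path counts as a read path if its sample prefix is in `read_samples`
    (an assumed, user-chosen set of labels).
    """
    reads, assemblies = [], []
    for name in path_names:
        sample = name.split(delim, 1)[0]
        (reads if sample in read_samples else assemblies).append(name)
    return reads, assemblies


if __name__ == "__main__":
    # Hypothetical read record and sample label.
    fasta = [">read1 runid=abc\n", "ACGTACGT\n"]
    print("\n".join(rename_fasta_records(fasta, "sampleX_ont", 0)))

    paths = ["chm13#1#chr20", "grch38#1#chr20", "sampleX_ont#0#read1"]
    print(split_paths(paths, {"sampleX_ont"}))
```

With names like these, the path list of the built graph can be filtered by prefix, so downstream steps can treat read paths and assembly paths differently.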
Downstream, this does get harder to work with. There isn't a well-established
pipeline for using nanopore sequences this way and then making variant calls
from the aligned sample. Tools in vg should work, but I'm not sure whether
they can handle the diversity among the nanopore reads once these are included.
If this isn't making sense, please let me know what needs more
clarification.
On Wed, Jul 12, 2023, 18:57 Baxter Worthing ***@***.***> wrote:
Hello!
I've been using odgi to explore graphs I've built from long-read
assemblies with both Minigraph/Cactus and pggb, and it's been incredibly
useful (thank you!)
I have some HiFi data for some additional samples (individuals not already
represented in the graph) that I'd like to use to extend an existing
graph, but the HiFi data does not have sufficient depth for de novo assembly
(10-15x coverage).
I noticed that there is a section about this
<https://odgi.readthedocs.io/en/latest/rst/faqs.html#graph-constructed-from-long-read-or-sequence-data-extension-with-long-reads-or-sequences>
in the FAQ for odgi which says:
"Here, our recommendation is to actually rebuild the graph with PGGB
<https://github.com/pangenome/pggb>. One could use Graphaligner
<https://github.com/maickrau/GraphAligner> to align the long sequences
against the graph and then use vg augment to extend the already existing
graph, but that would be comparatively inexact and the resolutions of
complex regions might drop dramatically. A reference-biased method would be
Minigraph <https://github.com/lh3/minigraph> followed by Cactus
<https://github.com/glennhickey/progressiveCactus>."
So, if I want to extend an existing graph with lowish-coverage long reads
for additional samples, I should:
1. assemble the reads as best I can (despite low coverage) and
re-build the graph with pggb or Minigraph/Cactus
...or...
2. align the reads to the existing graph and use vg augment (less
ideal)
Am I understanding this correctly?
Okay, yes, that makes sense, thanks! Out of curiosity, what advantage would this approach have over using vg augment?