Sorting graph with -Y option triggers "Assertion `idx < this->size()' failed" error #548

Open
sivico26 opened this issue Jan 16, 2024 · 12 comments

Comments

@sivico26

Dear odgi team,

Thanks for developing odgi. I am working on a huge graph, so each processing step takes a long time. I was pruning some empty nodes from my graph so that I could later explore it with some of my tools. When I was optimizing the node space after the pruning, I hit the following error:

odgi: /opt/conda/conda-bld/odgi_1687621144080/work/build/sdsl-lite-prefix/src/sdsl-lite-build/include/sdsl/int_vector.hpp:1360: sdsl::int_vector<<anonymous> >::reference sdsl::int_vector<<anonymous> >::operator[](const size_type&) [with unsigned char t_width = 1; reference = sdsl::int_vector_reference<sdsl::int_vector<1> >; size_type = long unsigned int]: Assertion `idx < this->size()' failed.
/var/spool/pbs/mom_priv/jobs/19616399.meta-pbs.metacentrum.cz.SC: line 56: 131458 Aborted                 odgi sort -t $threads -i odgi_pruned.og -p Ygs -O -o og_opt_transfer.og

Looking at previous issues, I found that #430 was a lengthy, relevant discussion. In the end, I adjusted my command and removed the -Y from odgi sort (everything else equal), and it worked. So I can continue with my analyses.

However, this is somewhat unsatisfactory since I cannot do the PG-SGD sort with my graph. If I understood the discussion, the possible reasons listed do not apply to this case since I pruned the graph without trouble. To be precise, this is the command I used:

odgi prune -t $threads -TEc 1 -i $graph -o odgi_pruned.og ## $graph is in .gfa format

Correct me if I am wrong, but this indicates that odgi build does not have any trouble with my graph, which should discard many of the possible problems (e.g. W lines). Furthermore, my input graph for odgi sort was written by odgi prune. Thus, I wonder what could be causing the assertion error.

My graphs are big (before pruning .gfa ~118 Gb, and .og ~245 Gb; after pruning .gfa ~112 Gb and .og ~ 179 Gb), so not so easily shareable, but maybe possible if needed. I can help to check or run commands on them if instructed.

We are missing something around this problem. I wanted to report what I found and continue the discussion.

Let me know what you think.

P.S.: Another minor issue: why does odgi prune require -E for -c to work? That does not make sense to me. If I remove some nodes, it follows that I want to get rid of the associated edges as well. Currently, if you specify only -c, odgi prune seems to reason that, since no edges are being removed, it cannot leave edges without their associated nodes, so it does not prune the nodes that match the criteria (and the output graph is identical to the input). To me, this is not sensible behavior. Why is it like that? I am probably missing something.

@subwaystation
Member

@sivico26 Could you please share both graphs? You can drop a mail to [email protected].
At first glance, I would try odgi sort -O first, without the PG-SGD step. Then I would do odgi sort -Ygs.
Also, did you try vg convert to obtain a GFAv1 file compatible with ODGI? Or how did you generate the graph?

@sivico26
Author

Hi @subwaystation, thanks for the quick reply.

I am loading the graphs to our filesystem to see if I can send them that way.

The graphs come from cactus and its progressive algorithm (it is actually a super-pangenome), which generates a .hal; I then used hal2vg and vg convert to get the first .gfa. I post-processed that graph with smoothxg and gfaffix. I made the mistake of not turning off the generation of consensus paths when using smoothxg, so I need to prune those from the graph. I used odgi to remove the paths successfully, but that left the 0-coverage nodes (those that used to be crossed by consensus paths but not by any other path), and now I am trying to remove those too.
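To make the "0-coverage nodes" situation concrete, here is a minimal Python sketch that counts how many path steps touch each segment in GFA v1 text. It is only an illustration, not how odgi computes coverage; the function names (`node_coverage`, `zero_coverage_nodes`) and the path names in the toy graph are hypothetical.

```python
from collections import Counter


def node_coverage(gfa_lines):
    """Map each S-line segment name to the number of P-line steps that visit it."""
    segments = []
    coverage = Counter()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            segments.append(fields[1])
        elif fields[0] == "P" and len(fields) > 2:
            for step in fields[2].split(","):
                if step:  # ignore empty tokens
                    coverage[step[:-1]] += 1  # drop the trailing +/- orientation
    return {seg: coverage[seg] for seg in segments}


def zero_coverage_nodes(gfa_lines):
    """Return the segments that no path traverses."""
    return [seg for seg, n in node_coverage(gfa_lines).items() if n == 0]


# Toy graph (hypothetical): segment 2 is only visited by the consensus path.
with_consensus = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "P\tsampleA\t1+,3+\t*",
    "P\tConsensus_0\t1+,2+,3+\t*",
]
without_consensus = with_consensus[:-1]

print(zero_coverage_nodes(with_consensus))
print(zero_coverage_nodes(without_consensus))
```

In the toy graph, dropping the consensus path leaves segment 2 without any traversing path, which is the kind of node the pruning step is meant to remove.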

I hope that helps.

@sivico26
Author

sivico26 commented Jan 16, 2024

@subwaystation,

In theory, a link to download the graph should be in your mail. Let me know if it works.

odgi sort -O should work (it already did for me). Since I added -p gs too, the difference maker is Y.

@subwaystation
Member

I downloaded your graph; I need to run your commands next.

@subwaystation
Member

subwaystation commented Jan 18, 2024

@sivico26 Using the most recent master of ODGI v0.8.4-2-g1e12685c, I was not even able to complete the odgi build step:

/usr/bin/time --verbose odgi build -g og_opt_transfer.gfa -o og_opt_transfer.og -t 28 -P
[odgi::gfa_to_handle] building nodes: 100.00% @ 1.46e+06 bp/s elapsed: 00:00:10:54 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 1.52e+06 bp/s elapsed: 00:00:14:38 remain: 00:00:00:00
[odgi::gfa_to_handle] building paths: 13.64% @ 3.53e-02 bp/s elapsed: 00:00:05:39 remain: 00:00:35:50
[odgi::gfa_to_handle] id parsing failure for path Hbul.Hbul_1_chr6H attempting to parse node id from ''
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoull
Command terminated by signal 6
        Command being timed: "odgi build -g og_opt_transfer.gfa -o og_opt_transfer.og -t 28 -P"
        User time (seconds): 3582.92
        System time (seconds): 643.94
        Percent of CPU this job got: 178%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 39:31.37
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 364934480
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 127634161
        Voluntary context switches: 152059032
        Involuntary context switches: 2896258
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I am not sure whether the file is corrupt or does not conform to the GFA spec. Which version of ODGI were you using?

@sivico26
Author

Hi @subwaystation,

That's strange. odgi prune uses odgi build under the hood when the input is .gfa, right? If that is the case, it worked for me indirectly. The version I am using is v0.8.3-26-gbc7742ed, installed through conda.

Do you think it is related to #549? This is the same pruned graph I am referring to. It indeed deviates from GFA specs.

@subwaystation
Member

I was expecting the raw, unpruned graph. But you already sent me the pruned one?

@sivico26
Author

sivico26 commented Jan 18, 2024

I realized that in your log odgi build failed while parsing the path Hbul.Hbul_1_chr6H. Following the commands described in #549, I can confirm this is the first path affected by the trailing ,. So it is very likely this is the problem.
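As a rough illustration of why a trailing `,` would produce the "attempting to parse node id from ''" message, here is a small Python sketch. It assumes the segment field of a P line is split on commas and each token is parsed as an integer node id plus an orientation; this mirrors the failure in spirit only and is not odgi's actual parser. The helper names are hypothetical.

```python
def parse_path_steps(segment_field):
    """Parse '1+,2-' into [(1, '+'), (2, '-')]; an empty token raises ValueError."""
    steps = []
    for token in segment_field.split(","):
        node_id = int(token[:-1])  # int('') raises ValueError, like stoull on ''
        steps.append((node_id, token[-1]))
    return steps


def first_path_with_empty_step(gfa_lines):
    """Return the name of the first P line whose segment list has an empty token."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) > 2 and "" in fields[2].split(","):
            return fields[1]
    return None


# Toy input (hypothetical path names): the second path ends with a stray comma.
toy_gfa = [
    "S\t1\tACGT",
    "S\t2\tGG",
    "P\tpath_ok\t1+,2-\t*",
    "P\tpath_bad\t1+,2-,\t*",
]

print(parse_path_steps("1+,2-"))
print(first_path_with_empty_step(toy_gfa))
```

Splitting `1+,2-,` on commas yields a final empty token, so the integer conversion fails on exactly the kind of empty string shown in the log.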

In that case, running something like:

sed -E "s|,\t\*|\t\*|" og_opt_transfer.gfa > new_og_opt_transfer.gfa

should do the trick.
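For anyone who prefers not to rely on sed's handling of `\t`, the same cleanup can be sketched in Python. This is a minimal sketch under the assumption that the only defect is a trailing comma in the segment list (third field) of P lines. Unlike the sed expression, it touches only P lines and removes every trailing comma in that field. The function names and file paths are hypothetical.

```python
def strip_trailing_comma(line):
    """Remove trailing commas from the segment list of a GFA P line."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "P" and len(fields) > 2:
        fields[2] = fields[2].rstrip(",")
    return "\t".join(fields) + "\n"


def fix_gfa(src_path, dst_path):
    """Stream a GFA file line by line so large graphs are never held in memory."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_trailing_comma(line))


print(repr(strip_trailing_comma("P\tpath_bad\t1+,2-,\t*\n")))
```

A call such as `fix_gfa("og_opt_transfer.gfa", "new_og_opt_transfer.gfa")` would write the cleaned copy next to the original, leaving the input file untouched.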

@sivico26
Author

This is indeed the pruned graph. Sorry if it was not the desired one. I can send you the one before pruning. Should I proceed?

@subwaystation
Member

Please do so! Thanks :)
This should also help @AndreaGuarracino to better understand the odgi prune problem. And we can find out whether odgi prune is actually the culprit here.

@subwaystation
Member

> Hi @subwaystation,
>
> That's strange. odgi prune uses odgi build under the hood when the input is .gfa, right? If that is the case, it worked for me indirectly. The version I am using is v0.8.3-26-gbc7742ed, installed through conda.
>
> Do you think it is related to #549? This is the same pruned graph I am referring to. It indeed deviates from GFA specs.

While it uses odgi build before pruning, it seems that the graph written after the pruning step is what causes the problems.

@sivico26
Author

sivico26 commented Jan 18, 2024

Yes, what is strange is odgi prune (or odgi view) writing problematic P lines after the pruning.
