racon_wrapper: Overflow and corrupting overlaps? #54
Comments
Probably bugs in the sampler, will fix it :) Best regards, |
Can you please try the latest commit? |
I run into the same problem. The latest commit did not help. Racon works, racon_wrapper not. Long reads don't have duplicate ids. PAF contains only IDs that are also in the long reads. Any chance to check which reads/alignments are causing this? Bests, |
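For anyone wanting to repeat the duplicate-ID check mentioned above, here is a minimal sketch in Python. It assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function names `fastq_names` and `duplicated` are made up for illustration and are not part of racon or rampler.

```python
from collections import Counter
from typing import Iterable, List


def fastq_names(lines: Iterable[str]) -> List[str]:
    """Collect read names from 4-line FASTQ records (assumption: no line wrapping)."""
    names = []
    non_blank = (line.strip() for line in lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 != 0:
            continue  # only the first line of each record carries the name
        fields = line[1:].split()
        if not line.startswith("@") or not fields:
            raise ValueError(f"record line {i + 1}: expected '@name', got {line[:20]!r}")
        # The name is the first whitespace-delimited token after '@'.
        names.append(fields[0])
    return names


def duplicated(names: Iterable[str]) -> List[str]:
    """Return the sorted list of names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Usage would be along the lines of `duplicated(fastq_names(open("reads.fastq")))`; an empty list means no duplicate IDs were found under the assumptions above.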
Did you do a clean |
Fresh start from 378dd81 using git clone within a nightly container build. |
My idea was a dead end. I thought that the chunk size somehow influenced the whole thing, but that is not the case. Is there a way to report the alignments / reads causing this? |
Your error is |
Yes,
|
I finally could give it a go, but now I am running into a new error: [RaconWrapper::run] preparing data with rampler When I run the same data with Racon directly it is able to load the files. |
@EinarBaldvin, please paste the whole command you were using (plus mapping). @fbemm, I'll investigate shortly. Sorry for the delay. |
@fbemm, shouldn't asm.0.fa be also in minimap2 command? Is that a typo or? |
Correct! Just a type <- Kinda this stuff ;) ... typo |
@fbemm, can you please run |
|
minimap2 -ax map-ont -L -I 120G -t $NCPUS asm.fasta reads.fq > reads2asm.sam
Then kept only primary mappings before sam2paf with paftools.js in minimap2. I am currently rerunning the mapping with:
minimap2 -ax map-ont -I 80G -t $NCPUS asm.fasta reads.fq > reads2asm.sam
However, same dataset two behaviors:
racon_wrapper -m 8 -x -6 -g -8 -w 500 -t 96 --subsample 4400000000 30 --split 500000000 --cudapoa-batches 50 --cudaaligner-batches 10 reads.fastq reads2asm.paf asm.fasta > asm_polished.fasta
[RaconWrapper::run] preparing data with rampler
racon -m 8 -x -6 -g -8 -w 500 -t 64 --cudapoa-batches 50 --cudaaligner-batches 10 reads.fastq reads2asm.paf asm.fasta > asm_polished.fasta
Using 1 GPU(s) to perform polishing
head -n 1 reads2asm.paf
head -n 1 asm.fasta
head -n 1 reads.fastq
@rvaser Should I drop the -c/a flag, base-level alignment, when mapping? |
@fbemm, looks fine. @EinarBaldvin, if you are converting to paf anyway, then do not run minimap2 with -a/-c. |
I managed to use the wrapper on chm13 draft genome and some reads, tried split with size of the largest contig and that size divided by ten, in both instances wrapper is running as intended. The error "overlap not transmuted" means that one or both sequences in the overlap are missing from the sequence/target file. @fbemm, @EinarBaldvin, please comment out https://github.com/lbcb-sci/racon/blob/master/scripts/racon_wrapper.py#L53-L56 and put |
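Since the error means a name in an overlap cannot be found in the sequence/target file, one way to look for offending alignments is to compare the names in the PAF against the names in the inputs. The sketch below is only an illustration under stated assumptions: it relies on the standard PAF layout (query name in column 1, target name in column 6, 12 mandatory tab-separated columns), and the helpers `fasta_names`, `paf_names` and `missing_names` are hypothetical, not racon functions. It checks the full files, not the per-chunk files the wrapper creates.

```python
from typing import Iterable, Set, Tuple


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Names of FASTA records: first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def paf_names(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (query names, target names) seen in PAF records."""
    queries, targets = set(), set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record, skip it
        queries.add(fields[0])   # column 1: query sequence name
        targets.add(fields[5])   # column 6: target sequence name
    return queries, targets


def missing_names(paf_lines: Iterable[str],
                  read_names: Set[str],
                  target_names: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Names referenced by the PAF but absent from the reads or the targets."""
    queries, targets = paf_names(paf_lines)
    return queries - read_names, targets - target_names
```

Two empty sets would mean every overlap refers to sequences that exist in both files. Running the same check against a single chunk from the wrapper's work directory could help narrow down which chunk breaks.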
Actually, I get the same error at the last chunk. @fbemm, does the error on your data occur immediately on the first chunk or? |
From looking at the run time I would rather say it is a later one, if not the last. Restarted with the edited wrapper now. Can take a while. |
You do not have to run it again, I found the bug and will resolve it soon. |
Please try the latest commit. |
Running. Will take a while. Thx! |
Do you have a time limit on the server? Weird that there is nothing printed afterwards. |
No, the same thing happens when executed manually. Should I comment in the pass procedure and rerun? |
How many chunks are created and finished successfully? You can put |
How do I figure that out? There is a racon workdir created but with only the chunked fasta files inside. |
You can count number of |
Just a suspicion: it might be that the error only occurs if --split is empty or zero. It seems that racon then splits into n chunks, where n corresponds to the number of sequences in the assembly. Or it actually happens only if the split number equals the sequence number. |
Suspicion is wrong. |
Did you run it again and have the working folder? Did you run the chunk that breaks? |
Similar/same issue at isovic/racon#194, I think I fixed it with the latest commit. Please try. |
Hi Robert,
I am having some interesting issues with both split and subsample in the wrapper. I am using Racon v1.4.21.
1st: When I use subsample, it seems that my genome size could be causing a 32-bit overflow in the subsampler.
2nd: Using split results in "[racon::Overlap::find_breaking_points] error: overlap is not transmuted!" error.
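To make the suspected overflow concrete, the following Python sketch checks the values from this report against 32-bit limits. Whether the subsampler actually stores the reference length in a 32-bit integer is only the suspicion stated above, not something confirmed here; `fits_uint32` and `wrap_uint32` are illustrative helper names.

```python
UINT32_MAX = 2**32 - 1  # 4294967295
INT32_MAX = 2**31 - 1   # 2147483647


def fits_uint32(value: int) -> bool:
    """True if value is representable as an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


def wrap_uint32(value: int) -> int:
    """Value after truncation to 32 bits (what an unsigned overflow would leave)."""
    return value & 0xFFFFFFFF


reference_length = 4_400_000_000  # first argument passed to subsample
coverage = 70                     # second argument passed to subsample

print(fits_uint32(reference_length))   # False
print(wrap_uint32(reference_length))   # 105032704
print(reference_length * coverage)     # 308000000000, needs a 64-bit integer
```

Note that the assembly size of about 3.7 Gbp would still fit an unsigned 32-bit integer but not a signed one, so the outcome depends on which type is used.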
Below is an example with both parameters set:
racon_wrapper -m 8 -x -6 -g -8 -w 500 -t 2 --subsample 4400000000 70 --split 500000000 --cudapoa-batches 50 --cudaaligner-batches 10 raw_readID_integer_whitespaceRemoved.fastq default_intRead_int_reads2asm_SecNo.paf default.pol2.fasta
[RaconWrapper::run] preparing data with rampler
[RaconWrapper::run] total number of splits: 8
[RaconWrapper::run] processing data with racon
[racon::Polisher::initialize] loaded target sequences 1.991097 s
[racon::Polisher::initialize] loaded sequences 12.319651 s
[racon::Polisher::initialize] loaded overlaps 71.611163 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
Fastq is 600 GB
paf ~100 GB
Assembly ~ 3.7 Gbp
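As a rough consistency check on the split count, the sketch below estimates how many chunks a 3.7 Gbp assembly gives at a chunk size of 500000000. It assumes chunks are simply filled up to the requested size; the real splitter may keep whole sequences together, so the actual count can differ. `expected_chunks` is a hypothetical helper.

```python
def expected_chunks(total_bases: int, chunk_size: int) -> int:
    """Ceiling of total_bases / chunk_size using integer arithmetic."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-total_bases // chunk_size)


print(expected_chunks(3_700_000_000, 500_000_000))  # 8
```

This agrees with the "total number of splits: 8" line in the log above.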
We could get Racon running without the wrapper, with the same datasets, same compilation and on the same hardware. (Currently still running... )
Any ideas as to what is happening in the wrapper?
Best regards,
Einar