QUESTION: What's the largest genome that end-users have assembled with RAVEN? #36

Open
cement-head opened this issue Feb 17, 2021 · 7 comments

Comments

@cement-head

  1. What's the largest genome that end-users have assembled with RAVEN?
  2. Did you use the GPU version (built for CUDA/GPU)?
  3. What were your options, if any?
  4. How long did it take?
  5. Approximately how big was your computer?
@rvaser
Collaborator

rvaser commented Feb 17, 2021

Here is the preprint: https://www.biorxiv.org/content/10.1101/2020.08.07.242461v1. Note that the benchmark used version 1.1.10, and versions 1.3.0 and later use far less memory. We should update the preprint soon. Answers:

  1. I think around 3 Gbp (haploid), but I'm not sure.
  2. We did not benchmark with CUDA enabled.
  3. No additional options, only number of threads.
  4. Depends on coverage, see preprint.
  5. 1TB RAM/128 cores (run on 64 threads).

@cement-head
Author

cement-head commented Mar 3, 2021

Okay, we just did a 6.0 Gbp beastie, but RAVEN gave us an assembly of just over 7.0 Gbp.

It took five days on 2 TB of ECC RAM and 124 threads, with two CUDA GPUs (RTX Titans, used for polishing; -c=100).


Given that the assembly is larger than expected, should I change any of these three parameters? Do you have any recommendations?

    -m, --match <int>
      default: 3
      score for matching bases
    -n, --mismatch <int>
      default: -5
      score for mismatching bases
    -g, --gap <int>
      default: -4
      gap penalty (must be negative)
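
To make the role of these three scores concrete, here is a minimal sketch of how match, mismatch, and gap values combine into an alignment score. It is a generic global pairwise alignment with a linear gap penalty, not Raven's or Racon's actual code; the function name `global_alignment_score` and the toy sequences are made up for illustration, and only the default values (3, -5, -4) come from the help text above.

```python
def global_alignment_score(a, b, match=3, mismatch=-5, gap=-4):
    """Best global alignment score of a and b (linear gap penalty).

    Illustration only: a textbook Needleman-Wunsch recurrence using the
    default scores quoted in the help text, not Raven's implementation.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


# One substitution under the defaults: three matches and one mismatch.
default_score = global_alignment_score("ACGT", "ACCT")
# A harsher (hypothetical) mismatch score makes two gaps cheaper than one mismatch.
harsh_score = global_alignment_score("ACGT", "ACCT", mismatch=-9)
```

With the defaults the substitution is scored as a mismatch (3 + 3 - 5 + 3 = 4). With a mismatch score of -9, the best alignment instead opens two gaps (3 + 3 - 4 - 4 + 3 = 1), so changing these values shifts how differences are represented rather than removing them.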

@cement-head
Author

Also, would increasing the number of polishing rounds (Racon) drastically improve the assembly?

@cement-head
Author

Okay, a BUSCO analysis reports only 0.1% complete. Something is wrong; would you suggest increasing the mismatch penalty?

@rvaser
Collaborator

rvaser commented Mar 7, 2021

Can you print the assembly statistics (length/#contigs/NX/NGX)? Which sequencing technology are you using? What is the sequencing depth? The BUSCO score is abysmal, and I am not sure that changing the alignment parameters will help. Running more than 2 iterations of Racon will not increase the accuracy by much either.

Sorry for my late reply!
Best regards,
Robert

P.S. You can also paste the log Raven created here.

@cement-head
Author

The technology is PacBio SII CLR, with a raw-read N50 of >36 kbp.

The coverage is about 70x.

Q: Would adjusting the -m, -n, -g parameters improve assembly?

Which file is the RAVEN log?

Here's the QUAST analysis; the # of contigs is good-ish, but the N50 isn't the greatest:

Assembly                    raven_asm 
# contigs (>= 0 bp)         25505     
# contigs (>= 1000 bp)      25505     
# contigs (>= 5000 bp)      25505     
# contigs (>= 10000 bp)     25504     
# contigs (>= 25000 bp)     25504     
# contigs (>= 50000 bp)     25473     
Total length (>= 0 bp)      7048262437
Total length (>= 1000 bp)   7048262437
Total length (>= 5000 bp)   7048262437
Total length (>= 10000 bp)  7048257309
Total length (>= 25000 bp)  7048257309
Total length (>= 50000 bp)  7046876721
# contigs                   25505     
Largest contig              3296975   
Total length                7048262437
GC (%)                      43.05     
N50                         337254    
N75                         208232    
L50                         6350      
L75                         13031     
# N's per 100 kbp           0.00
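
For the NX/NGX figures requested earlier, the following is a minimal sketch of how such values can be computed from a list of contig lengths. The helper name `nx` and the toy lengths are hypothetical. The only numbers taken from the thread are the reported total length (7,048,262,437 bp) and the expected genome size (6.0 Gbp), and the latter is the poster's estimate.

```python
def nx(lengths, fraction=0.5, total=None):
    """Return (NX, LX) for a list of contig lengths.

    NX is the length of the contig at which the cumulative length, taken
    from the longest contig down, first reaches `fraction` of `total`;
    LX is how many contigs that took. With total=None the assembly length
    is used (N50, N75, ...). Passing an estimated genome size instead
    gives the NG variants (NG50, ...). Returns None if the assembly is
    too short to reach the target.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * (sum(ordered) if total is None else total)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return None


# Toy contig lengths (made up), 300 bp in total.
toy = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy)                  # target 150 bp of 300 bp
n75 = nx(toy, fraction=0.75)   # target 225 bp of 300 bp
ng50 = nx(toy, total=400)      # target 200 bp of an assumed 400 bp genome

# Ratio of the reported assembly length to the expected genome size.
excess = 7_048_262_437 / 6.0e9
print(f"assembly / expected genome size = {excess:.2f}")
```

Because the assembly here is longer than the expected genome, the NG50 target (half of 6.0 Gbp) is reached earlier in the sorted contig list than the N50 target, so NG50 for this assembly can only be equal to or larger than the reported N50. The ratio of about 1.17 quantifies how far the assembly overshoots the estimate.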

@rvaser
Collaborator

rvaser commented Mar 11, 2021

The log is written to stderr. I am not sure if changing alignment parameters will help at all. The assembly is quite fragmented, which might be the reason for the bad BUSCO performance.
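
Since the log goes to stderr, it has to be redirected to keep it. The sketch below shows one way to save stdout and stderr to separate files from Python. The helper `run_and_capture_log` and all file names are hypothetical, and the commented Raven call assumes the assembly is written to stdout and that `-t` sets the thread count. A stand-in command is used so the example runs without Raven installed.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture_log(cmd, out_path, log_path):
    """Run cmd, writing its stdout to out_path and its stderr to log_path."""
    with open(out_path, "w") as out, open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=out, stderr=log, check=False).returncode


# Hypothetical Raven call (placeholder file names; assumes the assembly is
# printed to stdout and -t is the thread-count option):
# run_and_capture_log(["raven", "-t", "124", "reads.fastq.gz"],
#                     "raven_asm.fasta", "raven.log")

# Stand-in command that prints a FASTA record to stdout and a log line to stderr.
demo_cmd = [sys.executable, "-c",
            "import sys; print('>ctg1'); print('ACGT'); "
            "print('log line', file=sys.stderr)"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "demo.fasta")
    log_path = os.path.join(tmp, "demo.log")
    exit_code = run_and_capture_log(demo_cmd, out_path, log_path)
    with open(out_path) as handle:
        assembly_text = handle.read()
    with open(log_path) as handle:
        log_text = handle.read()
```

Keeping the two streams in separate files means the log can be pasted into an issue without the assembly mixed in.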
