More fragmented assembly after updating from version 1.3.0 #50

Open
ilyavs opened this issue Sep 13, 2021 · 6 comments

Comments

@ilyavs

ilyavs commented Sep 13, 2021

Hello,
I have been using Raven for a while. Recently I reran an assembly of the same bacterial data with a newer version of Raven and got a more fragmented genome. With version 1.3.0 I got the complete bacterial genome in one contig. With every later version I tried, the genome was more fragmented and the total assembly size was smaller.
Is it possible to keep the improvements made in recent Raven versions but restore the better contiguity observed in version 1.3.0?
Sorry but I can't share the data.
Thanks,
Ilya.

@rvaser
Collaborator

rvaser commented Sep 13, 2021

Hi Ilya,
Which versions have you tried so far? What data type do you have, and how fragmented is the assembly? Starting with version 1.4.x, the bubble similarity check via minimizers was replaced with alignments, while versions 1.5.x use different repeat annotations to save execution time.

Best regards,
Robert

@ilyavs
Author

ilyavs commented Sep 13, 2021

Hi,
Version 1.3.0 produced a 2.8 Mbp Staphylococcus aureus genome. I tried versions 1.4.0, 1.5.1 and 1.5.3 (all via the Docker images on quay.io). None of these versions produced the 2.8 Mbp genome contig; the largest contig was around 1 Mbp.
The data is MinION nanopore sequencing, basecalled with Guppy 4.2.2. The FASTQ file contains 3.6e8 bp.
Best,
Ilya.

@rvaser
Collaborator

rvaser commented Sep 16, 2021

The dataset seems to have enough coverage and reasonable accuracy, so I am not sure why the later versions do not work as well as 1.3.0. You could try v1.6.0 from the options branch (you can also try different k, w values). Sorry for my delayed reply.
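
For reference, the coverage remark can be checked with a quick back-of-the-envelope calculation. This is only a sketch using the two figures quoted in this thread (3.6e8 bp of reads and a 2.8 Mbp genome); it ignores read length distribution and read quality.

```python
# Rough depth-of-coverage estimate from the figures quoted in this thread.
total_bases = 3.6e8   # bases in the FASTQ file, as reported above
genome_size = 2.8e6   # expected genome size in bp, as reported above

# Average coverage = total sequenced bases / genome size.
coverage = total_bases / genome_size
print(f"Estimated coverage: {coverage:.0f}x")
```

With these numbers the estimate comes out at roughly 129x.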

@ilyavs
Copy link
Author

ilyavs commented Sep 19, 2021

Can you please elaborate on how the k and w values are expected to affect the assembly?
When do you expect to have the next version released to bioconda?
Thanks,
Ilya.

@rvaser
Collaborator

rvaser commented Sep 20, 2021

I have created a new release; it will be picked up automatically by bioconda soon.

Regarding parameters, I think you can first try k = 19. We have recently evaluated higher k values (up to 25) on Guppy 5 data, where they tend to increase contiguity. Earlier Raven versions used (k, w) = (29, 9) (option --weaken, now removed) for HiFi data to improve assembly. I am not sure how it will affect Guppy 4.x datasets, but your dataset is quite small, so you can try a couple of values around the default (k, w) = (15, 5).
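
A small sweep around the defaults could be scripted along these lines. This is a sketch, not an official recipe: it assumes that v1.6.0 exposes the minimizer parameters as `-k` and `-w`, that `-t` sets the thread count, and that the assembly is written to standard output (please confirm with `raven --help`). The file names `reads.fastq` and `assembly_k*_w*.fasta` are placeholders. The script only prints the command lines; it does not run anything.

```python
from itertools import product


def raven_commands(reads, k_values, w_values, threads=8):
    """Build one Raven command line per (k, w) pair; nothing is executed."""
    commands = []
    for k, w in product(k_values, w_values):
        # Output name is a hypothetical naming scheme for this sketch.
        output = f"assembly_k{k}_w{w}.fasta"
        # Flags -k, -w and -t are assumed; check them against `raven --help`.
        argv = ["raven", "-k", str(k), "-w", str(w), "-t", str(threads), reads]
        commands.append((argv, output))
    return commands


# Print shell-style command lines for a few values around (k, w) = (15, 5).
for argv, output in raven_commands("reads.fastq", [15, 17, 19], [5, 7]):
    print(" ".join(argv), ">", output)
```

Keeping every output under a name that encodes its (k, w) pair makes it easy to compare the resulting assemblies afterwards.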

@ilyavs
Author

ilyavs commented Sep 22, 2021

Thank you for the new release and information.
Version 1.6.0 assembled the complete 2.8 Mbp genome but failed to circularize the chromosome, while version 1.3.0 assembled the complete 2.8 Mbp genome and circularized the chromosome.
In version 1.6.0, increasing the k value resulted in a shorter largest contig.
In version 1.5.3, running with --weaken resulted in a 2.7 Mbp non-circular largest contig.
So for now, it seems that version 1.3.0 is still the best option for my data, although version 1.6.0 comes close.
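
For anyone repeating this comparison, the two metrics used in this thread (total assembly size and largest contig) can be read from each FASTA output with a few lines of Python. This is a minimal sketch that assumes plain FASTA text; the helper name `assembly_stats` is made up for illustration, and it does not detect whether a contig is circular.

```python
def assembly_stats(fasta_text):
    """Return contig count, total length and largest contig length."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new contig.
            lengths.append(0)
        elif lengths:
            # Sequence lines are added to the current contig.
            lengths[-1] += len(line)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
    }


# Tiny inline example; for a real file, pass open(path).read() instead.
example = ">ctg1\nACGTACGT\nACGT\n>ctg2\nACG\n"
print(assembly_stats(example))
# → {'contigs': 2, 'total_bp': 15, 'largest_bp': 12}
```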
