More fragmented assembly after updating from version 1.3.0 #50

ilyavs · 2021-09-13T15:24:59Z

Hello,
I have been using raven for a while and recently I reran an assembly of the same bacterial data with a newer version of raven and got a more fragmented genome. With version 1.3.0 I got the complete bacterial genome in one contig. With any later version I got the genome more fragmented and with a smaller total assembly size.
Is it possible to keep the improvements made in recent Raven versions but restore the better contiguity observed in version 1.3.0?
Sorry but I can't share the data.
Thanks,
Ilya.

rvaser · 2021-09-13T15:45:22Z

Hi Ilya,
Which versions have you tried so far? What data type do you have and how fragmented is the assembly? From version 1.4.x, the bubble similarity check via minimizers was replaced with alignments, while versions 1.5.x use different repeat annotations to save execution time.

Best regards,
Robert

ilyavs · 2021-09-13T20:03:46Z

Hi,
Version 1.3.0 produced a 2.8 Mbp Staphylococcus aureus genome. I tried versions 1.4.0, 1.5.1 and 1.5.3 (all via the Docker images on quay.io). These versions were unable to produce the 2.8 Mbp genome contig. The largest contig was around 1 Mbp.
The data type is MinION nanopore sequencing basecalled with Guppy 4.2.2. The dataset has 3.6e8 bp in the FASTQ file.
Best,
Ilya.

rvaser · 2021-09-16T11:34:41Z

The dataset seems to have enough coverage and reasonably good accuracy, so I am not sure why the later versions do not work as well as 1.3.0. You could try v1.6.0 from the options branch (you can also try different k, w values). Sorry for my delayed reply.
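For reference, the coverage can be estimated from the numbers reported above. This is a rough sketch that assumes the 3.6e8 bp figure is the total number of bases in the FASTQ file and that the 2.8 Mbp contig from version 1.3.0 approximates the genome size; the function name is only illustrative.

```python
def estimated_coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the thread; both are approximate.
total_bases = 3.6e8  # bp in the FASTQ file (assumed to be total read bases)
genome_size = 2.8e6  # bp, size of the complete contig from version 1.3.0

coverage = estimated_coverage(total_bases, genome_size)
print(f"approximate coverage: {coverage:.0f}x")
```

Under these assumptions the dataset has well over 100x coverage.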

ilyavs · 2021-09-19T14:00:53Z

Can you please elaborate on how the k and w values are expected to affect the assembly?
When do you expect to have the next version released to bioconda?
Thanks,
Ilya.

rvaser · 2021-09-20T11:59:00Z

I have created a new release; it will be picked up automatically by bioconda soon.

Regarding parameters, I think you can first try with k = 19. We have recently evaluated higher k values (up to 25) on Guppy 5 data, where they tend to increase contiguity. Earlier Raven versions used (k, w) = (29, 9) (option --weaken, now removed) for HiFi data to improve assembly. I am not sure how it will affect Guppy 4.x datasets, but your dataset is quite small, so you can try a couple of values around the default (k, w) = (15, 5).
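A small sweep around the default could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, and it assumes that the k-mer length, window length and thread count are set with `-k`, `-w` and `-t` and that the assembly is written to standard output, which should be checked against `raven --help` for the installed version. The input and output file names are hypothetical.

```python
import itertools
import shlex


def raven_commands(reads, k_values, w_values, threads=8):
    """Build one shell command per (k, w) pair; nothing is executed here."""
    commands = []
    for k, w in itertools.product(k_values, w_values):
        out = f"raven_k{k}_w{w}.fasta"  # hypothetical output file name
        # Flag names are assumptions; verify them with `raven --help`.
        args = ["raven", "-k", str(k), "-w", str(w), "-t", str(threads), reads]
        commands.append(f"{shlex.join(args)} > {shlex.quote(out)}")
    return commands


# Values around the default (k, w) = (15, 5), including the suggested k = 19.
for cmd in raven_commands("reads.fastq", k_values=[15, 17, 19], w_values=[5, 7]):
    print(cmd)
```

Printing the commands first makes it easy to review them before running each one in a shell.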

ilyavs · 2021-09-22T16:41:42Z

Thank you for the new release and information.
Version 1.6.0 assembled the complete 2.8 Mbp genome but failed to circularize the chromosome, while version 1.3.0 assembled the complete 2.8 Mbp genome and circularized the chromosome.
In version 1.6.0, increasing the k value resulted in a shorter largest contig.
In version 1.5.3, running with --weaken resulted in a 2.7 Mbp non-circular largest contig.
So for now, it seems that version 1.3.0 is still the best option for my data, although version 1.6.0 comes close.
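To compare the outputs of different versions on the two measures used in this thread (largest contig and total assembly size), a minimal sketch like the one below could be used. It assumes plain FASTA output; the helper names and the toy input are illustrative and not part of Raven.

```python
def contig_lengths(fasta_text):
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Contig count, total assembly size and largest contig, in bp."""
    return {
        "contigs": len(lengths),
        "total": sum(lengths),
        "largest": max(lengths, default=0),
    }


# Toy input; in practice, read the FASTA written by each Raven version.
example = ">ctg1\nACGTAC\nGT\n>ctg2\nACG\n"
print(summarize(contig_lengths(example)))
```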
