Runtime? #58
A few practical questions:
Any rough estimates or pointers to the documentation would be highly appreciated!
Runtime varies a lot and depends on where you run it (hardware specs, cloud vs. HPC), the species of interest (repeat content, genome size), and the data (coverage, read length).
Hi Ilante,
Very roughly, these scale with the amount of input data (n):
Falcon (and most OLC-based assemblers): O(n^2)
Arrow polishing, FreeBayes polishing, and most mappers: O(n). Note that the main reason these two steps are expensive is their memory requirement.
The remaining steps are not expensive, so I have never looked at them in detail. I would guess they are O(n) too.
The actual computing cost of a genome assembly is very hard to predict. Repeat content matters the most. For example, maize (2G) is actually more expensive than a mammalian genome (3G). Every genome larger than 6G that I have worked with has been repetitive. However, some insect or algal genomes can be repetitive as well, even at just 1G.
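
As a rough illustration of the scaling estimates above, here is a minimal Python sketch of how relative cost would change as the amount of input data grows. The step labels, the `STEP_EXPONENTS` table, and the `scale_factor` helper are hypothetical and only encode the guesses from this reply. The sketch ignores repeat content and memory limits, which can dominate in practice.

```python
# Hypothetical exponents taken from the rough estimates in this reply:
# O(n^2) for OLC-style assembly, O(n) for polishing and mapping.
STEP_EXPONENTS = {
    "assembly (OLC, e.g. Falcon)": 2.0,
    "polishing (Arrow / FreeBayes)": 1.0,
    "mapping": 1.0,
}


def scale_factor(data_ratio: float, exponent: float) -> float:
    """Return the relative cost multiplier when the input data grows by
    `data_ratio` for a step whose cost scales as O(n^exponent)."""
    if data_ratio <= 0:
        raise ValueError("data_ratio must be positive")
    return data_ratio ** exponent


if __name__ == "__main__":
    data_ratio = 2.0  # assumed example: twice as much input data
    for step, exponent in STEP_EXPONENTS.items():
        factor = scale_factor(data_ratio, exponent)
        print(f"{step}: about {factor:g}x the cost")
```

Under these assumptions, doubling the data roughly quadruples the assembly cost while the linear steps only double, so the assembly step tends to dominate as datasets grow.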