Index made by `gramtools build` is memory hungry.

Two things:

i) Disk serialisation of the index is significantly smaller than its in-RAM size

For example, on the TB dataset:
(yoda: /nfs/leia/research/iqbal/bletcher/Gramtools/profiling_gramtools/simulated_reads_150_30_reference_9/gramtools_runs/gram_k10_04157)

| Disk | RAM |
| --- | --- |
| 0.2GB | 1.5GB |

This is most likely simply due to `sdsl::bit_compress` being called on each of the paths, the sa_intervals and the kmer_stats (which allow matching up sa_intervals and paths for each instance of a kmer mapping to the graph).

But why not keep them compressed in RAM?
The compression only seems to reduce the number of bits used to represent each integer's value:
http://algo2.iti.kit.edu/gog/docs/html/namespacesdsl_1_1util.html#ad5528f84e3036b9be3faf43a49f15b76
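
To get a feel for how much this kind of width reduction can save, here is a minimal standalone sketch. It does not use sdsl and is not how sdsl implements it; the helper names (`bits_needed`, `unpacked_bytes`, `packed_bytes`) are made up for illustration. It assumes every element is stored with the width of the largest value in the vector, packed into 64-bit words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: smallest number of bits that can represent `value`
// (at least 1, so that 0 still occupies one bit).
inline unsigned bits_needed(std::uint64_t value) {
    unsigned bits = 1;
    while (value >>= 1) ++bits;
    assert(bits <= 64);
    return bits;
}

// Bytes used when each element occupies a full 64-bit integer.
inline std::size_t unpacked_bytes(const std::vector<std::uint64_t>& values) {
    return values.size() * sizeof(std::uint64_t);
}

// Bytes used when every element is stored with the width of the largest
// element, rounded up to whole 64-bit words (assumed packing scheme).
inline std::size_t packed_bytes(const std::vector<std::uint64_t>& values) {
    if (values.empty()) return 0;
    const std::uint64_t max_value =
        *std::max_element(values.begin(), values.end());
    const std::size_t total_bits = values.size() * bits_needed(max_value);
    const std::size_t words = (total_bits + 63) / 64;
    return words * sizeof(std::uint64_t);
}
```

Calling `packed_bytes` and `unpacked_bytes` on a vector of small values (for example SA interval bounds that fit in a few dozen bits) gives a rough upper bound on what keeping the vectors compressed in RAM could save, ignoring any per-container overhead.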
ii) Absolute index size

Most of the memory seems to lie in `SearchState`s (cf #142).

For the TB genome of 4MB, we have an index of 1.5 GB in memory.
For the Plasmodium genome of 23MB, we have an index of ~60 GB in memory.
Note that in the latter case, it was ~80GB before cutting each uint64 in the `SearchState` struct to uint32.

How can we do better?
Ideas: