Large memory usage by gIMble #124
-
Hi :^) Thanks for creating this very neat program! All instructions in the manual on the GitHub main page are clear to follow. I have been running gIMble on a cluster with Slurm, and I found that some commands (especially window and tally) can consume extremely large amounts of memory (more than 500 GB of MaxRSS). Detailed descriptions:
[screenshot: my datasets]
[screenshot: Dataset 2]
[screenshot: MaxRSS captured by Slurm for Dataset 2]
I was wondering whether this behaviour is normal, and which aspect of my data causes such large memory consumption (I suspected it would be the number of heterospecific sample pairs)? Thank you so much! Looking forward to your reply :^) Gratefully,
-
Hi Meng,
That is somewhat to be expected. The way things are being saved right now is not yet "optimal", and we are thinking about improving the data structures, both those kept in memory during computations and those saved to the store, in future releases.
In the meantime, the easiest way to deal with this is to run the window-wise analysis on a scaffold/chromosome basis: make a new gIMble store for each sequence by editing the genomefile so that it only includes that sequence.
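For example, here is a minimal, untested sketch of that splitting step. It assumes the genomefile is a plain two-column TSV of sequence name and length; the script name and function are just for illustration and are not part of gIMble itself:

```python
# split_genomefile.py -- hypothetical helper, not part of gIMble.
# Splits a genomefile (assumed: two-column TSV of sequence name and
# length) into one genomefile per sequence, so that a separate gIMble
# store can be built for each scaffold/chromosome.
import csv
import sys

def split_genomefile(path: str) -> None:
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, comments, and malformed rows.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            name, length = row[0], row[1]
            out_path = f"{name}.genomefile"
            with open(out_path, "w") as out:
                out.write(f"{name}\t{length}\n")
            print(f"wrote {out_path}")

if __name__ == "__main__":
    split_genomefile(sys.argv[1])
```

You would then build a separate store from each per-sequence genomefile and run the window/tally steps on each one, which keeps the peak memory of any single run bounded by the largest sequence rather than the whole genome.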
Hope that helps,
dom