fastmultigather gets killed because of memory usage #273

Open
AnneliektH opened this issue Mar 12, 2024 · 1 comment

@AnneliektH

I'm trying to run fastmultigather with ~56,000 viral FASTA sequences as queries against the GenBank viral database.
With a single FASTA sequence as the query, it works and does not run out of memory, but I want to run it against the whole zip file of sketched sequences. I've tried with up to 250 GB of memory, but the job keeps getting OOM-killed.

Running this in: /group/ctbrowngrp2/scratch/annie/2023-swine-sra/sourmash/viral_taxonomy/genbank

Command:

/usr/bin/time -v sourmash scripts fastmultigather \
    vOTUs.k21.s100.zip \
    genbank.2023-05.viral.dna-k21-sc100.rocksdb \
    -c 4 -k 21 -t 300 -s 100 -o votus.x.genbank.csv

Output:

== This is sourmash version 4.8.6. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

=> sourmash_plugin_branchwater 0.9.1; cite Irber et al., doi: 10.1101/2022.11.02.514947

ksize: 21 / scaled: 100 / moltype: DNA / threshold bp: 300.0
gathering all sketches in 'vOTUs.k21.s100.zip' against 'genbank.2023-05.viral.dna-k21-sc100.rocksdb' using 4 threads
Loaded DB
Reading query(s) from: 'vOTUs.k21.s100.zip'
Loaded 56816 query signature(s)
Command terminated by signal 9
        Command being timed: "sourmash scripts fastmultigather vOTUs.k21.s100.zip genbank.2023-05.viral.dna-k21-sc100.rocksdb -c 4 -k 21 -t 300 -s 100 -o votus.x.genbank.csv"
        User time (seconds): 1917.82
        System time (seconds): 79.46
        Percent of CPU this job got: 388%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:34.61
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 52269600
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 77016
        Minor (reclaiming a frame) page faults: 13193732
        Voluntary context switches: 79240
        Involuntary context switches: 57217
        Swaps: 0
        File system inputs: 5196344
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
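The peak memory figure can be pulled out of a saved `/usr/bin/time -v` report and converted to GiB. This is a minimal sketch, assuming the report was written to a file with GNU time's `-o` option and that the reported kilobytes are 1024-byte units; the `peak_rss_gib` helper and the `time.log` file name are hypothetical. For the run above it prints about 49.85 GiB.

```shell
#!/bin/sh
# Hypothetical helper: read GNU `time -v` output on stdin and print the
# peak resident set size in GiB (assumes kbytes are 1024-byte units).
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.2f\n", $2 / 1048576 }'
}

# Usage (hypothetical file name):
#   /usr/bin/time -v -o time.log sourmash scripts fastmultigather <args>
#   peak_rss_gib < time.log
```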
@ctb
Collaborator

ctb commented Mar 13, 2024

@mr-eyes reported something similar in #268. I wonder if maybe we are loading all the queries into memory?
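If all queries are indeed loaded into memory at once, one possible interim workaround is to bound peak memory by running the same command on smaller batches of queries. This is a minimal sketch, assuming the queries have already been split into several smaller zip files (the `batch_*.zip` names and the `run_batches`/`merge_csvs` helpers are hypothetical, and producing the batches is not shown). It reuses only the flags from the command above and concatenates the per-batch CSVs, keeping a single header row.

```shell
#!/bin/sh
# Hypothetical workaround sketch: run fastmultigather once per query batch
# so that only one batch of queries is loaded at a time.
# Assumes batch files named batch_*.zip already exist (not shown here).
DB=genbank.2023-05.viral.dna-k21-sc100.rocksdb

run_batches() {
  for batch in batch_*.zip; do
    # Skip the literal pattern when no batch files match.
    [ -e "$batch" ] || continue
    # Same flags as the original command; one output CSV per batch.
    sourmash scripts fastmultigather "$batch" "$DB" \
      -c 4 -k 21 -t 300 -s 100 -o "${batch%.zip}.csv"
  done
}

# Concatenate the CSV files given as arguments, keeping only the
# header line of the first file.
merge_csvs() {
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Usage (hypothetical):
#   run_batches
#   merge_csvs batch_*.csv > votus.x.genbank.csv
```

Batch size then becomes the knob that trades the number of runs (and repeated database loads) against peak memory per run.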
