aomoints memory usage #11

Open
solomonik opened this issue Jan 11, 2016 · 0 comments

Memory usage spikes during the calculation of aomoints, outside of the CTF write calls (which are now buffered based on available memory). From my email of 2.10.2015:

When running w15-cc-pVDZ on 384 cores with 1 process/core (which worked in the past), my debugging output tells me the following (explanation first, then output below).

Here tot_mem_used keeps track of how much memory is used to store all CTF tensors on processor 0. The code successfully executes the write on line 920 of aomoints.cxx, but crashes immediately thereafter. Previously the crash was inside the write, but with buffering the write works. However, the memory used is for some reason very high when the write occurs: it seems that Aquarius allocates something of size 1.5 GB, while the write itself is of size (max over all processes) 0.27 GB. My guess is that aomoints then runs out of memory right after the write, but I am not sure (getting a valgrind trace would take forever for this).
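For reference, a memory-bounded buffered write works roughly like the sketch below (my own reconstruction in CTF's C++ API, not the actual implementation; the divide-by-4 budget factor and the helper name buffered_write are made up for illustration, and T.wrld->comm assumes CTF's public World handle):

#include <ctf.hpp>
#include <mpi.h>
#include <algorithm>
#include <cstdint>

// Sketch: write npair (index, value) pairs into tensor T in parts small
// enough to fit in currently available memory.
template <typename dtype>
void buffered_write(CTF::Tensor<dtype> &T, int64_t npair,
                    const int64_t *idx, const dtype *data)
{
  int64_t bytes_per_pair = sizeof(int64_t) + sizeof(dtype);
  // Hypothetical budget: a quarter of whatever memory is free right now.
  // CTF_int::proc_bytes_available() is the probe printed in the output below.
  int64_t budget   = CTF_int::proc_bytes_available() / 4;
  int64_t per_part = std::max<int64_t>(1, budget / bytes_per_pair);

  // write() is collective, so every rank must participate in every part;
  // agree on the part count by taking the max over processes
  // (cf. "(max ...) elements ... in N parts" in the output below).
  int64_t nparts_loc = (npair + per_part - 1) / per_part;
  int64_t nparts;
  MPI_Allreduce(&nparts_loc, &nparts, 1, MPI_INT64_T, MPI_MAX, T.wrld->comm);

  for (int64_t p = 0; p < nparts; p++) {
    int64_t off = std::min(p * per_part, npair);
    int64_t len = std::min(per_part, npair - off);  // may be 0 on some ranks
    T.write(len, idx + off, data + off);
  }
}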

...
tot_mem_used = 4.61970E+08/5.20561E+08, proc_bytes_available() = 1.59376E+09
tot_mem_used = 4.68126E+08/5.26718E+08, proc_bytes_available() = 1.58760E+09
Performing write of 102600 (max 102600) elements (max mem 1.6E+06) in 1 parts 1.58550E+09 memory available, 5.28815E+08 used
max received elements is 270, mine are 270
Completed write of 102600 elements
Performing write of 102600 (max 102600) elements (max mem 1.6E+06) in 1 parts 1.58468E+09 memory available, 5.29636E+08 used
max received elements is 270, mine are 270
Completed write of 102600 elements
Performing write of 21600 (max 21600) elements (max mem 3.5E+05) in 1 parts 1.58543E+09 memory available, 5.28884E+08 used
max received elements is 60, mine are 60
Completed write of 21600 elements
Performing write of 21600 (max 21600) elements (max mem 3.5E+05) in 1 parts 1.58526E+09 memory available, 5.29057E+08 used
max received elements is 60, mine are 60
Completed write of 21600 elements
... // printfs from aomoints.cxx:920 here, most processes writing about 1.7M elements.
Performing write of 0 (max 17099072) elements (max mem 2.7E+08) in 4088 parts 4.87430E+07 memory available, 2.06557E+09 used
Completed write of 0 elements
// segfault here
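A full valgrind trace would be slow, but a cheaper way to localize the spike is to bracket the suspect region (the write at aomoints.cxx:920 and the code immediately after it) with peak-RSS probes. A minimal sketch, assuming Linux semantics for ru_maxrss (a high-water mark, reported in kilobytes):

#include <sys/resource.h>
#include <cstdio>

// Peak resident set size of this process so far (KB on Linux).
static long peak_rss_kb()
{
  struct rusage ru;
  getrusage(RUSAGE_SELF, &ru);
  return ru.ru_maxrss;
}

// Around the suspect call:
//   long before = peak_rss_kb();
//   tensor.write(n, idx, data);   // aomoints.cxx:920
//   printf("peak RSS grew by %ld KB across the write\n",
//          peak_rss_kb() - before);

Since ru_maxrss is a high-water mark, this catches a transient 1.5 GB allocation even if the buffer is freed again before the crash.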

aomoints is also clearly using more memory than before, and it has gotten slower.

For instance, in 2014, for w20 cc-pVDZ on 256 nodes of Edison with 1024 processes and 6 threads per process:

Wed Mar 26 02:05:31 2014: Starting task: aomoints
Wed Mar 26 02:06:17 2014: Finished task: aomoints in 46.402 s
Wed Mar 26 02:06:17 2014: Task: aomoints achieved 5203.736 Gflops/sec

and now:

Sun Jan 10 20:25:49 2016: Starting task: aomoints
Sun Jan 10 20:26:46 2016: Finished task: aomoints in 56.492 s
Sun Jan 10 20:26:46 2016: Task: aomoints achieved 1.133 Gflops/sec
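
For scale (my arithmetic from the two logs above), wall time times reported rate gives the flop count each run attributes to the task:

2014: 46.402 s × 5203.736 Gflop/s ≈ 2.4e5 Gflop
now:  56.492 s × 1.133 Gflop/s ≈ 6.4e1 Gflop

So the wall-time regression is about 22%, while the roughly 4000x drop in the reported rate mostly reflects a change in the flops being counted (a different algorithm, or different flop accounting).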
