aomoints memory usage #11

Open
solomonik opened this issue Jan 11, 2016 · 0 comments

Memory usage spikes during the calculation of aomoints, outside of the CTF write calls (which are now buffered based on available memory). From my email of 2.10.2015:

When running w15-cc-pVDZ on 384 cores with 1 process/core (which worked in the past), my debugging output tells me the following (explanation first, then output below).

Here tot_mem_used keeps track of how much memory is used to store all CTF tensors on processor 0. The code successfully executes the write on line 920 of aomoints.cxx, but crashes immediately thereafter. Previously the crash was inside the write, but with buffering the write works. However, the memory used is for some reason very high when the write occurs: it seems that Aquarius allocates something of size 1.5 GB, while the write itself is of size (max over all processes) 0.27 GB. My guess is that aomoints then runs out of memory right after the write, but I am not sure (getting a valgrind trace would take forever for this).
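For reference, a memory-bounded buffered write works roughly like the sketch below (my own reconstruction in CTF's C++ API, not the actual implementation; the divide-by-4 budget factor and the helper name buffered_write are made up for illustration, and T.wrld->comm assumes CTF's public World handle):

#include <ctf.hpp>
#include <mpi.h>
#include <algorithm>
#include <cstdint>

// Sketch: write npair (index, value) pairs into tensor T in parts small
// enough to fit in currently available memory.
template <typename dtype>
void buffered_write(CTF::Tensor<dtype> &T, int64_t npair,
                    const int64_t *idx, const dtype *data)
{
  int64_t bytes_per_pair = sizeof(int64_t) + sizeof(dtype);
  // Hypothetical budget: a quarter of whatever memory is free right now.
  // CTF_int::proc_bytes_available() is the probe printed in the output below.
  int64_t budget   = CTF_int::proc_bytes_available() / 4;
  int64_t per_part = std::max<int64_t>(1, budget / bytes_per_pair);

  // write() is collective, so every rank must participate in every part;
  // agree on the part count by taking the max over processes
  // (cf. "(max ...) elements ... in N parts" in the output below).
  int64_t nparts_loc = (npair + per_part - 1) / per_part;
  int64_t nparts;
  MPI_Allreduce(&nparts_loc, &nparts, 1, MPI_INT64_T, MPI_MAX, T.wrld->comm);

  for (int64_t p = 0; p < nparts; p++) {
    int64_t off = std::min(p * per_part, npair);
    int64_t len = std::min(per_part, npair - off);  // may be 0 on some ranks
    T.write(len, idx + off, data + off);
  }
}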

...
tot_mem_used = 4.61970E+08/5.20561E+08, proc_bytes_available() = 1.59376E+09
tot_mem_used = 4.68126E+08/5.26718E+08, proc_bytes_available() = 1.58760E+09
Performing write of 102600 (max 102600) elements (max mem 1.6E+06) in 1 parts 1.58550E+09 memory available, 5.28815E+08 used
max received elements is 270, mine are 270
Completed write of 102600 elements
Performing write of 102600 (max 102600) elements (max mem 1.6E+06) in 1 parts 1.58468E+09 memory available, 5.29636E+08 used
max received elements is 270, mine are 270
Completed write of 102600 elements
Performing write of 21600 (max 21600) elements (max mem 3.5E+05) in 1 parts 1.58543E+09 memory available, 5.28884E+08 used
max received elements is 60, mine are 60
Completed write of 21600 elements
Performing write of 21600 (max 21600) elements (max mem 3.5E+05) in 1 parts 1.58526E+09 memory available, 5.29057E+08 used
max received elements is 60, mine are 60
Completed write of 21600 elements
... // printfs from aomoints.cxx:920 here, most processes writing about 1.7M elements.
Performing write of 0 (max 17099072) elements (max mem 2.7E+08) in 4088 parts 4.87430E+07 memory available, 2.06557E+09 used
Completed write of 0 elements
// segfault here
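A full valgrind trace would be slow, but a cheaper way to localize the spike is to bracket the suspect region (the write at aomoints.cxx:920 and the code immediately after it) with peak-RSS probes. A minimal sketch, assuming Linux semantics for ru_maxrss (a high-water mark, reported in kilobytes):

#include <sys/resource.h>
#include <cstdio>

// Peak resident set size of this process so far (KB on Linux).
static long peak_rss_kb()
{
  struct rusage ru;
  getrusage(RUSAGE_SELF, &ru);
  return ru.ru_maxrss;
}

// Around the suspect call:
//   long before = peak_rss_kb();
//   tensor.write(n, idx, data);   // aomoints.cxx:920
//   printf("peak RSS grew by %ld KB across the write\n",
//          peak_rss_kb() - before);

Since ru_maxrss is a high-water mark, this catches a transient 1.5 GB allocation even if the buffer is freed again before the crash.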

aomoints is also clearly using more memory than before, and it has gotten slower.

For instance, in 2014, for w20 cc-pVDZ on 256 nodes of Edison with 1024 processes and 6 threads per process:

Wed Mar 26 02:05:31 2014: Starting task: aomoints
Wed Mar 26 02:06:17 2014: Finished task: aomoints in 46.402 s
Wed Mar 26 02:06:17 2014: Task: aomoints achieved 5203.736 Gflops/sec

and now:

Sun Jan 10 20:25:49 2016: Starting task: aomoints
Sun Jan 10 20:26:46 2016: Finished task: aomoints in 56.492 s
Sun Jan 10 20:26:46 2016: Task: aomoints achieved 1.133 Gflops/sec
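
For scale (my arithmetic from the two logs above), wall time times reported rate gives the flop count each run attributes to the task:

2014: 46.402 s × 5203.736 Gflop/s ≈ 2.4e5 Gflop
now:  56.492 s × 1.133 Gflop/s ≈ 6.4e1 Gflop

So the wall-time regression is about 22%, while the roughly 4000x drop in the reported rate mostly reflects a change in the flops being counted (a different algorithm, or different flop accounting).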
