Energy density t-moves related crash in DMC #5245

Open
prckent opened this issue Nov 27, 2024 · 6 comments

@prckent
Contributor

prckent commented Nov 27, 2024

Describe the bug
Attempts to run the example in #5214 are unsuccessful. In a pure CPU build (GCC 14, real-valued, MPI), I get a reliable SEGV after a few blocks of DMC when the energy density estimator is enabled. I could not get runs without the energy density estimator to crash, including runs with no estimators at all. I also tried putting many small VMC sections ahead of the DMC section, but the crash only ever occurred in DMC, never in VMC. Crashes were obtained with 16 MPI ranks × 1 thread each, 4 MPI ranks × 4 threads each, 1 MPI rank × 16 threads, and 1 MPI rank × 1 thread.

To Reproduce
Modify qmcpack/nexus/examples/qmcpack/rsqmc_misc/estimators/iron_ldaU_dmc.py to actually run the calculations by setting generate_only = 0, then run it. This needs both QE and QMCPACK. Starting from scratch, the first crash takes O(1h). The DMC crash itself can be rigged to occur within minutes.

I can provide just the generated inputs including jastrow & orbital files if preferred.

Unhelpful error:

[nitrogen2:3382958] *** Process received signal ***
[nitrogen2:3382958] Signal: Segmentation fault (11)
[nitrogen2:3382958] Signal code: Address not mapped (1)
[nitrogen2:3382958] Failing at address: (nil)
[nitrogen2:3382958] [ 0] /lib64/libc.so.6(+0x3e730)[0x7f1c1be3e730]
[nitrogen2:3382958] [ 1] qmcpack[0x989b1f]
[nitrogen2:3382958] [ 2] qmcpack[0x87567b]
[nitrogen2:3382958] [ 3] qmcpack[0x802eab]
[nitrogen2:3382958] [ 4] qmcpack[0x6510ff]
[nitrogen2:3382958] [ 5] qmcpack[0x642637]
[nitrogen2:3382958] [ 6] qmcpack[0x63caa8]
[nitrogen2:3382958] [ 7] /home/pk7/apps/spack/opt/spack/linux-rhel9-zen3/gcc-11.5.0/gcc-14.2.0-5c6egxwthhh2tklbcegw5y7yjk2me35s/lib64/libgomp.so.1(GOMP_parallel+0x46)[0x7f1c1ef355e6]
[nitrogen2:3382958] [ 8] qmcpack[0x63e086]
[nitrogen2:3382958] [ 9] qmcpack[0x527cdb]
[nitrogen2:3382958] [10] qmcpack[0x52be66]
[nitrogen2:3382958] [11] qmcpack[0x52f920]
[nitrogen2:3382958] [12] qmcpack[0x4d8a93]
[nitrogen2:3382958] [13] /lib64/libc.so.6(+0x295d0)[0x7f1c1be295d0]
[nitrogen2:3382958] [14] /lib64/libc.so.6(__libc_start_main+0x80)[0x7f1c1be29680]
[nitrogen2:3382958] [15] qmcpack[0x51c625]

Typical output:

 branching cutoff scheme = classic
  branch cutoff, max      = 5.0000e+01 7.5000e+01
  QMC Status (BranchMode) = 0000001101
===================================================================
--- Memory usage report : DMCBatched after initialLogEvaluation ---
===================================================================
Available memory on node 0, free + buffers :   79911 MiB
Memory footprint by rank 0 on node 0       :     627 MiB
===================================================================
Completed block    1 of 5 average 2.453 secs/block after 232.7 secs
Completed block    2 of 5 average 2.428 secs/block after 235.1 secs

Expected behavior
No crash

System:

nitrogen2, nightly "gcc new mpi" configuration with GCC 14.2.0, OpenMPI, etc.

prckent added the bug label Nov 27, 2024
@prckent
Contributor Author

prckent commented Nov 27, 2024

Quick follow-up: Interestingly, switching non-local moves from v3 (used in all the reported crashes) to "no" made the crash go away, while "v0" brings it back. So there is an issue when T-moves are used together with the energy density estimator.

prckent changed the title from "Energy density related crash in DMC" to "Energy density t-moves related crash in DMC" Dec 2, 2024
@prckent
Contributor Author

prckent commented Dec 2, 2024

If this is not known to work in legacy and/or is not immediately needed, we could list "energy density works only with the locality approximation" as a known limitation. In that case the current issue is only a bug in the sense that we claim support that does not actually exist.

@PDoakORNL @jtkrogel What do we know of the status of energy density in legacy with different locality schemes, if anything, and what is needed in the immediate future?

@PDoakORNL
Contributor

With a hand-built VMC-DMC input and offload on NVIDIA, I don't get this crash. I'm still looking at it.

@prckent
Contributor Author

prckent commented Dec 3, 2024

I suggest a regular CPU build.

@PDoakORNL
Copy link
Contributor

I can reproduce it with a CPU build. Looking at it in the debugger now.

@PDoakORNL
Contributor

Working on the fix now. Updates to many QMCHamiltonian potentials, such as LocalECPotential and CoulombPotential, will be necessary.
