Segmentation fault when running socket in run mode b with OFDFT. #50

Open
ltimmerman3 opened this issue Sep 21, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ltimmerman3
Collaborator

ltimmerman3 commented Sep 21, 2024

Describe the bug
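The Python side raises ConnectionResetError because the SPARC subprocess exits with code 139 (128 + 11, i.e. it was killed by SIGSEGV). The log files show the segfault happening during the socket function calls. So far, this has only occurred with OFDFT.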
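For readers unfamiliar with the exit code, here is a minimal sketch of how a status such as 139 can be decoded. It assumes the usual shell convention that a status above 128 means "terminated by signal (status - 128)"; the helper name `describe_exit_status` is hypothetical and not part of ASE or SPARC-X-API.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


# Status reported in the UserWarning of this issue.
print(describe_exit_status(139))
```

On Linux this reports signal 11 (SIGSEGV), which matches the "Caught signal 11" line in sparc.log.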

To Reproduce
Provide a minimal list of settings / codes to help us debug, such as

  • Python 3.9.18
  • Version b6b9022
  • dev_SPARC
  • PACE Phoenix

Expected behavior
The geometry optimization of the Si nanocluster should run to completion.

Actual output or error trace
sparc.log
[atl1-1-02-018-18-2:649235:0:649235] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 649230) ====
0 0x000000000005f10c ucs_callbackq_cleanup() ???:0
1 0x000000000005f2ca ucs_callbackq_cleanup() ???:0
2 0x000000000003e6f0 __GI___sigaction() :0
3 0x0000000000409f4e Calculate_local_kpoints() ???:0
4 0x00000000005cd934 reinit_mesh() ???:0
5 0x00000000005ce61e read_atoms_position_fom_socket() ???:0
6 0x00000000005cf9b2 main_Socket() ???:0
7 0x000000000040550d main() ???:0
8 0x0000000000029590 __libc_start_call_main() ???:0
9 0x0000000000029640 __libc_start_main_alias_2() :0
10 0x0000000000405535 _start() ???:0
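To make the call path easier to read, the following sketch extracts the function names from the SPARC frames of the backtrace above and prints them from the outermost call to the crash site. The regular expression and the helper `call_chain` are illustrative assumptions about this particular log layout, not a general backtrace parser.

```python
import re

# Frames 3-7 copied from the sparc.log backtrace above.
BACKTRACE = """\
3 0x0000000000409f4e Calculate_local_kpoints() ???:0
4 0x00000000005cd934 reinit_mesh() ???:0
5 0x00000000005ce61e read_atoms_position_fom_socket() ???:0
6 0x00000000005cf9b2 main_Socket() ???:0
7 0x000000000040550d main() ???:0
"""

# Frame layout assumed here: "<index> <hex address> <function>() <location>"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(\w+)\(\)")


def call_chain(text: str) -> list:
    """Return function names ordered from the outermost caller to the crash site."""
    names = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(3))
    # The backtrace lists the innermost frame first, so reverse it.
    return list(reversed(names))


print(" -> ".join(call_chain(BACKTRACE)))
```

Read this way, the crash is reached through `main_Socket`, `read_atoms_position_fom_socket` and `reinit_mesh`, with `Calculate_local_kpoints` as the last SPARC frame before the signal handler.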

socket.log
Accepting clients on UNIX-socket /tmp/ipi_sparc_ce1e0a
Close socket server
pted connection from
Driver: calculate
Driver: status
Driver: sendmsg 'STATUS'
Driver: recvmsg 'READY'
Driver: sendposdata
Driver: sendmsg 'POSDATA'
Driver: send 72 bytes of <class 'numpy.float64'>
Driver: send 72 bytes of <class 'numpy.float64'>
Driver: send 4 bytes of <class 'numpy.int32'>
Driver: send 120 bytes of <class 'numpy.float64'>
Driver: status
Driver: sendmsg 'STATUS'
Close socket server
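The byte counts in the log can be cross-checked with a little arithmetic. The sketch below assumes the POSDATA payload is sent in the order cell (3x3 float64), inverse cell (3x3 float64), atom count (int32), positions (N x 3 float64), which is my reading of the four `send` lines above; the helper names are hypothetical.

```python
import struct

FLOAT64_SIZE = struct.calcsize("<d")  # 8 bytes
INT32_SIZE = struct.calcsize("<i")    # 4 bytes


def posdata_payload_sizes(natoms: int) -> dict:
    """Byte size of each POSDATA field for a system with `natoms` atoms."""
    return {
        "cell": 9 * FLOAT64_SIZE,                # 3x3 matrix
        "inverse_cell": 9 * FLOAT64_SIZE,        # 3x3 matrix
        "natoms": INT32_SIZE,                    # single integer
        "positions": 3 * natoms * FLOAT64_SIZE,  # N x 3 coordinates
    }


def natoms_from_positions_bytes(nbytes: int) -> int:
    """Infer the atom count from the size of the positions payload."""
    natoms, remainder = divmod(nbytes, 3 * FLOAT64_SIZE)
    if remainder:
        raise ValueError("payload size is not a multiple of 3 float64 values")
    return natoms


# The last send in socket.log is 120 bytes of float64.
natoms = natoms_from_positions_bytes(120)
print(natoms, posdata_payload_sizes(natoms))
```

Under that assumption, the 72 + 72 + 4 + 120 byte sequence corresponds to a 5-atom structure, so the positions were fully sent and SPARC crashed while processing them, before it could answer the following STATUS request.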

Traceback
/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py:364: UserWarning: Subprocess exited with status 139
  warnings.warn('Subprocess exited with status {}'
Traceback (most recent call last):
  File "/storage/coda1/p-amedford6/0/ltimmerman3/socketApplications/sparc_runs/OFDFT_run_mode_b/test_run_b.py", line 39, in <module>
    dyn.run(fmax=0.05)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 269, in run
    return Dynamics.run(self)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 156, in run
    for converged in Dynamics.irun(self):
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 122, in irun
    self.atoms.get_forces()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/atoms.py", line 788, in get_forces
    forces = self._calc.get_forces(self)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
    return self.get_property('forces', atoms)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/calculator.py", line 532, in calculate
    self._calculate_with_socket(
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/calculator.py", line 632, in _calculate_with_socket
    ret = self.in_socket.calculate_origin_protocol(atoms[self.sort])
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/socketio.py", line 226, in calculate_origin_protocol
    return self.protocol.calculate(atoms.positions, atoms.cell)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 189, in calculate
    msg = self.status()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 152, in status
    msg = self.recvmsg()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 62, in recvmsg
    msg = self._recvall(12)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 51, in _recvall
    chunk = self.socket.recv(remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

Python run file attached

Using SPARC

from sparc.calculator import SPARC
from ase import Atoms
from ase.io import read
from ase.build import molecule
from ase.optimize import BFGS
import numpy as np

Si = read('struct.in.traj')
Si.pbc = [True, True, True]

calc_params = {
    "EXCHANGE_CORRELATION": "LDA_PZ",
    "KPOINT_GRID": [1, 1, 1],
    "MESH_SPACING": 0.35,
    "MAXIT_SCF": 150,
    "ELEC_TEMP_TYPE": "fermi-dirac",
    "ELEC_TEMP": 100,
    "ION_TEMP": 100,
    "PRINT_RESTART_FQ": 10,
    "PRINT_ATOMS": 1,
    "PRINT_FORCES": 1,
    "SPIN_TYP": 0,
    "OFDFT_FLAG": 1,
    "OFDFT_LAMBDA": 0.2,
    "TOL_OFDFT": 1e-3,
}

with SPARC(use_socket=True, **calc_params) as calc:
    # Run the geometry optimization through the socket calculator
    Si.calc = calc
    #water.get_potential_energy()
    dyn = BFGS(Si)
    dyn.run(fmax=0.05)

ltimmerman3 added the bug label Sep 21, 2024
@alchem0x2A
Collaborator

Thanks for the info. The error seems to be related to the function in the socket C code where the mesh is re-initialized. That function possibly needs to be updated to match the OFDFT parameters. I will look into this.
