Duplicate/related to #214. I think this feature exists somewhere, but does not appear to be implemented yet.
Is your feature request related to a problem? Please describe.
I want the ability to restart AFQMC simulations from a previously saved walker state. Right now, if a run is interrupted or needs to be extended, there’s no way to resume from where the walkers left off. This makes things inefficient because I either lose the work already done or have to restart the whole simulation from scratch.
Describe the solution you’d like
A way to save the walker state (positions, weights, overlaps, etc.) at a given point in the simulation and then load that state to resume later. Ideally, this should work seamlessly without resetting things like population control or other runtime parameters.
Describe alternatives you’ve considered
I tried manually saving walker data and restoring it using dicts, but it feels clunky and inefficient, and it's really more of a hack. Also, it’s not clear whether population control and other internal state are properly re-initialized when restarting this way. This is what I’ve been doing:
This works to some extent: the energy is mostly correct, but it’s not efficient and doesn’t handle everything cleanly. For example, the walker weights are not correct on restart, because I'm not sure which attributes to pass and which to let the system repopulate. Letting run handle population control after restoring walkers also feels like it’s reinitializing some things unnecessarily.
Additional context
This feature would make simulations much easier to manage, especially when running on systems with time limits or when simulations are interrupted. Being able to checkpoint and restart would save a lot of effort. Something like saving walkers to a binary file and having a dedicated method to reload and resume would be ideal.
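To make the request concrete, here is a minimal sketch of what "save to a binary file, then reload" could look like. It is not the package's API: the function names `save_walkers` and `load_walkers`, the array layout (`phi` as `(nwalkers, nbasis, nelec)`, `weights` as `(nwalkers,)`), and the choice of NumPy's `.npz` format are all assumptions made for illustration. The write goes through a temporary file so that a job killed mid-write does not destroy the previous checkpoint.

```python
import os
import tempfile

import numpy as np


def save_walkers(path, phi, weights, step):
    """Write walker state to `path` atomically (hypothetical helper).

    phi     : complex array, assumed shape (nwalkers, nbasis, nelec)
    weights : real array, assumed shape (nwalkers,)
    step    : index of the last completed propagation step
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            np.savez(fh, phi=phi, weights=weights, step=np.int64(step))
        # Rename only after the file is fully written, so an interrupted
        # write never replaces the last good checkpoint.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_walkers(path):
    """Read back what save_walkers wrote and sanity-check the shapes."""
    with np.load(path) as data:
        phi = data["phi"]
        weights = data["weights"]
        step = int(data["step"])
    if phi.shape[0] != weights.shape[0]:
        raise ValueError("phi and weights disagree on the number of walkers")
    return phi, weights, step
```

A real implementation would probably also need to store whatever state population control and the random number generator carry between steps; this sketch only covers the walkers themselves.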
First, if you saved the weights and the walkers' phi, the data should be enough to connect with the first job. But recomputing the Green's function and overlap before executing the new AFQMC run step may be crucial.
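As a rough illustration of that recomputation, the sketch below rebuilds the overlap and the one-body Green's function for a single restored walker, assuming a single-determinant trial wavefunction and one spin sector. The function name `overlap_and_greens` and the index convention `G[i, j] = <psi_T| c_i^dagger c_j |phi> / <psi_T|phi>` are assumptions; the actual code may use a different convention or a multi-determinant trial.

```python
import numpy as np


def overlap_and_greens(psi_t, phi):
    """Recompute overlap and Green's function from restored orbitals.

    psi_t : trial orbitals, shape (nbasis, nelec)
    phi   : walker orbitals, shape (nbasis, nelec)
    Assumes a single-determinant trial (hypothetical helper).
    """
    # Overlap matrix between trial and walker orbitals, (nelec, nelec).
    ovlp_mat = psi_t.conj().T @ phi
    overlap = np.linalg.det(ovlp_mat)
    # G = [phi (psi_T^dagger phi)^{-1} psi_T^dagger]^T, using a linear
    # solve instead of forming the inverse explicitly.
    greens = (phi @ np.linalg.solve(ovlp_mat, psi_t.conj().T)).T
    return overlap, greens
```

Two quick checks that a restart recomputed things consistently: the trace of `greens` should equal the number of electrons in that spin sector, and `greens @ greens` should equal `greens` up to round-off.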
FYI, here is the previous implementation for restarting; apparently the interfaces need to be updated to make it work again.
A long time ago, I tried h5 with the MPI driver for writing and reading walkers, but I couldn't make it work; I think it also required a properly installed HDF5 with MPI support.
So I finally ended up creating a separate h5 file for each rank.
I currently don't have the bandwidth, but I am happy to be involved in making this feature work.
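For reference, the one-file-per-rank layout described above could look roughly like the following. This is only a sketch: it swaps HDF5 for NumPy's `.npz` format to stay dependency-free, takes `rank` and `nranks` as plain arguments rather than reading them from an MPI communicator, and the names `rank_file`, `write_rank`, and `read_rank` are hypothetical.

```python
import numpy as np


def rank_file(prefix, rank):
    """File name for one rank's walkers, e.g. 'chk.rank00003.npz'."""
    return f"{prefix}.rank{rank:05d}.npz"


def write_rank(prefix, rank, nranks, phi, weights):
    """Each rank writes only its own walkers, so no parallel I/O is needed."""
    np.savez(rank_file(prefix, rank), phi=phi, weights=weights,
             nranks=np.int64(nranks))


def read_rank(prefix, rank, nranks):
    """Each rank reads its own file back on restart.

    Refuses to load if the job was checkpointed with a different number
    of ranks, since the walkers would then be split differently.
    """
    with np.load(rank_file(prefix, rank)) as data:
        saved = int(data["nranks"])
        if saved != nranks:
            raise ValueError(
                f"checkpoint written with {saved} ranks, restarting with {nranks}"
            )
        return data["phi"], data["weights"]
```

The trade-off is that a restart must use the same number of ranks as the original run, unless the walkers are gathered and redistributed first.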