Monitor observables every N epochs #573
Comments
Hi @elcorto, thanks for raising this issue! I generally agree that having a mechanism that would track a targeted metric only after
@RandomDefaultUser Sure, I can integrate this, I already have this on my branch
Great, thank you!
I implemented it in #584 such that the LDOS error is evaluated every epoch, while other metrics only every
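To illustrate the scheme described in this comment (one cheap metric in every epoch, the costlier ones only periodically), here is a rough, library-independent sketch. It is not the code from #584; the function `metrics_due` and the parameter `every_n` are hypothetical names chosen for illustration only.

```python
from typing import List, Sequence


def metrics_due(
    epoch: int,
    cheap_metrics: Sequence[str],
    expensive_metrics: Sequence[str],
    every_n: int,
) -> List[str]:
    """Return the metric names to evaluate after the given 1-based epoch.

    Cheap metrics are due after every epoch; expensive ones only when the
    epoch number is a multiple of `every_n` (a hypothetical setting).
    """
    if every_n < 1:
        raise ValueError("every_n must be a positive integer")
    due = list(cheap_metrics)
    if epoch % every_n == 0:
        due.extend(expensive_metrics)
    return due
```

With `every_n=5`, for example, a metric like `"total_energy"` would only be computed after epochs 5, 10, 15, and so on, while the cheap metric is still tracked after each epoch.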
Thanks @nerkulec for this addition. I took the liberty to re-open this issue such that we can discuss this (which should be quickly resolved). So to make sure I understand: There are two new parameters:
I was under the impression that all of those need to be added, in the

So as I understand #584:

Given this new feature, what is the difference between

I think what I had in mind was this workflow, based on using shuffled snapshots by default:
So, for certain
I am wondering if maybe this is a discussion we should have during a meeting (or potentially the design workshop?). The current implementation only allows for either shuffled validation snapshots without observables or unshuffled validation snapshots with observables, just as you have mentioned. I am wondering, though, what the intended use should be. I personally always use shuffled validation snapshots and no observables, but as I understand it, both you and @nerkulec use unshuffled validation snapshots and observables (or would at least like to incorporate that into the process). In that case, it may make sense to modify the entire interface and fold such a change into larger modifications of the data management/training subroutine. What do you think?
I agree that this is best discussed F2F. I'd volunteer to document the current state just as you summarized above; afterwards I think we can close this issue. To do this, there is for me still the question what the difference between the new
@elcorto The difference between
This issue is not a blocker for the 1.3.0 release, but I would propose to keep it open until we have documented the workflow enabled by the new features from #584. Of course, if you consider those settings experimental or not to be set by users, please feel free to close this issue, since some form of "every N epochs" is implemented, albeit not (I think) the one I outlined above. I offered to document things, but ATM I'm still not clear on when to use which setting (
In terms of docs, currently
Regarding docstrings and documentation, I believe that's done in #609. Please tell me if something is missing.
Yes, sorry, I just saw #609. With the help of that, one needs to go ahead and use the settings in production in order to suggest doc improvements, so let's close this one then.
When using `during_training_metric`, the respective quantity is calculated in every epoch, which may be costly if `during_training_metric="total_energy"`.

When using shuffled snapshots, adding the required `calculation_output_file` may not be valid, since the reference data in `Be_snapshot1.out` doesn't match the validation data. I'm not sure what data is read from this file, so this may or may not be a problem, but in any case one must provide some file here, else we see `Exception: Could not guess type of additional calculation data provided to MALA.`

In addition, #571 and #572 make it hard to use the feature in production at the moment.

So, is there a way to do something like `examples/basic/ex02_test_network.py` every `N` epochs only, where one defines non-shuffled test snapshots plus reference data `calculation_output_file="/path/to/qe.out"`? This would be independent of the validation data (one could call it a second validation data set) and save compute as well.