28 add md relevant outputs #87

JFRudzinski · 2024-06-13T09:09:43Z

new PR for #28 since the first one is somehow broken

JFRudzinski · 2024-06-21T06:47:03Z

@JosePizarro3 @ndaelman-hu Could you do a preliminary brief check of the structure within energies.py. My main question at the moment is how we deal with the organization of "sub-contributions".

Right now, all energies fall under total_energies.contributions. On the classical side, I have made this a flat list, e.g., ElectrostaticEnergy, ElectrostaticShortRangeEnergy, ElectrostaticLongRangeEnergy all fall directly under TotalEnergy, even though 2 are subsets of the third. I find it a bit easier this way since it's not a given that users/codes always store all of these options.

Will this work for the quantum case? Looking at the old schema (old quantities that I have not yet converted are commented out in the file) it seems like there are a lot of sub-structures in terms of the energies. How much are these used? Can you conceive a good functionality with flat lists?

JosePizarro3 · 2024-06-24T11:46:14Z

@JosePizarro3 @ndaelman-hu Could you do a preliminary brief check of the structure within energies.py. My main question at the moment is how we deal with the organization of "sub-contributions".

Right now, all energies fall under total_energies.contributions. On the classical side, I have made this a flat list, e.g., ElectrostaticEnergy, ElectrostaticShortRangeEnergy, ElectrostaticLongRangeEnergy all fall directly under TotalEnergy, even though 2 are subsets of the third. I find it a bit easier this way since it's not a given that users/codes always store all of these options.

Will this work for the quantum case? Looking at the old schema (old quantities that I have not yet converted are commented out in the file) it seems like there are a lot of sub-structures in terms of the energies. How much are these used? Can you conceive a good functionality with flat lists?

Ok, let me check. I'd like to see how much of the quantum energies were restructured into ElectronicEigenvalues. I think there is some inconsistency there (from the legacy schema), as I would consider Energy to be a global quantity, not refering to the eigenvalues, and it might be that in the run schema it was not like this.

ndaelman-hu

Just some quick takes, not a thorough review.

My main impression is that there's some confusion between classical vs quantum and potential vs energy. Regarding the former, it's very simple: all classical terms have a quantum counterpart. I think it makes more sense to distinguish the systems for which we specify the energy, e.g. electrons, nuclei, both, external.

Also note that magnetism is completely left out.

ndaelman-hu · 2024-06-25T11:47:04Z

src/nomad_simulations/properties/energies.py

+class ClassicalEnergy(Energy):
+    """
+    Abstract physical property section describing some classical energy of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)
+
+        if not self.type:
+            self.type == 'classical'
+        elif self.type != 'classical':
+            logger.error(
+                f"Misidentified type for classical energy."
+            )
+
+class QuantumEnergy(Energy):
+    """
+    Abstract physical property section describing some quantum energy of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)
+
+        if not self.type:
+            self.type == 'quantum'
+        elif self.type != 'quantum':
+            logger.error(
+                f"Misidentified type for quantum energy."
+            )


Tbh, I'm not sure what would constitute the boundary between classical and quantum energies.
Every classical energy has a quantum variant. There are some unique quantum phenomena that don't have a classical counterpart, but it's so few that I would not distinguish at this level.

The only example I can think of now is exchange energy.

I definitely understand your point. My reasons for proposing to distinguish between the two were:

To help clarify the origin of certain energy contributions in the case of a hybrid model. E.g., you may have a QM/MM model and want to store a particular contribution to the total energy from the QM and MM parts of your model separately.

I will argue against myself here and say that the proper way to do this would be with the method_ref, which is true, but this is a bit opaque for a user. Having the type quantity makes it easier to see quickly what level of model one used to get this output.

There are certain energy contributions, e.g., kinetic energy, where the standard method of calculation is fundamentally different from a classical or a quantum model.

Again maybe it is not a good idea to conflate the output and the method too much. In fact this goes against a bunch of my comments where I asked if we should take the reference to a specific calculation method out of the description. So, from this perspective I am on the fence.

There are many examples where there is not a quantum counterpart to the classical energy. In addition to a few quantum phenomena as you mentioned, in classical simulations it is common to store the contribution to the total energy (usually per particle) from a particular term of the force field.

E.g., you may want to know the total energy on a particle in a configuration coming from some dihedral potential only.

In the previous schema it became unclear at times the usage of certain contributions. This is partially due to the inclusion of specific methodology for calculation within the description, but also because I think in some cases people had a certain class of models in mind. So, I was trying to prevent this a little.

I think this is why you thought there is a confusion between energy and potential. The potential is indeed stored within the FF section of method. But the corresponding energy is the evaluation of that potential or set of potentials for a particular configuration.

All that being said, I definitely am not set on this distinction. Curious to hear your rebuttal given the additional information, and also the thoughts of @Bernadette-Mohr and @JosePizarro3

Regarding your points:

QM/MM are indeed a mapping between methods and structure. How would you, using the distinction above, distinguish the energy contributions in QM/QM/MM (e.g. CCSD/DFT/Amber) or QM/MM/MM?

... QM in the limit of heavy particles converges to CM, without exception. The main difference in maths is that variables and functions in CM become functions (in the Hilbert space) and operators in QM. That is to say, the same physics is presented in higher-order structures. Some new physical phenomena emerge, but all the old ones remain. More importantly, why does the mathematical representation matter here? We're just getting scalars (maybe tensors) in our data.

Well, that's just plain wrong. At best, these communities are not interested in these contributions, but that's another story. As for your example: the dihedral potential is captured by DFT, just not named as such. ML potentials actually use DFT data to reformulate the physics in terms of the (local) neighborhood.

I agree with your sentiment that a lot of these contributions are model-specific, and there's value in extracting those. Then we need a system to annotate the method terms. I would mentally distinguish it from the contributions of physical phenomena, which are not subordinate to the method used to simulate them. Let's reserve the energies here for what experimentalists could also request.

Ok, here's my unqualified opinion:

Making the different energy contributions storable/accessible is valid, in my opinion, either because they were relevant for the original study or because somebody is interested in that particular aspect when conducting secondary analysis.

While I agree with @ndaelman-hu that, theoretically, especially QM and MM/MD describe the same physics, each approach has it's strengths and weaknesses.

We should absolutely provide that information somewhere, but maybe more with methods than with outputs?
One use case that comes to mind is training data for an ML model. If you want to have any faith in your predictions, you should be interested in the underlying distribution or potential issues.

@JFRudzinski, I'm not sure why you think including the methodology makes things confusing here; streamlining what information we provide where will probably make things clearer, though.

Generally, it seems that the slightly different usage of "potential" vs. "energy" in the different methods is a source of confusion even among us, so I think we should try to prevent this with the users. I remember us discussing something like a "derived" flag, to differentiate quantities that are calculated and aspects that are directly part of or automatically generated by the method. Could something along this line help?

ndaelman-hu · 2024-06-25T11:50:56Z

src/nomad_simulations/properties/energies.py

+class AngleEnergy(ClassicalEnergy):
+    """
+    Physical property section describing contributions to the potential energy from angle interactions of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)
+
+
+class DihedralEnergy(ClassicalEnergy):
+    """
+    Physical property section describing contributions to the potential energy from dihedral interactions of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)


I'm confused now, these look more like terms of the potential rather than energy contributions.
Instead, I'd expect rotational and vibrational energies (especially useful in light of the virial theorem), as subtypes of kinetic energy. These modes would classically be described in function of the (dihedral) angle.

Just to be clear I paste my explanation from above:

In classical simulations it is common to store the contribution to the total energy (usually per particle) from a particular term of the force field. E.g., you may want to know the total energy on a particle in a configuration coming from some dihedral potential only. I think this is why you thought there is a confusion between energy and potential. The potential is indeed stored within the FF section of method. But the corresponding energy is the evaluation of that potential or set of potentials for a particular configuration.

the contribution to the total energy (usually per particle) from a particular term of the force field

Yes, in QM we do the same with the Hamiltonian (which may encode more than just the potential). Up to a certain extent, QM codes also report the energy decomposition as such, though not always. The crux is that this is model-specific, by definition. I.e. you're generating a semantic language to describe all physical models. This may be fine, but should then tie back to the method section.

What I was suggesting is to focus instead on physical processes. I think those are more interoperable and of general interest i. E.g. kinetic energy which breaks apart into translational, rotational, and vibrational is relevant for the virial theorem, entropy, etc. I would therefore make these into subsections.

total energy on a particle in a configuration

Well, this is firstly partitioned by the structure, so you should decide on a mapping there. The rest is method specific data, so I would see to tie to the method definition. Still, one can also analyze (but maybe not easily extract) the dihedral contribution and others in a rotational and vibrational mode.

ndaelman-hu · 2024-06-25T12:08:55Z

src/nomad_simulations/properties/energies.py

+class ExternalEnergy(ClassicalEnergy):
+    """
+    Physical property section describing contributions to the potential energy from external interactions of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)


I would consider this more of an energy flux, so the dimensionality is slightly different.

This term is similar to the contributions coming from specific terms in the force field as defined above. Sometimes you define an external force within the force field (i.e., not linked to a particular atom type + config). Then you may want to again know the contribution to the total energy (e.g., per particle) from this external potential. Perhaps it should be renamed?

So, it's not a flux. Maybe there is a different energy contribution that you are describing?

Well, it all depends on where you draw the boundary between your system and the outer world. If the latter interacts with the system, we can describe that as an energy flux, e.g.:

a thermal bath: heat (a flux of kinetic and radiation energy) is being exchanged.

an external electric or magnetic field: fluctuations enter the system as EM-waves, i.e. a flux of radiation energy.

Given these examples, we can represent the outer-world in simpler models, abstracting away a lot of complexity. However, I don't think most people consider them part of the system, so, in line with conservation of energy, they interact via a flux.

@JFRudzinski: With "external force," do you mean something like, e.g., a biasing potential or a wall in your box? Maybe this makes what we're talking about a bit clearer to @ndaelman-hu.

Ow, that makes more sense! Yes, these can't be modeled via flux, as they are more so numerical / statistical tricks instead of physical forces (well, the biasing potential, at least).

I would definitely seek to associate these approaches with the method itself.

ndaelman-hu · 2024-06-25T12:10:17Z

src/nomad_simulations/properties/energies.py

+class ZeroPointEnergy(QuantumEnergy):
+    """
+    Physical property section describing the zero-point vibrational energy of a (sub)system,
+    calculated using the method described in zero_point_method.
+    """
+    # ! Someone check this description!
+    # ? Do we really want to specify the method here? This can't be user-defined?
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)


Quantum-specific subtype of vibrational energy.

Can you be more specific what you suggest changing in this case? I need your guidance for the quantum contributions (maybe it needs to wait until we figure out whether or not to distinguish Quant/Class though)

The name is fine, I'd alter the supertype to ZeroPointEnergy(VibrationalEnergy).
This is the lowest vibrational state, which has a non-zero contribution.

ndaelman-hu · 2024-06-25T12:11:05Z

src/nomad_simulations/properties/energies.py

+class ZeroTemperatureEnergy(QuantumEnergy):
+    """
+    Physical property section describing the total energy of a (sub)system extrapolated to $T=0$, based on a free-electron gas argument.
+    """
+    # ! Someone check this description!
+    # ? Do we really want to specify the method here? This can't be user-defined?
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)


I think this should be introduced in the context of a workflow. Would remove it here.

ndaelman-hu · 2024-06-25T12:20:38Z

src/nomad_simulations/properties/energies.py

+class NuclearRepulsionEnergy(QuantumEnergy):
+    """
+    Physical property section describing the nuclear-nuclear repulsion energy of a (sub)system.
+    """
+
+    def normalize(self, archive, logger) -> None:
+        super().normalize(archive, logger)


This is just electrostatics for the nuclei. See my take above.

ndaelman-hu · 2024-06-25T12:22:15Z

src/nomad_simulations/properties/energies.py

+    # van_der_waals = SubSection(
+    #     sub_section=EnergyEntry.m_def,
+    #     description="""
+    #     Contains the value and information regarding the Van der Waals energy. A multiple
+    #     occurence is expected when more than one van der Waals methods are defined. The
+    #     van der Waals kind should be specified in Energy.kind
+    #     """,
+    # )


This can both refer to a set of methodologies and the Casimir effect.

ndaelman-hu · 2024-06-25T12:24:02Z

src/nomad_simulations/properties/energies.py

+    # double_counting = SubSection(
+    #     sub_section=EnergyEntry.m_def,
+    #     categories=[FastAccess],
+    #     description="""
+    #     Double counting correction when performing Hubbard model calculations.
+    #     """,
+    # )


While not a physical energy, per se, it is a term in the Hamiltonian used for corrections. This begs the question: do we want to distinguish Hamiltonian terms or physical phenomena?

Just repeating myself from above: We should definitely NOT store the Hamiltonian terms here. That goes in the method. However, we may store the evaluation of the Hamiltonian for particular configurations of particles. This should be more clear though for sure.

Just repeating myself from above: We should definitely NOT store the Hamiltonian terms here

I agree with you (see above), but I have the impression that you were doing that anyhow here.

ndaelman-hu · 2024-06-25T12:24:54Z

src/nomad_simulations/properties/energies.py

+    # fermi = Quantity(
+    #     type=np.dtype(np.float64),
+    #     shape=[],
+    #     unit='joule',
+    #     description="""
+    #     Fermi energy (separates occupied from unoccupied single-particle states)
+    #     """,
+    #     categories=[EnergyTypeReference, EnergyValue],
+    # )
+
+    # highest_occupied = Quantity(
+    #     type=np.dtype(np.float64),
+    #     unit='joule',
+    #     shape=[],
+    #     description="""
+    #     The highest occupied energy.
+    #     """,
+    # )
+
+    # lowest_unoccupied = Quantity(
+    #     type=np.dtype(np.float64),
+    #     unit='joule',
+    #     shape=[],
+    #     description="""
+    #     The lowest unoccupied energy.
+    #     """,
+    # )


These are already accommodated else where in the schema. May be removed here.

ndaelman-hu · 2024-06-25T12:25:28Z

src/nomad_simulations/properties/energies.py

+    # # TODO I suggest ewald is moved to "long range" under electrostatic->energyentry, unless there is some other usage I am misunderstanding
+    # ewald = SubSection(
+    #     sub_section=EnergyEntry.m_def,
+    #     description="""
+    #     Contains the value and information regarding the Ewald energy.
+    #     """,
+    # )


Subtype of electrostatic energy.

This is a major question that I have: How should we deal with sub-contributions? One option is to keep a flat list (flat inside total_energy.contributions at least), and somehow distinguish sub contributions with names and descriptions.

The other option is to nest. If we do this, I think we should be restrictive about the nesting, i.e., specify energy sub-contributions that are relevant for a particular contribution. We should not allow free recursive nesting or sub-contributions that don't make sense theoretically/methodologically

Or you can have both options at once if you use inheritance.

JosePizarro3 · 2024-06-26T06:58:51Z

@JosePizarro3 @ndaelman-hu Could you do a preliminary brief check of the structure within energies.py. My main question at the moment is how we deal with the organization of "sub-contributions".

Ok, I took a look on the code. I can send later on some notes on the DFT total energy, and how this could be corrected in other approaches. I also agree with what Nathan said on the confusion of TotalEnergy vs CoulombPotential; I think this can be "easily" solved writing down the DFT equations for TotalEnergy, but also the ones for the CoulombPotential. hubbard or double_counting could be left out for a later feature in which we include other contributions

@JFRudzinski your approach is good, so here I am just giving an alternative idea, also for @ndaelman-hu @Bernadette-Mohr about how to deal with contributions to PhysicalProperties.

Regarding class definitions, these changes show how annoying is to define multiple contributions using inheritance. In my opinion, I think contributions could be simplified to an abstract class and some identifier (a string/MEnum) used both for
the TotalEnergy and the sub_section that repeats. Like we have classical and quantum, I decided to keep them separated, but you could also abstract this to a string/MEnum. My reasoning on doing it this ways is that:

Contributions do not have specialized quantities or normalizations.
It gets complicated to maintain (400 lines of code, 20 classes VS. 60 lines, 3-4 classes)

My proposal would be thus something simpler, a string or MEnum (see the abstract class DOSProfile and how I handled ElectronicDOS(DOSProfile) and its contributions) to identify each contribution:

class BaseTotalEnergy(PhysicalProperty):
    """
    A base section class defining the abstract concept of `TotalEnergy`. This section is useful when defining contribution to the `TotalEnergy`, as well as the `TotalEnergy` section itself.
    """

    value = Quantity(
        type=np.float64,
        unit='joule',
        description="""
        The value of the energy.
        """,
    )

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)


class ClassicalTotalEnergy(BaseTotalEnergy):
    """
    A base section defining the concept of classical `TotalEnergy`. This section is useful for categorizing TotalEnergy in terms of classical or quantum approaches.
    """

    name = Quantity(type=MEnum('kinetic', 'potential', 'intermolecular'...),)  # more strings here

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)


class QuantumTotalEnergy(BaseTotalEnergy):
    """
    A base section defining the concept of classical `TotalEnergy`. This section is useful for categorizing TotalEnergy in terms of classical or quantum approaches.
    """

    name = Quantity(type=MEnum('kinetic', 'Hartree', 'xc', ...),)  # more strings here

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)


class TotalEnergy(BaseTotalEnergy):
    """
    ... TAKE DESCRIPTION FROM TAXONOMY
    """

    iri = '...'

    classical_contributions = SubSection(
        sub_section=ClassicalTotalEnergy.m_def, repeats=True
    )

    quantum_contributions = SubSection(
        sub_section=QuantumTotalEnergy.m_def, repeats=True
    )

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)

The question here is also, at the end of the day, how important are these contributions? I think we can ease the taxonomy by not overclassifying these kind of things.

Ah, and I edit to say that the description of the property should then reflect the contributions and what are these. I think we should bring this last point (if agreed) to the taxonomy TF.

JosePizarro3 · 2024-06-26T07:03:14Z

@ndaelman-hu I found this video which might help as a first approach: https://youtu.be/5mxVR1ikQQo?si=Cfxzs0FlKyqrkFuv&t=1424

But I am sure there are better notes somewhere, I just casually browsed Youtube.

JFRudzinski · 2024-06-26T07:24:18Z

ClassicalTotalEnergy

I definitely like the simplicity of this approach. I guess my only concern is losing the descriptions of the contributions. Could these actually simply be set during normalization? Since there will be no description coming from the taxonomy for these contributions?

Oh ok, I see now that you suggested adding these to the taxonomy. This would work for physical contributions but not really specific model contributions (see my descriptions about evaluation of terms in the Hamiltonian above). I think for the later we need to add descriptions within the schema.

(tbh I also do not know how important these contributions are in general. But I think we still want to have some sort of robust scheme for dealing with them because they will be used at least in some niche cases)

JFRudzinski · 2024-06-26T08:54:35Z

Sorry if this is confusing, but the easiest way to apply the new structure was to simply manually edit/add the files starting from the head of develop. So, this PR got killed in the process. We can keep discussing the energies here to close out the open discussion. But after that I created a new PR: #91

JosePizarro3 · 2024-06-26T09:06:44Z

Sorry if this is confusing, but the easiest way to apply the new structure was to simply manually edit/add the files starting from the head of develop. So, this PR got killed in the process. We can keep discussing the energies here to close out the open discussion. But after that I created a new PR: #91

Yeah sorry, indeed it might be better to do that.

JFRudzinski · 2024-06-26T09:33:18Z

Sorry if this is confusing, but the easiest way to apply the new structure was to simply manually edit/add the files starting from the head of develop. So, this PR got killed in the process. We can keep discussing the energies here to close out the open discussion. But after that I created a new PR: #91

Yeah sorry, indeed it might be better to do that.

it was actually quite easy to amend the structure, it's just that I lost the connection to my other changes, but tbh i don't really care since all this is very preliminary anyway

JFRudzinski · 2024-06-27T07:49:19Z

@Bernadette-Mohr Thank you very much for your input, it is very valuable.

Nathan and I just had a meeting in an attempt to settle some aspects of the above discussions. Here is a short outline of our proposal:

class Energy(PhysicalProperty):
    """
    Base class for all energy properties
    """

    value = Quantity(
        type=np.float64,
        unit='joule',
        description="""
        The value of the energy.
        """,
    )

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)

class TotalEnergy(Energy):
    """
    Just the total energy
    """

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)

class PropertyContribution(PhysicalProperty):
    """
    Base class for defining contribution link to some method. Will be used at least for energies and forces
    """
    method_contribution_ref = Quantity(
        type=MethodContribution,
        description="""
        """,
        a_eln=ELNAnnotation(component='ReferenceEditQuantity'),
    )

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)

class EnergyContribution(PropertyContribution, Energy):
    """
    Energy property linked to method contribution
    """

    name = Quantity(type=str,)  # may be set explicitly or via method_contr_ref

    def normalize(self, archive, logger) -> None:
        super().normalize(archive, logger)

Let me know if there are any comments or questions. My plan is to implement this structure today and then switch to #91 for another review

ndaelman-hu reviewed Jun 25, 2024

View reviewed changes

JFRudzinski closed this Jun 26, 2024

JFRudzinski force-pushed the 28-add-md-relevant-outputs branch from 685fe98 to 2c2f2fd Compare June 26, 2024 08:44

28 add md relevant outputs #87

28 add md relevant outputs #87

Conversation

JFRudzinski commented Jun 13, 2024

JFRudzinski commented Jun 21, 2024

JosePizarro3 commented Jun 24, 2024

ndaelman-hu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndaelman-hu Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JFRudzinski Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bernadette-Mohr Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndaelman-hu Jun 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JosePizarro3 commented Jun 26, 2024 • edited Loading

JosePizarro3 commented Jun 26, 2024

JFRudzinski commented Jun 26, 2024 • edited Loading

JFRudzinski commented Jun 26, 2024

JosePizarro3 commented Jun 26, 2024

JFRudzinski commented Jun 26, 2024

JFRudzinski commented Jun 27, 2024

ndaelman-hu Jun 26, 2024 •

edited

Loading

JFRudzinski Jun 26, 2024 •

edited

Loading

Bernadette-Mohr Jun 26, 2024 •

edited

Loading

ndaelman-hu Jun 26, 2024 •

edited

Loading

JosePizarro3 commented Jun 26, 2024 •

edited

Loading

JFRudzinski commented Jun 26, 2024 •

edited

Loading