Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/bands wc #235

Closed
wants to merge 4 commits into from
Closed

Feature/bands wc #235

wants to merge 4 commits into from

Conversation

bosonie
Copy link
Collaborator

@bosonie bosonie commented Oct 17, 2021

Implements the abstract classes for the common workflow calculating the band structure, as discussed in #222

Some design choices that can be debated:

  1. We agreed on the fact that two possibilities are foreseen: the user specifies a full list of kpoints in input or it uses seekpath. In the implementation, the use of seekpath is default, unless the input “bands_kpoints” is specified. Should we maybe make more explicit this choice? How?
  2. We did not agree on the output energy units. I guess eV. We should also sort the issue of fermi energy in BandsData (Add support for fermi energy to the BandsData aiida-core#5032) or return the fermi energy as direct output.
  3. The way to pass wavefunction or DM is based on the idea to pass a RemoteData folder. Will this concept be present in aiida 2.0?
  4. Some repetition of code is now seen between the code-specific implementation of RelaxInputGenerator and BandsInputGenerator. Moreover the protocol list (yaml file) is duplicated. Any better way to do that?

Also the siesta implementation is provided as a reference for other codes.

On the lines of the common structure relaxation workflow, we implement
the abstract classes for the creation of the bands workflow with
common interface (inputs and outputs).
Each code will need to make its own implementation. This commit includes
the siesta implementation.
@giovannipizzi
Copy link
Member

Thanks @bosonie ! Following up from our discussion this morning, I added a comment to the issue with a slightly different suggestion. Here I summarise some answer to your points:

  1. In my suggestion both options are possible - however, the common wf interface only accepts explicit k-points, and the seekpath logic is moved to a 'common wf' combining the relax common wf interface + the bands common wf interface
  2. I would suggest indeed eV - and agreed that we need to fix the issue of specifying the Fermi energy in the BandsData (for now, I would suggest anyway to return it as an output in eV - maybe we want to keep this anyway as one of the outputs? Or this could be just temporary while the PR on the ElectronicBandsData is merged and released - it would be trivial to switch later for all implementations)
  3. Yes, the concept of using RemoteData will still be there in 2.0 as it is now. As I mentioned in the issue, the only question is to double check that, for all codes, getting the final RemoteData is enough to go back and retrieve all needed inputs to run bands, even without explicitly specifying e.g. the crystal structure and the DFT cutoffs/basis sets etc.
  4. my suggestion would remove the code duplication you mention, I think (the protocols would only remain in the common relax wf, the common bands wf wouldn't need to know)

@giovannipizzi
Copy link
Member

Question: the engines.bands.options would be the "calcjob options" in the AiiDA sense, or more options for how to calculate the bands (e.g. how many bands etc.)?


# K points for bands, possible change of structure due to SeeK-Path
if bands_kpoints is None:
res = seekpath_explicit_kp_path(structure, orm.Dict(dict=seekpath_parameters))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dilemma has come up before but I really think we shouldn't be calling calcfunctions in input generator calls. It should be possible to call get_builder without the database being affected. On the other hand, I think changing the input structure is something that should be captured in the workchain.

I am wondering whether we should simply make the seekpath analysis an actual step of the WorkChain. This is what I already do in the PwBandsWorkChain to address the problem of keeping provenance while not doing it in the generators. If we agree that using seekpath to normalize the structure and determine the high-symmetry k-point path for the bands should be done for all implementations, it only make sense this is done in the common workflow itself. I am just not sure it is possible, because the generic workchain class cannot know where in the input namespace the structure input is located and where the kpoints should be defined. I will see if there is a way around this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. We agreed on the fact that two possibilities are foreseen: the user specifies a full list of kpoints in input or it uses `seekpath`. In the implementation, the use of seekpath is default, unless the input “bands_kpoints” is specified. Should we maybe make more explicit this choice? How?

Okey, so did we agree how complex the common-workflow bands data should be like? E.g. should each plugin possibly utilize their own band generation workchain stack? I see pros and cons for both. From the VASP side I think we would like to run our own bands workchains as we anyway need those to run outside the common-workflows.

So we use seekpath as part of the workflow if the user wants to, and in doing so, when for instance running something outside of regular DFT it is sometimes required to pass the original grid in addition to the line extraction points. For us it would not be either or, both both. Assuming of course that we would not utilize our own workchain stack. Also, we should not lock into seekpath in case users want to supply their own line sets. As you know for some symmetries you have a choice and seekpath follows one of those. In order to honour reproducibility I think we should generally opt for user choice here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it only make sense this is done in the common workflow itself

Yes, but only if you look at common workflows project. It would probably be important for codes to offer their own bands workchain to run outside of common workflows if need be.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dilemma has come up before but I really think we shouldn't be calling calcfunctions in input generator calls. It should be possible to call get_builder without the database being affected. On the other hand, I think changing the input structure is something that should be captured in the workchain.

Yes, I also fully support this. Also, it is more likely for users to inspect a well defined workchain that it is to dive into the code base. Maybe documentation is also easier to do in this way. Generally I think it is good practice to keep everything related to the numerics/science in workchains.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only if you look at common workflows project. It would probably be important for codes to offer their own bands workchain to run outside of common workflows if need be.

Sure, but then it probably is optional as well in your workchain, or maybe it should be. The PwBandsWorkChain allows to skip the relaxation and seekpath normalization entirely. The one thing I still need to address is that currently it enforces running an SCF step. Maybe we could add more flexibility there to even skip this as well and just take an existing SCF but that comes with its own complications in terms of validating the compatibility of inputs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, so did we agree how complex the common-workflow bands data should be like?

This is a good point and going through the text in #222 I don't see a discussion of this (I didn't attend the meetings). The current implementation in this PR also doesn't seem to be dealing with the fact that when using seekpath, the structure cannot be assumed to remain the same and so any existing parent_folder needs to be considered useless and a new SCF has to be done first before computing the bands.

The possible scenarios that I can from the user's perspective:

  1. User just has a structure and just wants to compute the bands
    • Required inputs
      • structure
    • Workflow logic:
      • Structure is primitivized and k-points are determined
      • An SCF is run for the primitivized structure
      • A bands calculation is performed on results of SCF and the k-points determined in first step
  2. User has a structure and a remote_folder containing restart files of a previously completed calculation
    • Required inputs
      • structure
      • kpoints
      • remote_folder
    • Workflow logic:
      • Just run bands calculation restarting from remote_folder using structure and kpoints as input (note, here it is not guaranteed that the inputs are compatible but it is not trivial to see that the workflow can check this and maybe we should just accept whatever and run it. User is responsible for making sure the inputs make sense)

One potential third option I see is where a user has a structure and remote_folder but no kpoints yet and would want them to be generated, but that would require seekpath (or some other method) to be able to compute them without modifying the structure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point and going through the text in #222 I don't see a discussion of this (I didn't attend the meetings). The current implementation in this PR also doesn't seem to be dealing with the fact that when using seekpath, the structure cannot be assumed to remain the same and so any existing parent_folder needs to be considered useless and a new SCF has to be done first before computing the bands.

I was also thinking along the lines of decisions made if we should rely on bands workchains on the plugin side. I think we would prefer to run those for different reasons. Obviously structure and k-points need to be determined on common grounds, leading to the unavoidable fact that seekpath needs to be executed on the common workflows to avoid dealing with verifications that the codes run this in the same way. Or if one foresee that we should be able to just run the existing workchain pipelines in the common workflows for this?

My maintenance point here being that many plugins anyway need a bands workchain outside of common-workflows so it would be nice not having to maintain two slightly different pipelines and the user support coming from those.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My maintenance point here being that many plugins anyway need a bands workchain outside of common-workflows so it would be nice not having to maintain two slightly different pipelines and the user support coming from those.

I see your point, however, I think for the common workflows having some level of consistency in how the various implementations perform the bands is important. Here I am talking about consistency on a high-level, such as making sure that k-point generation and if a structure gets modified in that case, is identical for all implementations. I think you agree with this though. So I would definitely think this needs to go in the common workflows. The problem you describe for the plugins having fully-fledged band workchains of their own also applies to aiida-quantumespresso but I don't think it necessarily adds a maintenance burden on it. All we need to do is make those workflows flexible enough to make parts of the logic, such as structure normalization and k-point generation, optional. But this is anyway valuable for the plugin users, so I don't think this should be a problem personally.

The most important part of the design process here is exactly define the high-level logic of the common bands workflow and what parts should be common. This is what I tried to sketch with my previous comment and we should probably work out before continuing with the implementation.

Copy link
Collaborator

@espenfl espenfl Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you agree with this though.

Absolutely. And I think there is no escaping having seekpath as an option in common workflows. And that should be part of its own workchain in that band common workflow stack. That we pipe structure and kpoints down to the band workchain of the respective plugins, which handle the sequence of calcs necessary.

One could also imagine generalizing this a bit to offer more of a symmetry workchain, where also say spglib is used so that we ensure some kind of consistency (by default, can be overridden by the user). I know from previous experience that these details are often overlooked by users and/or causing a lot of time and frustration. seekpath would then naturally fall into that symmetry workchain, which is something we could make the common workflow band workchain depend on. I think that would be useful in other contexts downstream as well (and is something we anyway plan on doing plugin side).

@espenfl
Copy link
Collaborator

espenfl commented Oct 20, 2021

3\. Yes, the concept of using RemoteData will still be there in 2.0 as it is now. As I mentioned in the issue, the only question is to double check that, for all codes, getting the final RemoteData is enough to go back and retrieve all needed inputs to run bands, even without explicitly specifying e.g. the crystal structure and the DFT cutoffs/basis sets etc.

Not to derail, but do you have any info you can point to which discusses the future of this?

@espenfl
Copy link
Collaborator

espenfl commented Oct 20, 2021

Question: the engines.bands.options would be the "calcjob options" in the AiiDA sense, or more options for how to calculate the bands (e.g. how many bands etc.)?

If latter we should possibly not use options.

@espenfl
Copy link
Collaborator

espenfl commented Oct 20, 2021

4\. Some repetition of code is now seen between the code-specific implementation of `RelaxInputGenerator` and `BandsInputGenerator`. Moreover the protocol list (yaml file) is duplicated. Any better way to do that?

I think at some point it makes sense to do nested protocols, but we are not there yet. Until that let us not make the code base too complex in order to save a few lines. The protocol is considered a recipe and is also a clear documentation to the user. It is thus important that the user can look quickly at it to get an idea on what is done, instead of spending time going back and forth in the code base. At least that is my opinion on this.

@sphuber
Copy link
Collaborator

sphuber commented Oct 20, 2021

Not to derail, but do you have any info you can point to which discusses the future of this?

There never have been any discussions on changing any of this (at least that I am aware of) so there is nothing to point to as far as I can tell.

@sphuber
Copy link
Collaborator

sphuber commented Oct 20, 2021

If latter we should possibly not use options.

100% agree, but this should be indeed just the options that will be passed to the engine calcjob. It is exactly the same as for the common relax workflow

@giovannipizzi
Copy link
Member

Just to clarify my original suggestion - a bit different than what's implemented here:

  • I think each implementation should reuse their internal (already implemented, non-common) work chains (for bands, in this case)
  • the common workflow interface should just map outputs to 'common' outputs, and provide the input generator to map from 'common' inputs to the workchain inputs
  • my suggestion was to split the problem in two.
    1. define a common workflow interface that always requires an explicit list (so it's the user responsibility to pass k-points, and to pass them compatible to the input cell), and get the input directly from a remote data (that should be returned by the common workflow for relax - so it's easy to reuse - but can also come from a non-common workflow). The question for me is: would all implementations be able, given a remote data folder, to reconstruct all inputs including e.g. the structure? (i.e. the idea here is not to pass again structure, parameters etc.; but just that internally the input generator would fetch the inputs from the remote data and just modify them for the bands)
    2. define a common workflow (similar to EOS) that
    3. calls seekpath
    4. calls the common relax workchain (relax or scf)
    5. (optional, if relax) re-run seekpath + common workchain in SCF mode
    6. passes the final remote data (returned by the last common relax workchain that was run) to the common interface for bands discussed above

@bosonie
Copy link
Collaborator Author

bosonie commented Oct 21, 2021

Thanks all for the feedback. Few observations:

  1. For sure @giovannipizzi suggestion also fixes the problem that @sphuber raised, meaning the undesired creation of nodes in the input generator.
  2. Gio's suggestion also avoids the problem of code repetition since the CommonBandsWorkChain will not need protocols.
  3. The concern of @giovannipizzi of reconstructing inputs from the remote data folder is legit, but any CalJob that allows a restart should be able to do so. The problem poses only if there is an input for bands calculation that is not kpoints, is not as well used for scf and is not something that the developer can fix to a sensible default inside the input generator. The only thng to be careful about is the specification of the code. Next point..
  4. The only other point to add on Giovanni's idea is that we need a way to pass the code and resources for the calculation of the bands. This is especially important in case it is not the same executable(s) that performs the scf and the bands calculation. I would do it in the same way we do for the relaxation, using the engines dictionary. However this will be optional and, if not specified, the same code and resources used for the calculation that produced the remote folder will be used.
  5. Regarding the outputs, the BandsData containing the bands is the important output. I am not sure I understand @espenfl sentence how complex the common-workflow bands data should be like. What complexity should be added?
  6. engines.bands.options is always to specify the computational resources for the code in engines.bands.code

Overall I'm getting to like @giovannipizzi suggestion and I will try to implement it. However I would leave time for some other feedback in #222 before asking a new review.

@bosonie bosonie closed this Dec 7, 2021
@bosonie bosonie deleted the feature/bands_wc branch December 17, 2021 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants