Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add code registry tools #61

Closed
wants to merge 1 commit into from
Closed

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented May 16, 2022

This adds a registry that allows users to place a set of yaml
files describing computers and codes installed on them in a
folder of their choice (AIIDA_CODE_REGISTRY).
When using the load_code or load_computer functions from
aiida_azq.registry, these codes are set up automatically on
demand.

Related to #60 @unkcpz

This adds a registry that allows users to place a set of yaml
files describing computers and codes installed on them in a
folder of their choice (AIIDA_CODE_REGISTRY).
When using the `load_code` or `load_computer` functions from
aiida_azq.registry, these codes are set up automatically on
demand.
@ltalirz
Copy link
Member Author

ltalirz commented May 16, 2022

Some choices made here:

  1. One question is at which level to "group" configurations. The original choice to have one file per AiiDA command arose because that made it easy to integrate into click - but this makes it somewhat tedious to edit/maintain these files. Also, it means file names need to carry meaning; one needs to duplicate labels in the file name and inside the file, ...

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Here, I'm adapting this type of grouping by going to one yaml file per site of the form

computers:
 - label: ...
   setup:
   configure:
     core.ssh:
       ...
   codes:
   - label:
      ...
   

The registry folder can contain multiple such yaml files, whose contents will simply be merged together. The name of these files is irrelevant.

The current script also allows a top-level codes: field, whose codes will be applied to all computers listed.

This yaml format could become how we store computer/code configurations here on the registry

@ltalirz
Copy link
Member Author

ltalirz commented May 16, 2022

  1. Template variables are passed directly to the load_code / load_computer commands.

This is to support cases like: if template variable 1 has value x, then template variable 2 should have value y, which would be hard to express statically.
Of course, if the possible values are known beforehand, such cases could also be resolved by just putting all possible static combinations of template variables inside the configuration, but that is less convenient (and maybe the possible values are not known beforehand).

In practice, passing the template variables to the load_code / load_computer commands was more convenient for me.

@ltalirz
Copy link
Member Author

ltalirz commented May 16, 2022

Still to do:

  • warn when detecting template variables that are not yet replaced. This is easily done via jinja; one just needs to pay attention to do this at the last possible moment before passing information to AiiDA's load_computer/load_code functions
  • Decide folder name in the AIIDA_PATH
  • The schemas for the configuration file are hand-crafted and incomplete (just added what I needed). Would be nice if those could be (partially) inferred from the verdi cli

@yakutovicha
Copy link
Contributor

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Just adding my 5 cents. As we discussed earlier, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

computers:
- label: ...
  setup:
  configure:
    core.ssh:
      ...
codes:
- label:
    ...

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

@sphuber
Copy link

sphuber commented May 20, 2022

Few comments:

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

Note that load_code in principle already supports adding the @computer-label to load a code for a specific computer.

I am not sure we should hijack the current load_code and load_computer to add a lot of magic to automatically create entities. Maybe a new dedicated function would be better.

Just adding my 5 cents. As we #35, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

@ltalirz
Copy link
Member Author

ltalirz commented May 20, 2022

Thanks for the feedback @yakutovicha !

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Just adding my 5 cents. As we discussed earlier, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

computers:
- label: ...
  setup:
  configure:
    core.ssh:
      ...
codes:
- label:
    ...

I agree. As mentioned in the text below my yaml example, your layout is actually already supported by the example script as well.
All codes specified at the top level are added to all computers in the yaml file.

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

I reused the names from aiida-core since this makes it a drop-in replacement, but if we feel strongly about it we could also call the function differently (note that for your proposition this function would return instances of different classes... to think about).

In case you were wondering: if you load_code a "code@computer" where the computer does not yet exist, the script does already set up the computer automatically (and "code@computer" is the only label format for codes that the script currently supports).

@ltalirz
Copy link
Member Author

ltalirz commented May 20, 2022

Thanks @sphuber as well!

I am not sure we should hijack the current load_code and load_computer to add a lot of magic to automatically create entities. Maybe a new dedicated function would be better.

Yes, the use case is a bit different and the templating mechanism would require the introduction of a new argument.
One can start with a separate function or a plugin and see what is the most seamless way we can come up with to simplify a new user's life.

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

I think one can make a case for both scenarios (which is why the script I added supports both).
The case of one code for multiple computers can come in cases e.g. where one wants to hardcode the queue and perhaps some other metadata in the computer (and have several of those for the same HPC system).
To some degree this can be alleviated by templating, but there can be cases where one just wants to create a computer with a new label for it.

@yakutovicha
Copy link
Contributor

yakutovicha commented May 23, 2022

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

The file structure we came up with so far is this:

daint
.
├── codes
│   ├── cp2k-8.1-cpu.yaml
│   ├── cp2k-8.1-gpu.yaml
│   ├── pp-6.5.yaml
│   ├── projwfc-6.5.yaml
│   └── pw-6.5.yaml
├── hybrid
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── hybrid-s1005
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── hybrid-s904
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
...

or, in other words:

domain
├── codes
│   ├── ...
├── computer1
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── computer2
│   ├── computer-configure.yaml
│   └── computer-setup.yaml

So domain refers to an actual machine while computer1... represent different configurations (different projects, different partitions, etc.). Before, we had all the codes under each computer configuration. @mbercx proposed here to decouple the computers and the codes to reduce duplication. And this is what happened in #38.

I am repeating myself here, but to be clear: by construction domain refers to the same machine so all executables listed in the codes folder are available there. So it makes total sense to me to keep the computers and the codes set up independent from each other and interchangeable.

@ltalirz ltalirz mentioned this pull request Jun 13, 2022
@ltalirz
Copy link
Member Author

ltalirz commented Jun 13, 2022

superseded by #62

@ltalirz ltalirz closed this Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants