Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of generic aiida plugins #20

Closed
wants to merge 4 commits into from
Closed

Support of generic aiida plugins #20

wants to merge 4 commits into from

Conversation

agoscinski
Copy link
Collaborator

@agoscinski agoscinski commented Oct 16, 2024

Idea

AiiDA plugins define their inputs and outputs in their CalcJobs and Ẁorkchains with specific names. For example the arithmetic add CalcJob) has the inputs x and y as well as the output sum. We therefore need to specify these ports (how AiiDA calls them) in the yaml file to create the workgraph. In the aiida-shell plugin we did not need to do this because
Each plugin defines a entry point which we can use to load the corresponding CalcJob or WorkChain using the factories

from aiida.plugins.factories import CalculationFactory
ArithmeticAddCalculation = CalculationFactory("core.arithmetic.add")
# Retrieve input ports
print(ArithmeticAddCalculation.spec().inputs)

So with these two additional information (the entry point and the port names )in the YAML file we can run almost arbitrary calculations from aiida plugins (including aiida-icon). The reason why we did not need the port names for aiida-shell is because ShellJob creates dynamically its output ports from the outputs that are provided as inputs, so we took this to our advantage and use the name specified in the yaml file as output port names. For the input ports we also simplify the actual ports that would be nodes and arguments (see code). The gist is that we treat aiida-shell differently, and we should continue to do so, because otherwise it becomes cumbersome to use.

YAML syntax

Here you find (an example to run arithmetic add)[https://github.com/C2SM/ETHIOPIA/blob/plugins/tests/files/configs/test_config_small.yml]. A snippet of it to show how it is used to define a workflow.

- adder1:
    inputs:
        - a:
            port_name: x
        - b:
            port_name: y
    outputs:
        - sum1: 
            port_name: sum
- adder2:
    inputs:
        - sum1:
            port_name: x
        - c:
            port_name: y
    outputs:
        - sum2:
            port_name: sum

Since the same data object can be used for different ports we need this information in the cycles.

Definition of computer and code

We follow more the aiida logic to define computer and code information by just specifying the label given on definition.

tasks:
  - adder1:
      plugin: core.arithmetic.add
      code: bash 
      computer: localhost

This has the strong advantage that we do not have to write our own logic to parse all the computer information and can use the well maintained CLI verdi from aiida to allow the user to create it before. It is in this PR because it was required for testing, but should be separated out in a different PR

Current state of the code

Currently the code in the workgraph.py using different functions to create plugins that are not ShellJobs, and I am not sure if this is smart or not. It is a tradeoff between code duplications and flexibility, and requires a bit more thoughts and decisions how we go with this.

@agoscinski agoscinski force-pushed the plugins branch 2 times, most recently from faebeaa to 6618dbb Compare October 16, 2024 21:06
Base automatically changed from workgraph-from-pydantic to main October 17, 2024 08:50
@agoscinski
Copy link
Collaborator Author

Icon change of namelist

For icon we need to adapt the behavior, because we rather want to change the namelist and keeping it over the calculation constant, so we can move it to the task definition. We can maybe make a calcjob out of the calcfunction https://github.com/aiida-icon/aiida-icon/blob/a982d8792006bf234fe79c18aa76fd2af7a3463f/src/aiida_icon/iconutils/masternml.py#L43-L51 that adapts the name list so we can provide a simpler use for the user, for arbitrary changes. We will use then this calcjob also to adapt it for the inputs we can infer from the workflow (date, output of last icon last run will not be passed in the aiida way but just calls this calcjob to update the namelist with the new file).

Naming of port_name

The port_name, maybe rename to input_key or input_slot.

Specify computer and code

We discussed how we deal with computer and code. For computer definition we stick with verdi, but for codes it might be useful to just just pass the filepath since we want not that the user creates a new code all the time when icon is recompiled. How to create a label in this case is still an open question. We could hashing the binary but for icon this can be 200MB which needs to be send over the transport plugin. One proposition was to hash the filepath as label, it was an open question if we generate a new uuid that preserve provenance also in cases the code is recompiled.

@leclairm
Copy link
Contributor

We will use then this calcjob also to adapt it for the inputs we can infer from the workflow (date, output of last icon last run will not be passed in the aiida way but just calls this calcjob to update the namelist with the new file).

Here, for input data, we have to choices: either we adapt the namelist with the valid absolute path to the corresponding data or we leave a constant relative path in the namelist and symlink the actual data to the correct relative path in the working directory of the job.

Comment on lines +82 to +83
self._add_aiida_task_nodes()
self._add_aiida_links()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the sockets and the links all at once in _add_aiida_links() relies on the fact that the order in which unrolled objects are stored is compatible with the rank of the nodes in the graph. This is not guaranteed and should not be required from the user. For instance, in the yaml file, inverting the order of tasks in a cycle description or the order of cycles should not affect the final WorkGraph and its correctness.

For that to happen, like in the parser PoC, there's a need for an intermediate step where the sockets corresponding to task outputs are created, e.g. self._add_aiida_sockets , and only then add the links.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I insert the visualization graph before that and we can use it as the IR before translating it to a WorkGraph. So I'll definitely do the first part and insert it there. Then we can see if it just leaves on the side or is part of the chain from config to WorkGraph (I'd be more confortable with the later actually)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that to happen, like in the parser PoC, there's a need for an intermediate step where the sockets corresponding to task outputs are created, e.g. self._add_aiida_sockets , and only then add the links.

Actually, it's even easier, just move the call to _link_output_to_task() from _add_aiida_links() to _add_aiida_task_nodes(). That should fix the whole issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this is a fair point.

agoscinski and others added 2 commits October 23, 2024 16:57
- Add case for `RemoteData` to `_utils.py`
- Remove undefined `path` property from `core.py`
- Add computer as property to `Workflow` -> Name to be discussed, could
  be `host`. Also, could be specified in `root` task instead
- Bug fix in for `port.valid_type` in `Workflow` to make it always
  iterable
- Add `test_config_small_icon_localhost.yaml` (still with hardcoded
  paths, but this is what I used for the ICON run)
- Clean up `test_config_small.yaml` with unused ICON-related stuff
- Add dependencies to `pyproject.toml`
@agoscinski
Copy link
Collaborator Author

We will not continue work on this for now.

@agoscinski agoscinski closed this Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants