Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lists of Iteration, Mesh and Particle Pathes #282

Open
wants to merge 4 commits into
base: upcoming-2.0.0
Choose a base branch
from

Conversation

franzpoeschel
Copy link

Please add a brief description (one sentence) here and link the issue this pull-request implements

This is a suggestion for making openPMD markup be usable in a user-specified group hierarchy within an Iteration.

Implements issue: #115

Description

As detailed description as possible.

In principle, custom data can be introduced to openPMD datasets today already without the openPMD standard being any wiser about it, since the standard only describes what must be there. As a result, custom groups, datasets, attributes may be used today already.

With this PR, datasets will be able to profit from openPMD markup within such custom groups, e.g. meshes and particles can be placed in user-defined places within iterations now:

Bildschirmfoto vom 2023-09-12 11-01-36

What is introduced, removed or renamed and why?

The only change necessary to this end is a new definition of the meshesPath and particlesPath attributes. The new definition is defined to be backward compatible with the old definition, e.g. existing datasets using the openPMD 1.0 specification for e.g. meshesPath will still work with openPMD 2.0.

meshesPath and particlesPath are now lists of regular expressions. (This is technically redundant as regular expressions are powerful enough to represent "lists" (e1|e2|e3), but it reduces reliance on regular expressions, and the concept of "listing" paths is important enough to be explicitly expressed in the standard.)

This way, the meshesPath and particlesPath can list the locations where meshes and particles species may be found (either as a relative or an absolute path).

What is made required, recommended, optional?

It no longer makes sense to require that a mesh exists when the meshesPath is defined, likewise the particlesPath (regular expressions might in fact denote an infinite set of locations).

What concept stands behind this change?

The fact that the openPMD standard is extensible and hierarchical. This proposal makes the openPMD standard aware of extended hierarchies and allows interacting with them.

Please also add an example.

-- will follow --

Affected Components

  • base
  • any extension that intends to organize the meshes or species in some way (notably: mesh refinement)

Logic Changes

Which logic changes are conceptually introduced?

Writer Changes

How does this change affect data writers?

Not at all if the new features are not used.

What would a writer need to change?

It can optionally use the new features.

Does this pull request change the interpretation of existing data writers?

No.

Reader Changes

How does this change affect data readers?

Data readers can no longer expect to have a single group containing meshes (likewise particles). Especially in the openPMD-api, the semantics of iteration[0].meshes become unclear when in fact the dataset contains /data/0/fields and /data/0/images.

Data readers need to traverse subdirectories if they intend to find all data.

The meshesPath and particlesPath attributes might now have values that old readers cannot interpret.

What would a reader need to change? Link implementation examples!

Data Converter

How does this affect already existing files with previous versions of the standard?

Not at all.

Is this change possible to be expressed in a way, that an automated tool can update the file?
Link code/pull requests that implement the upgrade logic!

Not necessary.

Discussion:

  • Should we use regular expressions at all? List of strings might also suffice. Otherwise, bash-like globbing might be a good compromise, but we might need to implement it manually.
  • We might wish to drop option (2) for denoting a mesh/particle species (full path to the mesh/species itself) for the sake of simplicity. This will mean that the meshesPath/particlesPath can only refer to the parent group that contains meshes/species (as it does now), implying that any group can contain only (meshes XOR particles XOR custom subgroups). With the current design, mixing them is possible.
  • Alternative approach: Keep the definition of meshesPath and particlesPath as it is now, but allow overriding the meshesPath and particlesPath at each level of the hierarchy. Maybe allow specifying lists, so that things like /data/0/meshes and /data/0/images can be meshes in the same dataset.
    This approach has the advantage of being simpler and having increased forward compatibility, since the "new" features can be better contained in places where old readers don't see them. The meshesPath and particlesPath attributes become less monolithic, i.e. the location of meshes and particles are not described in one single place inside an attribute with rather complicated semantics.

@ax3l ax3l self-requested a review October 5, 2023 17:01
@ax3l ax3l self-assigned this Oct 5, 2023
@ax3l ax3l added the major change non-backwards compatible change label Oct 5, 2023
@ax3l ax3l added this to the openPMD 2.X milestone Oct 5, 2023
@ax3l
Copy link
Member

ax3l commented Oct 5, 2023

Thank you for the standard update and PR, this is excellent.

Let's discuss details in person, yet let me answer some ideas on the open questions above.

Should we use regular expressions at all? List of strings might also suffice. Otherwise, bash-like globbing might be a good compromise, but we might need to implement it manually.

I think regex support (e.g., which flavor) is a bit tricky across languages. But clearly, there is a big benefit in at least supporting globbing for now, because it avoids that every new entry needs to be listed. E.g., for sub-hierarchies that are numbered in a sensible manner.

My feeling is to allow only bash-like globbing for now, which is easy to replace to a regex or matching syntax in any language.

@DavidSagan
Copy link
Collaborator

My feeling is to allow only bash-like globbing for now, which is easy to replace to a regex or matching syntax in any language.

I would be cautious here since globbing and regex are incompatible. If, say, you allow bash-like globbing initially, to be backwards compatible when regex is introduced the standard must support both. This is a bit messy.

@ax3l
Copy link
Member

ax3l commented Oct 5, 2023

Yes, such a change would then be for 3.0 and major. But needs a motivating set of examples why regex parsing as a dependency is needed. I just want to avoid loosing, e.g., Fortran users or people not having the exact implementation of regex flavor we require.

@ax3l
Copy link
Member

ax3l commented Oct 5, 2023

Alternative approach: [...] allow overriding the meshesPath and particlesPath at each level of the hierarchy

Discussed today on nesting (minutes):
Although this is cool and modular, there are a few severe downsides to this for readers if we did this now. The problem is in series, where data sets appear later in the iterations. To discover quickly in a reader the data that a user can select and read, say openPMD-viewer GUI or ParaView, one would need to open and parse all iterations to find where meshes and particles exist - and present a common set that can be selected.
(Mostly an issue with fileBased encoding.)

I would be interested in concrete examples that really need this "discovery" feature to explore it more. (Do we have some deep nested structures we need to support interplay with NeXuS?)

One way to implement this with faster parsing for readers: one could be to have recursive paths from root to these elements that aid discovery: a list of paths to iteration folder(s), each of them having attributes that point to their list of mesh paths and particle paths.

@ax3l ax3l changed the title Custom hierarchies Lists of Iteration, Mesh and Particle Pathes Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major change non-backwards compatible change
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

3 participants