Lists of Iteration, Mesh and Particle Pathes #282

franzpoeschel · 2023-09-12T10:15:02Z

Please add a brief description (one sentence) here and link the issue this pull-request implements

This is a suggestion for making openPMD markup be usable in a user-specified group hierarchy within an Iteration.

Implements issue: #115

Description

As detailed description as possible.

In principle, custom data can be introduced to openPMD datasets today already without the openPMD standard being any wiser about it, since the standard only describes what must be there. As a result, custom groups, datasets, attributes may be used today already.

With this PR, datasets will be able to profit from openPMD markup within such custom groups, e.g. meshes and particles can be placed in user-defined places within iterations now:

What is introduced, removed or renamed and why?

The only change necessary to this end is a new definition of the meshesPath and particlesPath attributes. The new definition is defined to be backward compatible with the old definition, e.g. existing datasets using the openPMD 1.0 specification for e.g. meshesPath will still work with openPMD 2.0.

meshesPath and particlesPath are now lists of regular expressions. (This is technically redundant as regular expressions are powerful enough to represent "lists" (e1|e2|e3), but it reduces reliance on regular expressions, and the concept of "listing" paths is important enough to be explicitly expressed in the standard.)

This way, the meshesPath and particlesPath can list the locations where meshes and particles species may be found (either as a relative or an absolute path).

What is made required, recommended, optional?

It no longer makes sense to require that a mesh exists when the meshesPath is defined, likewise the particlesPath (regular expressions might in fact denote an infinite set of locations).

What concept stands behind this change?

The fact that the openPMD standard is extensible and hierarchical. This proposal makes the openPMD standard aware of extended hierarchies and allows interacting with them.

Please also add an example.

-- will follow --

Affected Components

base
any extension that intends to organize the meshes or species in some way (notably: mesh refinement)

Logic Changes

Which logic changes are conceptually introduced?

Writer Changes

How does this change affect data writers?

Not at all if the new features are not used.

What would a writer need to change?

It can optionally use the new features.

Does this pull request change the interpretation of existing data writers?

No.

libopenPMD: The proposal is prototyped in Custom Hierarchies openPMD-api#1432

Reader Changes

How does this change affect data readers?

Data readers can no longer expect to have a single group containing meshes (likewise particles). Especially in the openPMD-api, the semantics of iteration[0].meshes become unclear when in fact the dataset contains /data/0/fields and /data/0/images.

Data readers need to traverse subdirectories if they intend to find all data.

The meshesPath and particlesPath attributes might now have values that old readers cannot interpret.

What would a reader need to change? Link implementation examples!

libopenPMD: The proposal is prototyped in Custom Hierarchies openPMD-api#1432
openPMD-validator: https://github.com/openPMD/openPMD-validator/...
openPMD-viewer: https://github.com/openPMD/openPMD-viewer/...
yt: https://github.com/yt-project/yt/...
VisIt: https://github.com/openPMD/openPMD-visit-plugin
XDMF (upcoming): https://github.com/openPMD/openPMD-tools/...

Data Converter

How does this affect already existing files with previous versions of the standard?

Not at all.

Is this change possible to be expressed in a way, that an automated tool can update the file?
Link code/pull requests that implement the upgrade logic!

Not necessary.

https://github.com/openPMD/openPMD-converter/...

Discussion:

Should we use regular expressions at all? List of strings might also suffice. Otherwise, bash-like globbing might be a good compromise, but we might need to implement it manually.
We might wish to drop option (2) for denoting a mesh/particle species (full path to the mesh/species itself) for the sake of simplicity. This will mean that the meshesPath/particlesPath can only refer to the parent group that contains meshes/species (as it does now), implying that any group can contain only (meshes XOR particles XOR custom subgroups). With the current design, mixing them is possible.
Alternative approach: Keep the definition of meshesPath and particlesPath as it is now, but allow overriding the meshesPath and particlesPath at each level of the hierarchy. Maybe allow specifying lists, so that things like /data/0/meshes and /data/0/images can be meshes in the same dataset.
This approach has the advantage of being simpler and having increased forward compatibility, since the "new" features can be better contained in places where old readers don't see them. The meshesPath and particlesPath attributes become less monolithic, i.e. the location of meshes and particles are not described in one single place inside an attribute with rather complicated semantics.

ax3l · 2023-10-05T17:19:30Z

Thank you for the standard update and PR, this is excellent.

Let's discuss details in person, yet let me answer some ideas on the open questions above.

Should we use regular expressions at all? List of strings might also suffice. Otherwise, bash-like globbing might be a good compromise, but we might need to implement it manually.

I think regex support (e.g., which flavor) is a bit tricky across languages. But clearly, there is a big benefit in at least supporting globbing for now, because it avoids that every new entry needs to be listed. E.g., for sub-hierarchies that are numbered in a sensible manner.

My feeling is to allow only bash-like globbing for now, which is easy to replace to a regex or matching syntax in any language.

DavidSagan · 2023-10-05T17:28:06Z

My feeling is to allow only bash-like globbing for now, which is easy to replace to a regex or matching syntax in any language.

I would be cautious here since globbing and regex are incompatible. If, say, you allow bash-like globbing initially, to be backwards compatible when regex is introduced the standard must support both. This is a bit messy.

ax3l · 2023-10-05T17:30:48Z

Yes, such a change would then be for 3.0 and major. But needs a motivating set of examples why regex parsing as a dependency is needed. I just want to avoid loosing, e.g., Fortran users or people not having the exact implementation of regex flavor we require.

ax3l · 2023-10-05T18:25:24Z

Alternative approach: [...] allow overriding the meshesPath and particlesPath at each level of the hierarchy

Discussed today on nesting (minutes):
Although this is cool and modular, there are a few severe downsides to this for readers if we did this now. The problem is in series, where data sets appear later in the iterations. To discover quickly in a reader the data that a user can select and read, say openPMD-viewer GUI or ParaView, one would need to open and parse all iterations to find where meshes and particles exist - and present a common set that can be selected.
(Mostly an issue with fileBased encoding.)

I would be interested in concrete examples that really need this "discovery" feature to explore it more. (Do we have some deep nested structures we need to support interplay with NeXuS?)

One way to implement this with faster parsing for readers: one could be to have recursive paths from root to these elements that aid discovery: a list of paths to iteration folder(s), each of them having attributes that point to their list of mesh paths and particle paths.

franzpoeschel added 2 commits September 12, 2023 10:54

Document extended meshesPath functionality

e91293b

Same for particlesPath

f868018

ax3l self-requested a review October 5, 2023 17:01

ax3l self-assigned this Oct 5, 2023

ax3l added the major change non-backwards compatible change label Oct 5, 2023

ax3l added this to the openPMD 2.X milestone Oct 5, 2023

ax3l changed the title ~~Custom hierarchies~~ Lists of Iteration, Mesh and Particle Pathes Oct 5, 2023

franzpoeschel added 2 commits December 21, 2023 18:37

Globbing for meshes

543c374

Globbing for particles

35f75fc

franzpoeschel mentioned this pull request Mar 15, 2024

Writing and Reading Non-openPMD Infos openPMD/openPMD-api#15

Open

franzpoeschel mentioned this pull request Aug 12, 2024

Custom Hierarchies openPMD/openPMD-api#1432

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lists of Iteration, Mesh and Particle Pathes #282

Lists of Iteration, Mesh and Particle Pathes #282

franzpoeschel commented Sep 12, 2023

ax3l commented Oct 5, 2023

DavidSagan commented Oct 5, 2023

ax3l commented Oct 5, 2023 •

edited

Loading

ax3l commented Oct 5, 2023 •

edited

Loading

Lists of Iteration, Mesh and Particle Pathes #282

Are you sure you want to change the base?

Lists of Iteration, Mesh and Particle Pathes #282

Conversation

franzpoeschel commented Sep 12, 2023

Description

Affected Components

Logic Changes

Writer Changes

Reader Changes

Data Converter

Discussion:

ax3l commented Oct 5, 2023

DavidSagan commented Oct 5, 2023

ax3l commented Oct 5, 2023 • edited Loading

ax3l commented Oct 5, 2023 • edited Loading

ax3l commented Oct 5, 2023 •

edited

Loading

ax3l commented Oct 5, 2023 •

edited

Loading