IterationEncoding: variableBased #250
base: upcoming-2.0.0
Force-pushed from 4abd2bf to b22738b.
FORMAT_ADIOS.md (outdated)

In order to correlate openPMD iterations with ADIOS steps, the *root* group (path `/`) in ADIOS must contain a variable:

- `__step__`
Make sure we used `__step__` and not `__steps__`.
Discussed with @franzpoeschel: let's call this `snapshot`, without `__` and with openPMD 2.0 "iteration" naming: #148.
We read `snapshot` in frontend classes in openPMD-api, and it might be useful for other use cases, e.g. data sets with only one snapshot (iteration).
Note that in the current implementations this attribute is not part of the root group, but in `/data`, so it's `/data/snapshot`.
In variable-based encoding this makes the attribute look like part of the iteration:
> bpls ../samples/variableBasedSeries.bp4 -alt
Step 0:
string /basePath attr = "/data/%T/"
uint64_t /data/changing_value attr = 0
double /data/dt attr = 1
double /data/meshes/E/0/position attr = 0
uint64_t /data/meshes/E/0/shape attr = 1
double /data/meshes/E/0/unitSI attr = 1
uint64_t /data/meshes/E/0/value attr = 0
uint64_t /data/meshes/E/attr_0 attr = 0
string /data/meshes/E/axisLabels attr = {"x"}
string /data/meshes/E/dataOrder attr = "C"
string /data/meshes/E/geometry attr = "cartesian"
double /data/meshes/E/gridGlobalOffset attr = 0
double /data/meshes/E/gridSpacing attr = 1
double /data/meshes/E/gridUnitSI attr = 1
float /data/meshes/E/timeOffset attr = 0
double /data/meshes/E/unitDimension attr = {0, 0, 0, 0, 0, 0, 0}
int32_t /data/meshes/E/x {1000}
double /data/meshes/E/x/position attr = 0
double /data/meshes/E/x/unitSI attr = 1
int32_t /data/meshes/E/y {1}
double /data/meshes/E/y/position attr = 0
double /data/meshes/E/y/unitSI attr = 1
string /data/meshes/changing_constant/axisLabels attr = {"x"}
uint64_t /data/meshes/changing_constant/changing_constant/shape attr = 0
uint64_t /data/meshes/changing_constant/changing_constant/value attr = 0
string /data/meshes/changing_constant/dataOrder attr = "C"
string /data/meshes/changing_constant/geometry attr = "cartesian"
double /data/meshes/changing_constant/gridGlobalOffset attr = 0
double /data/meshes/changing_constant/gridSpacing attr = 1
double /data/meshes/changing_constant/gridUnitSI attr = 1
double /data/meshes/changing_constant/position attr = 0
uint64_t /data/meshes/changing_constant/shape attr = 0
float /data/meshes/changing_constant/timeOffset attr = 0
double /data/meshes/changing_constant/unitDimension attr = {0, 0, 0, 0, 0, 0, 0}
double /data/meshes/changing_constant/unitSI attr = 1
uint64_t /data/meshes/changing_constant/value attr = 0
uint64_t /data/particles/changing_constant/position/position/shape attr = 0
uint64_t /data/particles/changing_constant/position/position/value attr = 0
uint64_t /data/particles/changing_constant/position/shape attr = 0
float /data/particles/changing_constant/position/timeOffset attr = 0
double /data/particles/changing_constant/position/unitDimension attr = {1, 0, 0, 0, 0, 0, 0}
double /data/particles/changing_constant/position/unitSI attr = 1
uint64_t /data/particles/changing_constant/position/value attr = 0
uint64_t /data/snapshot attr = 0
double /data/time attr = 0
double /data/timeUnitSI attr = 1
string /date attr = "2022-08-19 08:35:01 +0000"
string /iterationEncoding attr = "variableBased"
string /iterationFormat attr = "/data"
string /meshesPath attr = "meshes/"
string /openPMD attr = "1.1.0"
uint32_t /openPMDextension attr = 0
string /particlesPath attr = "particles/"
string /software attr = "openPMD-api"
string /softwareVersion attr = "0.15.0-dev"
In group-based encoding, however, the `snapshot` attribute is then at the level above the single iterations:
> bpls ../samples/bp4steps_yes_yes.bp/ -alt
Step 0:
string /basePath attr = "/data/%T/"
double /data/0/dt attr = 1
string /data/0/meshes/E/axisLabels attr = {"x"}
string /data/0/meshes/E/dataOrder attr = "C"
string /data/0/meshes/E/geometry attr = "cartesian"
double /data/0/meshes/E/gridGlobalOffset attr = 0
double /data/0/meshes/E/gridSpacing attr = 1
double /data/0/meshes/E/gridUnitSI attr = 1
float /data/0/meshes/E/timeOffset attr = 0
double /data/0/meshes/E/unitDimension attr = {0, 0, 0, 0, 0, 0, 0}
string /data/0/meshes/E/vector_of_string attr = {"vector", "of", "string"}
int32_t /data/0/meshes/E/x {10}
double /data/0/meshes/E/x/position attr = 0
double /data/0/meshes/E/x/unitSI attr = 1
double /data/0/time attr = 0
double /data/0/timeUnitSI attr = 1
uint64_t /data/snapshot attr = 0
string /date attr = "2022-08-19 08:38:04 +0000"
string /iterationEncoding attr = "groupBased"
string /iterationFormat attr = "/data/%T/"
string /meshesPath attr = "meshes/"
string /openPMD attr = "1.1.0"
uint32_t /openPMDextension attr = 0
string /software attr = "openPMD-api"
string /softwareVersion attr = "0.15.0-dev"
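The difference between the two listings can be sketched by expanding the `%T` placeholder of `iterationFormat`. This is a minimal illustration, not openPMD-api code: `iteration_group` and `snapshot_is_inside_iteration` are hypothetical helper names, and the attribute path `/data/snapshot` as well as the two `iterationFormat` values are taken from the `bpls` output above.

```python
def iteration_group(iteration_format: str, iteration: int) -> str:
    """Expand the %T placeholder (if any) and drop a trailing slash."""
    return iteration_format.replace("%T", str(iteration)).rstrip("/")


def snapshot_is_inside_iteration(iteration_format: str, iteration: int,
                                 snapshot_path: str = "/data/snapshot") -> bool:
    """True if the attribute's parent group is the iteration group itself."""
    parent = snapshot_path.rsplit("/", 1)[0]
    return parent == iteration_group(iteration_format, iteration)


# iterationFormat values as they appear in the two listings
variable_based = snapshot_is_inside_iteration("/data", 0)
group_based = snapshot_is_inside_iteration("/data/%T/", 0)
print(variable_based, group_based)
```

With the variable-based layout the attribute sits in the same group as the iteration's own attributes (`/data/dt`, `/data/time`), while with the group-based layout it is a sibling of the iteration groups such as `/data/0`.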
Add standard guidance for stepBased iteration encoding.
Force-pushed from 299e892 to 7823519.
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:

- `snapshot`
Question: we could allow skipping this if only one iteration (snapshot) is written.
In that case, the implied value should be `0` and there must be exactly one update/step in the data format.
This is what the other backends actually do when that iteration encoding is chosen, see the `variableBasedSingleIteration` test. The `snapshot` attribute is not written.
With the current state of openPMD/openPMD-api#949, the `snapshot` attribute is always written, but not required at read-time (then assumed to be 0). I should add a test somehow to ensure that reading without `snapshot` works as intended.
now tested
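The read-time rule from this thread can be sketched as follows. This is an illustrative sketch under stated assumptions, not the openPMD-api implementation: `attributes` is a hypothetical plain dict standing in for the attributes visible in one backend step, `read_snapshot` is a made-up name, and the optional `strict` check mirrors the "exactly one update/step" condition proposed in the question, which the thread does not confirm as implemented.

```python
from typing import Any, Mapping


def read_snapshot(attributes: Mapping[str, Any],
                  num_steps: int,
                  key: str = "/data/snapshot",
                  strict: bool = False) -> int:
    """Return the openPMD iteration for the current step.

    If the attribute is missing, fall back to 0. With strict=True the
    fallback is only accepted when the data holds exactly one step
    (an assumption taken from the proposal in this thread).
    """
    if key in attributes:
        return int(attributes[key])
    if strict and num_steps != 1:
        raise ValueError(
            f"'{key}' is missing but the data has {num_steps} steps")
    return 0


print(read_snapshot({"/data/snapshot": 100}, num_steps=3))
print(read_snapshot({}, num_steps=1))
```

Keeping the strict check optional reflects that the thread describes two positions: the reader simply assuming 0, and the stricter wording suggested for the standard.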
Force-pushed from 7823519 to cee7330.
FORMAT_ADIOS.md (outdated)

@@ -32,3 +32,28 @@ Output from `bpls -A` for a boolean attribute `pybool` stored in the location of

There is no convention yet for a unique representation of ADIOS2 variables with boolean type.
Thus, implementations should cast the data to and from `unsigned char` instead.

## `stepBased` Encoding of Iterations
Rename: I am not sure why, but for some reason we now call this `variableBased` in openPMD/openPMD-api#855.
@franzpoeschel, let's clarify what we pick: shall I update the standard PR to be named `variableBased`, too?
Yep, we discussed this a while ago and came to the conclusion to call it variable-based since steps are an ADIOS2-specific feature, but this encoding generally relies on a backend's ability to have variable datasets.
FORMAT_ADIOS.md (outdated)

## Attributes

openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.
@franzpoeschel raised that this is currently implemented that way, and I think that since we don't need to change the type of attributes over time, we can keep it so.
Suggested change, from:
openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.
to:
openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.
The `__is_boolean__/...` qualifiers are still stored as ADIOS `Attribute`.
"openPMD attributes stored as ADIOS `Variables`": is this outdated? We will remove the new ADIOS2 schema where this happens.
FORMAT_ADIOS.md (outdated)

## `stepBased` Encoding of Iterations

The `iterationEncoding` mode `stepBased` must be implemented via ADIOS steps.
Suggested change, from:
The `iterationEncoding` mode `stepBased` must be implemented via ADIOS steps.
to:
The `iterationEncoding` mode `variableBased` must be implemented via a backend's feature to describe *variable* datasets and attributes.
This means that such datasets and attributes are present in different versions with different contents.
I think I missed here that this file was ADIOS-specific.
FORMAT_ADIOS.md (outdated)

## Datasets

An openPMD **data set** is represented by a group prefix that contains an ADIOS variable `__data__`.
This describes the ADIOS2 schema that we will abolish
Generally, we will stick with a schema that is similar to the current schema (datasets and attributes are distinguished by using different backend features (variables and attributes)), but with two additions:
- Attributes can be variable now too
- We will use some protocol for identifying if a group is active in the current step
Since that updated schema is not yet implemented, I'd suggest we don't describe this just yet. I would not like to standardize something that in the end turns out to not work well.
I've started exploring that new schema here.
An openPMD **data set** is represented by a group prefix that contains an ADIOS variable `__data__`.

**attributes** are defined further below and can also appear at the dataset's **group** prefix level.
What does "further below" mean in this context?
```
double /data/meshes/E/x/__data__ 10*{1000}
double /data/meshes/E/x/position 10*{1}
double /data/meshes/E/x/unitSI 10*scalar
```
Outdated. Since the old ADIOS2 schema does not yet support variable-based iteration encoding, we currently only have this kind of experimental implementation.
STANDARD.md (outdated)

@@ -212,6 +216,7 @@ Each file's *root* group (path `/`) must further define the attributes:
- allowed values:
  - `fileBased` (multiple files)
  - `groupBased` (one file)
  - `stepBased` (one file with internal encoding for iterations, if supported by the data format)
Suggested change, from:
- `stepBased` (one file with internal encoding for iterations, if supported by the data format)
to:
- `variableBased` (one file with internal encoding for iterations, if supported by the data format)
STANDARD.md (outdated)

- allowed values:
  - see *Iterations and Time Series* below
  - for `fileBased` and `groupBased`, this is fixed to `/data/%T/`
  - for `stepBased` this is fixed to `/data/`
Suggested change, from:
- for `stepBased` this is fixed to `/data/`
to:
- for `variableBased` this is fixed to `/data/`
STANDARD.md (outdated)

- data-format internal convention
- *slowest varying index* of data

### `stepBased` Encoding of Iterations
Suggested change, from:
### `stepBased` Encoding of Iterations
to:
### `variableBased` Encoding of Iterations
STANDARD.md (outdated)

### `stepBased` Encoding of Iterations

In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:
Suggested change, from:
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:
to:
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `variableBased` is chosen for `iterationEncoding`:
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:

- `snapshot`
  - type: 1-dimensional array containing N *(int)* elements, where N is the number of updates/steps in the data format
This is currently not implemented as an array, but as a scalar variable that changes across steps.
The implementation actually accepts arrays at read-time, but I should test that it works. At write time, the API currently only produces scalars.
now tested
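A reader that accepts both forms mentioned here (a scalar that changes across steps, or an array) could normalize them before further processing. The following is only a sketch with a hypothetical helper name, `normalize_snapshot`, operating on plain Python values; how the entries of an array are to be interpreted is not settled in this thread and is left open.

```python
from collections.abc import Iterable
from typing import List


def normalize_snapshot(value) -> List[int]:
    """Turn a scalar or a 1-dimensional sequence into a list of ints."""
    # Strings are iterable too, but are not a valid snapshot value here.
    if isinstance(value, (str, bytes)):
        raise TypeError("snapshot must be an integer or a sequence of integers")
    if isinstance(value, Iterable):
        return [int(v) for v in value]
    return [int(value)]


print(normalize_snapshot(100))
print(normalize_snapshot([0, 100, 200]))
```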
  - description: for each update/step in a data format, this variable needs to be updated with the corresponding openPMD iteration.
  - note: in some data formats, updates/steps are absolute and not every update/step contains an update for each declared openPMD record
  - advice to implementers: an openPMD iteration might be spread over multiple updates/steps, but not vice versa.
    In such a scenario, an individual openPMD record's update/step must appear exactly once per iteration.
Note: A similar situation can occur when using Append mode: An iteration is then present multiple times with redundant definitions. This will either be solved by truncation or by reading only the first/last instance of that iteration.
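The mapping discussed in this thread can be sketched by grouping step indices by their `snapshot` value and then choosing one instance per iteration. This is a hedged illustration, not the behavior of any existing reader: `steps_per_iteration` and `pick_instance` are hypothetical names, the input is assumed to be one scalar `snapshot` value per step, and the "first"/"last" policies correspond to the two read options named in the note above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def steps_per_iteration(snapshot_per_step: Sequence[int]) -> Dict[int, List[int]]:
    """Group backend step indices by the openPMD iteration they belong to."""
    mapping: Dict[int, List[int]] = defaultdict(list)
    for step, iteration in enumerate(snapshot_per_step):
        mapping[int(iteration)].append(step)
    return dict(mapping)


def pick_instance(steps: List[int], policy: str = "last") -> int:
    """Resolve an iteration that occurs in several steps to a single step."""
    if policy == "first":
        return steps[0]
    if policy == "last":
        return steps[-1]
    raise ValueError(f"unknown policy: {policy!r}")


# Example: iteration 100 was written in step 2 and appended again in step 4.
mapping = steps_per_iteration([0, 0, 100, 200, 100])
print(mapping)
print(pick_instance(mapping[100], "first"), pick_instance(mapping[100], "last"))
```

Note that this grouping alone cannot tell an iteration that is legitimately spread over several steps from one that was redundantly redefined in Append mode; a real reader would need additional information to distinguish the two cases.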
STANDARD.md (outdated)

files (`fileBased`) or series of groups (`groupBased`) should have
attributes that describe the current time and the last
time step.
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`stepBased`) should have attributes that describe the current time and the last time step.
Suggested change, from:
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`stepBased`) should have attributes that describe the current time and the last time step.
to:
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`variableBased`) should have attributes that describe the current time and the last time step.
Co-authored-by: Axel Huebl <[email protected]>
Make the example more realistic, also minor fixes
Updates to variable-based encoding
Please add a brief description (one sentence) here and link the issue this pull-request implements

Implements issue: #221 #236

Description

Add standard guidance for `stepBased` iteration encoding.

`stepBased` iteration encoding uses features of a storage, backend API or file format to encode time-varying data sets and attributes.

Affected Components

- `base`
- FORMAT: ADIOS

Logic Changes

Instead of storing iterations (snapshots) in individual groups, we rely on internal capabilities of a data format to store updates/revisions.

- `snapshot` variable to map backend steps to openPMD iterations (Variable-Based iteration layout openPMD-api#855)
- ADIOS: `/__data__` used for openPMD record components (Use ADIOS variables for openPMD attributes openPMD-api#813)

Writer Changes

Reader Changes

- `openPMD-validator`: https://github.com/openPMD/openPMD-validator/...
- `openPMD-viewer`: https://github.com/openPMD/openPMD-viewer/...
- `yt`: N/A (hdf5 only atm.)
- `VisIt`: https://github.com/openPMD/openPMD-visit-plugin
- `openPMD-api`: Use ADIOS variables for openPMD attributes openPMD-api#813, Variable-Based iteration layout openPMD-api#855, Mapping between ADIOS steps and openPMD iterations openPMD-api#949

Data Converter

Since this is a new iteration encoding that exists in parallel to existing iteration encodings, no conversion is necessary.