Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mapping between ADIOS steps and openPMD iterations #949

Merged
merged 8 commits into from
Oct 25, 2022

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Mar 23, 2021

Background: Until now, our Streaming API assumes that each ADIOS step corresponds with exactly one openPMD iteration and that those iterations are found in ascending order. Once we expose the ADIOS2 Append access mode, this will not necessarily hold true any longer, so this PR explores more flexible alternatives.

Scenario: Run a simulation with data output all 50 steps, checkpoints all 500 steps, use step-based iteration layout (or group-based iteration layout and activate ADIOS steps). Crash at step 750, restart from 500. Data output then needs to be appended to the (single file!) output.bp. From the first run, we have the following:

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750

When appending, we cannot remove any old steps, just append new ones. So, our file will look like:

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 500 550 600 650 700 …

Goal: Be able to read that.

First step (useful independent of this issue): Annotate for each ADIOS step the openPMD iteration defined by it.

My current approach is to use the /data/__step__ /data/snapshot attribute introduced by #855 and use it to store the openPMD iteration(s) stored in the current ADIOS step. Afterwards, the reading procedures can inquire that attribute and see which iteration they should return to the user. Fallback to the old solution if the attribute isn't found.

TODO:

@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 3 times, most recently from 6bb3c8a to 60727b1 Compare March 26, 2021 15:14
@ax3l ax3l requested review from ax3l and guj March 30, 2021 04:51
@franzpoeschel franzpoeschel mentioned this pull request Apr 7, 2021
13 tasks
@franzpoeschel franzpoeschel mentioned this pull request Jun 10, 2021
11 tasks
@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 2 times, most recently from cddc98a to 897e5bc Compare July 6, 2021 10:43
@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 2 times, most recently from 09b8a70 to 49b75ee Compare November 1, 2021 14:42
@guj
Copy link
Contributor

guj commented Nov 1, 2021

For the group/variable based files, Is there an option to not write an iteration if it already exists and is valid?

@franzpoeschel
Copy link
Contributor Author

For the group/variable based files, Is there an option to not write an iteration if it already exists and is valid?

Do you mean when appending to an existing Series? If yes, that's a bit challenging, as ADIOS Append mode does not give any read access and openPMD has no handling for redundantly defined iterations yet.
So short answer: No, but I want to look at that specific situation again once #1007 and this PR are merged

@guj
Copy link
Contributor

guj commented Nov 2, 2021

For the group/variable based files, Is there an option to not write an iteration if it already exists and is valid?

Do you mean when appending to an existing Series? If yes, that's a bit challenging, as ADIOS Append mode does not give any read access and openPMD has no handling for redundantly defined iterations yet. So short answer: No, but I want to look at that specific situation again once #1007 and this PR are merged

Oh, I was referring to in your example, restarting at check point 500, which is a few steps before the latest iteration 750. Guess it is a bit of a work to read contents from the existing file first and then ask adios to append at the right place.

Maybe in the future an alternative is to consider to have one file per checkpoint. This way there is no need to append. Always start a new file at checkpoint.

@franzpoeschel
Copy link
Contributor Author

Oh, I was referring to in your example, restarting at check point 500, which is a few steps before the latest iteration 750. Guess it is a bit of a work to read contents from the existing file first and then ask adios to append at the right place.

In that case, it would be better to overwrite the old data with the new one
For that, we will either have to truncate the old file at write time, or to skip redundant iterations at read time. Either option is not entirely trivial and might require further support from Adios

Maybe in the future an alternative is to consider to have one file per checkpoint. This way there is no need to append. Always start a new file at checkpoint.

That would still give you redundantly defined iterations which need to be handled at read time somehow, while adding the additional complexity of needing to handle several files. I'm not sure there would be any benefit to that approach?

@guj
Copy link
Contributor

guj commented Nov 3, 2021

That would still give you redundantly defined iterations which need to be handled at read time somehow, while adding the additional complexity of needing to handle several files. I'm not sure there would be any benefit to that approach?

If every checkpoint has its own file, there is no append needed. At restart always overwrites, so we shall not see redundant iterations. e.g. in your example, file0_500.bp, file550_1000.bp, etc. If crash happened at step 750, restart at step 500. and rewrite file550_1000.bp.

Yes it needs to add new support to read this set of files.

ADIOS is not likely to support remove/update functions as far as I can see. Just my two cents to work around it.

@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 3 times, most recently from a9b6919 to b9443eb Compare January 4, 2022 12:07
@franzpoeschel
Copy link
Contributor Author

That would still give you redundantly defined iterations which need to be handled at read time somehow, while adding the additional complexity of needing to handle several files. I'm not sure there would be any benefit to that approach?

If every checkpoint has its own file, there is no append needed. At restart always overwrites, so we shall not see redundant iterations. e.g. in your example, file0_500.bp, file550_1000.bp, etc. If crash happened at step 750, restart at step 500. and rewrite file550_1000.bp.

This is not about checkpoints. It's about what happens to regular data output when restarting from a checkpoint. Checkpoints already usually work the way you describe and there's no reason to change that.

But when restarting from a checkpoint, you get an "overlap zone" where output steps are written a second time. This is tricky to handle in group/variable-based iteration encodings. This PR is a first step toward solving that, though it is not yet a solution.

Yes it needs to add new support to read this set of files.

ADIOS is not likely to support remove/update functions as far as I can see. Just my two cents to work around it.

Norbert did suggest a truncate option for appending once. Alternatively, we can eliminate duplicate iterations on our own at read time.

@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Jan 5, 2022

Note: Please don't review this just yet, I'm currently rebasing this upon #1007 because this needs additional logic to deal with files created via Append mode.

Ok, things are fixed now, the other PR should still go first anyway

@franzpoeschel
Copy link
Contributor Author

The issue is already present in dev, I'll try going for a fix in #1218 because it has some features that should help in fixing this.

@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 2 times, most recently from 611d834 to 9f75045 Compare July 25, 2022 15:47
@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 3 times, most recently from be54315 to 62a4f81 Compare August 17, 2022 09:29
@franzpoeschel franzpoeschel force-pushed the topic-step-iteration-mapping branch 2 times, most recently from 8d2580e to bea37de Compare August 19, 2022 14:31
src/ReadIterations.cpp Outdated Show resolved Hide resolved
franzpoeschel and others added 8 commits October 18, 2022 12:26
1) New streaming status: RANDOM_ACCESS, for non-streaming situations
2) Variable attributes, to be written only if the backend has support
   for steps
Only set snapshot attribute if Iteration is not yet written

For v-based iteration encoding, the snapshot attribute is already being
set before this PR. Just add a comment there.

Also add missing <cstdint> includes

Co-authored-by: Axel Huebl <[email protected]>
This means that the snapshot attribute, if present, is used for
accessing iterations inside `series.readIterations()`. Fallback to the
old behavior (linear progression through iterations) if the attribute is
not found.

Variable-b. encoding: Allow several (equivalent) iterations per step

This means that a single step can be marked by /data/snapshot to
represent iterations 0,10,20,30 at the same time.
The underlying data is the same, but the API will treat it as 4 times a
different iteration with equivalent content.

Avoid const_cast by introducing a parsing state and use that when
re-parsing.

Skip repeated iterations that occur in Append mode
Before the explicit iteration-step mapping, these were not seen by
reading procedures at all. Now they are, so we skip the second instance.

Better error message when calling readIterations() too late

This commit includes some refactoring

1. Remove recursion of operator++(), this leads to constant memory usage
   rather than filling the stack at some point
2. Extract subroutines from operator++()
3. Steal some refactoring that solved some bugs on topic-read-leniently, so it stands
to reason that we should apply it here already
In the tests, don't try to read the series with listSeries after already
having fully drained it

Combined test: append mode and weird iteration order

Deactivate troublesome Schema 2021 Append test
Currently only available for BP5 engine, will be generalized into Linear
read mode in openPMD#1291.
If the backend does not support the snapshot attribute, then iterate in
ascending order, skipping duplicate and non-linear iteration indices.
Not possible if the Series is parsed ahead of time.
@franzpoeschel
Copy link
Contributor Author

I've addressed all review comments and cleaned up the commit history today. Commit descriptions are mostly very detailed. Tests ran green, so ready for review :) @ax3l

@@ -282,6 +282,13 @@ void Iteration::flushVariableBased(
Parameter<Operation::OPEN_PATH> pOpen;
pOpen.path = "";
IOHandler()->enqueue(IOTask(this, pOpen));
/*
* In v-based encoding, the snapshot attribute must always be written,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* In v-based encoding, the snapshot attribute must always be written,
* In variableBased encoding, the snapshot attribute must always be written,

@ax3l ax3l merged commit a83bc22 into openPMD:dev Oct 25, 2022
eschnett added a commit to eschnett/openPMD-api that referenced this pull request Nov 11, 2022
* dev: (70 commits)
  Docs: Recommend Static Build for Superbuilds (openPMD#1325)
  Python 3.11 (openPMD#1323)
  pybind11: v2.10.1+ (openPMD#1322)
  Add Attribute::getOptional<T>() and use to add some more dynamic datatype conversions at read time (openPMD#1278)
  Mapping between ADIOS steps and openPMD iterations (openPMD#949)
  Deprecate shareRaw (openPMD#1229)
  Fix append mode double attributes (openPMD#1302)
  Constant scalars: Don't flush double (openPMD#1315)
  Remove caching cmake vars (openPMD#1313)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1311)
  storeChunk: Add an overload for shared_ptr<T[]> (openPMD#1296)
  Fix `operationAsString` Export (openPMD#1309)
  ADIOS2: more fine-grained control for file endings (openPMD#1218)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1307)
  Fix file existence check in parallel tests (openPMD#1303)
  ADIOS2: Flush to disk within a step (openPMD#1207)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1304)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1295)
  Update catch2 to v2.13.9 (openPMD#1299)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1292)
  ...

# Conflicts:
#	.github/workflows/linux.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants