Mapping between ADIOS steps and openPMD iterations #949
For the group/variable-based files, is there an option to not write an iteration if it already exists and is valid?
Do you mean when appending to an existing Series? If yes, that's a bit challenging, as ADIOS Append mode does not give any read access and openPMD has no handling for redundantly defined iterations yet.
Oh, I was referring to your example: restarting at checkpoint 500, which is a few steps before the latest iteration 750. I guess it is quite a bit of work to read the contents of the existing file first and then ask ADIOS to append at the right place. Maybe in the future, an alternative to consider is one file per checkpoint. That way there is no need to append: always start a new file at a checkpoint.
In that case, it would be better to overwrite the old data with the new data.
That would still give you redundantly defined iterations which need to be handled at read time somehow, while adding the additional complexity of needing to handle several files. I'm not sure there would be any benefit to that approach?
If every checkpoint has its own file, no append is needed. A restart always overwrites, so we should not see redundant iterations. E.g., in your example: file0_500.bp, file550_1000.bp, etc. If a crash happens at step 750, restart at step 500 and rewrite file550_1000.bp. Yes, this requires adding support for reading this set of files. ADIOS is not likely to support remove/update functions as far as I can see. Just my two cents on working around it.
This is not about checkpoints. It's about what happens to regular data output when restarting from a checkpoint. Checkpoints already usually work the way you describe and there's no reason to change that. But when restarting from a checkpoint, you get an "overlap zone" where output steps are written a second time. This is tricky to handle in group/variable-based iteration encodings. This PR is a first step toward solving that, though it is not yet a solution.
Norbert did suggest a truncate option for appending once. Alternatively, we can eliminate duplicate iterations on our own at read time.
Note: OK, things are fixed now; the other PR should still go first anyway.
The issue is already present in dev. I'll try going for a fix in #1218 because it has some features that should help in fixing this.
1) New streaming status: RANDOM_ACCESS, for non-streaming situations
2) Variable attributes, to be written only if the backend has support for steps
Only set snapshot attribute if Iteration is not yet written
For v-based iteration encoding, the snapshot attribute is already being set before this PR. Just add a comment there.
Also add missing <cstdint> includes
Co-authored-by: Axel Huebl <[email protected]>
This means that the snapshot attribute, if present, is used for accessing iterations inside `series.readIterations()`. Fall back to the old behavior (linear progression through iterations) if the attribute is not found.
Variable-based encoding: Allow several (equivalent) iterations per step
This means that a single step can be marked by /data/snapshot to represent iterations 0, 10, 20, 30 at the same time. The underlying data is the same, but the API will treat it as four different iterations with equivalent content.
Avoid const_cast by introducing a parsing state and use that when re-parsing.
Skip repeated iterations that occur in Append mode
Before the explicit iteration-step mapping, these were not seen by reading procedures at all. Now they are, so we skip the second instance.
Better error message when calling readIterations() too late
This commit includes some refactoring
1. Remove recursion of operator++(); this leads to constant memory usage rather than filling the stack at some point
2. Extract subroutines from operator++()
3. Steal some refactoring that solved some bugs on topic-read-leniently, so it stands to reason that we should apply it here already
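The two behaviors described above (several iterations per step, and skipping iterations that reappear after an append) can be illustrated with a minimal, library-independent sketch. It is not the openPMD-api implementation: `StepSnapshots` and `visitOrder` are hypothetical names, and the per-step snapshot values are modeled as plain vectors rather than read from a file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in: for each step (outer index), the iteration indices
// listed in that step's /data/snapshot attribute.
using StepSnapshots = std::vector<std::vector<std::uint64_t>>;

// Returns (step, iteration) pairs in the order a reader would visit them.
// An iteration index that an earlier step already provided is skipped,
// so only the first instance of a repeated iteration is reported.
std::vector<std::pair<std::size_t, std::uint64_t>>
visitOrder(StepSnapshots const &steps)
{
    std::set<std::uint64_t> seen;
    std::vector<std::pair<std::size_t, std::uint64_t>> result;
    for (std::size_t step = 0; step < steps.size(); ++step)
    {
        for (std::uint64_t iteration : steps[step])
        {
            // insert().second is false if the index was already seen
            if (seen.insert(iteration).second)
            {
                result.emplace_back(step, iteration);
            }
        }
    }
    // Exactly one entry per distinct iteration index
    assert(result.size() == seen.size());
    return result;
}
```

With steps annotated as `{0, 10}`, `{20}`, `{10, 20, 30}`, the third step contributes only iteration 30, since 10 and 20 were already provided by earlier steps.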
Currently only available for BP5 engine, will be generalized into Linear read mode in openPMD#1291. If the backend does not support the snapshot attribute, then iterate in ascending order, skipping duplicate and non-linear iteration indices. Not possible if the Series is parsed ahead of time.
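One possible reading of the fallback described above, as a minimal sketch: without a snapshot attribute, the reader keeps only indices that continue a strictly ascending sequence and drops everything else. The function name `ascendingOnly` and the interpretation of "non-linear" as "not greater than the last accepted index" are assumptions made for illustration, not taken from the openPMD-api source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Walks the iteration indices in the order in which the steps reveal them
// and keeps only those that are strictly greater than the last kept index.
// Duplicates and indices that jump backwards are skipped.
std::vector<std::uint64_t>
ascendingOnly(std::vector<std::uint64_t> const &encountered)
{
    std::vector<std::uint64_t> result;
    for (std::uint64_t index : encountered)
    {
        if (!result.empty() && index <= result.back())
        {
            continue; // duplicate or non-ascending index
        }
        result.push_back(index);
    }
    // Everything that was kept is a subset of the input
    assert(result.size() <= encountered.size());
    return result;
}
```

For the sequence 0, 50, 100, 50, 100, 150 this keeps 0, 50, 100, 150.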
I've addressed all review comments and cleaned up the commit history today. Commit descriptions are mostly very detailed. Tests ran green, so ready for review :) @ax3l
@@ -282,6 +282,13 @@ void Iteration::flushVariableBased(
Parameter<Operation::OPEN_PATH> pOpen;
pOpen.path = "";
IOHandler()->enqueue(IOTask(this, pOpen));
/*
* In v-based encoding, the snapshot attribute must always be written,
- * In v-based encoding, the snapshot attribute must always be written,
+ * In variableBased encoding, the snapshot attribute must always be written,
* dev: (70 commits)
  Docs: Recommend Static Build for Superbuilds (openPMD#1325)
  Python 3.11 (openPMD#1323)
  pybind11: v2.10.1+ (openPMD#1322)
  Add Attribute::getOptional<T>() and use to add some more dynamic datatype conversions at read time (openPMD#1278)
  Mapping between ADIOS steps and openPMD iterations (openPMD#949)
  Deprecate shareRaw (openPMD#1229)
  Fix append mode double attributes (openPMD#1302)
  Constant scalars: Don't flush double (openPMD#1315)
  Remove caching cmake vars (openPMD#1313)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1311)
  storeChunk: Add an overload for shared_ptr<T[]> (openPMD#1296)
  Fix `operationAsString` Export (openPMD#1309)
  ADIOS2: more fine-grained control for file endings (openPMD#1218)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1307)
  Fix file existence check in parallel tests (openPMD#1303)
  ADIOS2: Flush to disk within a step (openPMD#1207)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1304)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1295)
  Update catch2 to v2.13.9 (openPMD#1299)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1292)
  ...
# Conflicts:
# .github/workflows/linux.yml
Background: Until now, our Streaming API has assumed that each ADIOS step corresponds to exactly one openPMD iteration and that those iterations are found in ascending order. Once we expose the ADIOS2 Append access mode, this will no longer necessarily hold true, so this PR explores more flexible alternatives.
Scenario: Run a simulation with data output every 50 steps and checkpoints every 500 steps, using the step-based iteration layout (or the group-based iteration layout with ADIOS steps activated). Crash at step 750, restart from 500. Data output then needs to be appended to the (single file!) output.bp.
From the first run, we have the following:
When appending, we cannot remove any old steps, just append new ones. So, our file will look like:
Goal: Be able to read that.
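The step layout that results from the scenario above can be reproduced with a small sketch. The function name `stepLayout` is hypothetical, and two details are assumptions rather than statements from the PR: that the first run still writes the output at the crash step, and that the restarted run resumes output one output period after the checkpoint.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the iteration index stored in each successive step of the file
// after a crash and an appending restart.
// Assumptions: the first run writes outputs up to and including crashAt,
// the restarted run writes its first output at restartFrom + period.
std::vector<std::uint64_t> stepLayout(
    std::uint64_t period,
    std::uint64_t crashAt,
    std::uint64_t restartFrom,
    std::uint64_t endAt)
{
    assert(period > 0);
    assert(restartFrom % period == 0);
    assert(restartFrom <= crashAt && crashAt <= endAt);

    std::vector<std::uint64_t> layout;
    // First run: steps written before the crash
    for (std::uint64_t i = 0; i <= crashAt; i += period)
    {
        layout.push_back(i);
    }
    // Second run: old steps cannot be removed, new ones are appended
    for (std::uint64_t i = restartFrom + period; i <= endAt; i += period)
    {
        layout.push_back(i);
    }
    return layout;
}
```

Under these assumptions, `stepLayout(50, 750, 500, 1000)` yields 0, 50, ..., 750 followed by 550, 600, ..., 1000, so iterations 550 through 750 appear twice in the file.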
First step (useful independent of this issue): Annotate for each ADIOS step the openPMD iteration defined by it.
My current approach is to use the /data/snapshot attribute (originally /data/__step__) introduced by #855 to store the openPMD iteration(s) contained in the current ADIOS step. Afterwards, the reading procedures can query that attribute to see which iteration they should return to the user. Fall back to the old solution if the attribute isn't found.
TODO:
/data/snapshot when reading series.iteration[key]: fixing this is for a follow-up PR
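The attribute lookup with fallback described in this approach can be sketched without any I/O library. `Step` and `iterationsInStep` are hypothetical names used only for illustration. The attribute is modeled as a flag plus a vector, and the caller is assumed to supply the iteration index that the old one-step-one-iteration rule would have produced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one step as seen by a reader.
struct Step
{
    bool hasSnapshot;                    // is the snapshot attribute present?
    std::vector<std::uint64_t> snapshot; // iteration indices it lists
};

// Returns the iterations that the given step provides.
// If the step is annotated, the attribute decides; otherwise fall back to
// the single iteration implied by linear progression.
std::vector<std::uint64_t>
iterationsInStep(Step const &step, std::uint64_t fallbackIteration)
{
    if (step.hasSnapshot)
    {
        // Assumption of this sketch: an annotated step names at least one
        // iteration.
        assert(!step.snapshot.empty());
        return step.snapshot;
    }
    return {fallbackIteration};
}
```

A step annotated with 0, 10, 20, 30 reports all four iterations, while an unannotated step reports only the fallback index passed in by the caller.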