Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorized chunk writes #23

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from
Draft

Vectorized chunk writes #23

wants to merge 32 commits into from

Conversation

aliddell
Copy link
Member

@aliddell aliddell commented Dec 5, 2024

  • Add a few virtual methods to ArrayWriter
    • data_root_() constructs the path to the location where chunk or shard files are currently going to be written
    • metadata_path_() constructs the path to the array metadata file
    • Remove version_()
    • Replace compress_buffers_() with compress_chunk_buffer_(), which compresses one buffer at a time
    • Make make_data_sinks_() and close_sinks_() pure virtual methods
  • Rename SinkCreator methods to be explicit about which type of sink (S3 or filesystem).
  • Zarr V2: Enqueue each chunk to be written immediately after compression, rather than waiting until all chunks are compressed before writing them
  • Zarr V3: Write chunks to shards using FileWriteGather (Windows) or pwritev (POSIX).

Base automatically changed from vector-writes to main December 6, 2024 18:22
@aliddell aliddell marked this pull request as ready for review December 13, 2024 20:24
@aliddell aliddell requested a review from jeskesen December 13, 2024 20:24

if (should_rollover || is_finalizing_) {
rollover_();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to the old functionality, rollover_() would now be called if should_rollover==false but is_finalizing==true, whereas it wouldn't have before. I just want to make sure that is intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I previously didn't want to call rollover_() when finalizing because it would uselessly increment a counter, but the alternative was to add another check on is_finalizing_ to call close_files_() -- but only when we had a ragged append dimension. This is simpler, and the increment doesn't matter.

src/streaming/array.writer.cpp Show resolved Hide resolved
return success;
}));

if (!queued) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't need to queue a new job?

@aliddell aliddell marked this pull request as draft December 19, 2024 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants