-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement and test S3 Sink, with usage in runtime. #278
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t have clear blockers, but I think a round of cleanup and clarifications will be good before merging. There are several concerns that I have in general:
- I'm worried that the error handling patterns in this feature are unclear. Using catch-all for exceptions and the practice of throwing for catching and logging is especially concerning. Error handling for URIs also feels incomplete.
- I think we're using
noexcept
inappropriately (primarily to convey intent), which could cause issues over time. - I might be missing some context, but the use of certain containers (
std::list
andstd::queue
) doesn’t make a lot of sense to me. These are not trivial decisions because they typically convey intent and come with overhead.
Anyway, I think it’s almost there.
src/writers/sink.creator.cpp
Outdated
const auto object_key = object_keys.front(); | ||
object_keys.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why do we need to pop all the objects? If it's not absolutely necessary we should use a vector.
- What happens if an object fails (an iteration throws an exception)? If we pop it from the queue, it's lost no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I use the queue to construct paths, as we discussed in a previous PR. I'd have to rework that whole thing. I really don't think that a vector is the right tool for constructing paths like
/(1...k)/(1...m)/(1...n)/.../(1...z)
. - I've been thinking that since we're not actually doing anything resource intensive, like opening a file, when we create an S3Sink, we can do this serially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I’m missing something here. I’ll tinker with the code to better understand this requirement.
src/writers/s3.sink.hh
Outdated
std::shared_ptr<common::S3ConnectionPool> connection_pool_; | ||
|
||
// multipart upload | ||
std::array<uint8_t, 5 << 20> part_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would restore the comment or create a variable (static constexpr size_t five_mb_size = 5 << 20
) so it's obvious what this number represents.
src/writers/s3.sink.cpp
Outdated
auto get_upload_id = [this, &connection]() -> std::string { | ||
if (upload_id_.empty()) { | ||
upload_id_ = | ||
connection->create_multipart_object(bucket_name_, object_key_); | ||
} | ||
return upload_id_; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a separate method. Sorry if my comment was confusing—I typically refer to accessor methods as computed properties.
src/writers/s3.sink.cpp
Outdated
} | ||
|
||
bool | ||
zarr::S3Sink::write(size_t offset, const uint8_t* data, size_t bytes_of_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common pattern for identifying unused parameters is using underscores.
zarr::S3Sink::write(size_t _, const uint8_t* data, size_t bytes_of_data)
src/writers/sink.creator.cpp
Outdated
const auto object_key = object_keys.front(); | ||
object_keys.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I’m missing something here. I’ll tinker with the code to better understand this requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments, otherwise, LGTM 👍
No description provided.