-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly handle the mapped and registered regions in memory_mapped_source
#16865
Conversation
…impr-buffer_register-exact-range
…into rework-read_csv-ingest
…rework-read_csv-ingest
…into rework-read_csv-ingest
…rework-read_csv-ingest
…rework-read_csv-ingest
memory_mapped_source
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## branch-24.12 #16865 +/- ##
===============================================
Coverage ? 82.02%
===============================================
Files ? 183
Lines ? 29193
Branches ? 0
===============================================
Hits ? 23945
Misses ? 5248
Partials ? 0 ☔ View full report in Codecov by Sentry. |
size_t max_size_estimate = 0, | ||
size_t min_size_estimate = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the max includes the padding for the last row when reading a byte range. We're now passing the byte range size without padding as well, for the operations that should not include the padding, such as the buffer registration (because registering overlapping ranges fails, and this can happen when we read a CSV or a JSON file in chunks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make that use case more obvious in the docstrings? I wouldn't know the difference between padded/non-padded and registration restrictions from this docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to throw if max < min
? If so that @throws
would go into the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added docs here and in memory_mapped_source
constructor. I think they add up to a full explanation, let me know what you think :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thanks.
@@ -54,6 +55,30 @@ class file_source : public datasource { | |||
} | |||
} | |||
|
|||
std::unique_ptr<buffer> host_read(size_t offset, size_t size) override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the implementation from the derived class because the memory mapped source now performs direct reads when the location is outside of the mapped range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the implementation is fine, but the public API / docstrings could be better.
size_t max_size_estimate = 0, | ||
size_t min_size_estimate = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make that use case more obvious in the docstrings? I wouldn't know the difference between padded/non-padded and registration restrictions from this docstring.
size_t max_size_estimate = 0, | ||
size_t min_size_estimate = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to throw if max < min
? If so that @throws
would go into the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vuule
/merge |
Description
Depends on #16826
Set of fixes that improve robustness on the non-GDS file input:
Modifies the datasource class hierarchy to avoid reuse of direct file
host_read
sChecklist