-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add custom pickle implementation #354
Conversation
src/dask_awkward/pickle.py
Outdated
layout = ak.to_layout(array, allow_record=True) | ||
form, length, container = ak.operations.to_buffers( | ||
layout, | ||
buffer_key="{form_key}-{attribute}", | ||
form_key="node{id}", | ||
byteorder="<", | ||
) | ||
|
||
# For pickle >= 5, we can avoid copying the buffers | ||
if protocol >= 5: | ||
container = {k: pickle.PickleBuffer(v) for k, v in container.items()} | ||
|
||
if array.behavior is ak.behavior: | ||
behavior = None | ||
else: | ||
behavior = array.behavior | ||
|
||
return ( | ||
object.__new__, | ||
(ak.Array,), | ||
(form.to_dict(), length, container, behavior), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now this is just to_buffers
with no packing. In future we can implement a form-preserving pack operation, perhaps in Awkward Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No packing is not a bad way to start for serialization in Dask (as long as it's not the default for pickle outside of Dask).
@@ -66,6 +66,9 @@ test = [ | |||
[project.entry-points."dask.sizeof"] | |||
awkward = "dask_awkward.sizeof:register" | |||
|
|||
[project.entry-points."awkward.pickle.reduce"] | |||
dask_awkward = "dask_awkward.pickle:plugin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended to be picked up by awkward core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
|
||
# For pickle >= 5, we can avoid copying the buffers | ||
if protocol >= 5: | ||
container = {k: pickle.PickleBuffer(v) for k, v in container.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We certainly need a test to ensure this does the right thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy of the Awkward impl, but i agree that it should be tested here. Todo!
elif isinstance(obj, ak.Array): | ||
return pickle_array(obj, protocol) | ||
else: | ||
return NotImplemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return rather than raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return is intentional: the sentinel object is there to permit awkward to use this mechanism for more types in a backwards compatible way. 🙂
@martindurant are you OK for me to merge this? :) |
buffers = [] | ||
array = ak.Array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])[[0, 2]] | ||
next_array = pickle.loads( | ||
pickle.dumps(array, protocol=5, buffer_callback=buffers.append), buffers=buffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice concise way to test pickle 5!
Can we add a line that confirms that buffer
did in fact get some content?
That's all I have, otherwise good to go.
Let me also test records. |
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more. @@ Coverage Diff @@
## main #354 +/- ##
==========================================
- Coverage 94.59% 94.54% -0.06%
==========================================
Files 20 21 +1
Lines 2555 2583 +28
==========================================
+ Hits 2417 2442 +25
- Misses 138 141 +3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Fixes #344 by overriding the default Awkward
__reduce__
implementations.Requires scikit-hep/awkward#2682