Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add custom pickle implementation #354

Merged
merged 9 commits into from
Sep 6, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

Fixes #344 by overriding the default Awkward __reduce__ implementations.

Requires scikit-hep/awkward#2682

Comment on lines 36 to 57
layout = ak.to_layout(array, allow_record=True)
form, length, container = ak.operations.to_buffers(
layout,
buffer_key="{form_key}-{attribute}",
form_key="node{id}",
byteorder="<",
)

# For pickle >= 5, we can avoid copying the buffers
if protocol >= 5:
container = {k: pickle.PickleBuffer(v) for k, v in container.items()}

if array.behavior is ak.behavior:
behavior = None
else:
behavior = array.behavior

return (
object.__new__,
(ak.Array,),
(form.to_dict(), length, container, behavior),
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now this is just to_buffers with no packing. In future we can implement a form-preserving pack operation, perhaps in Awkward Array

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No packing is not a bad way to start for serialization in Dask (as long as it's not the default for pickle outside of Dask).

@@ -66,6 +66,9 @@ test = [
[project.entry-points."dask.sizeof"]
awkward = "dask_awkward.sizeof:register"

[project.entry-points."awkward.pickle.reduce"]
dask_awkward = "dask_awkward.pickle:plugin"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to be picked up by awkward core?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!


# For pickle >= 5, we can avoid copying the buffers
if protocol >= 5:
container = {k: pickle.PickleBuffer(v) for k, v in container.items()}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We certainly need a test to ensure this does the right thing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy of the Awkward impl, but i agree that it should be tested here. Todo!

elif isinstance(obj, ak.Array):
return pickle_array(obj, protocol)
else:
return NotImplemented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return rather than raise?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return is intentional: the sentinel object is there to permit awkward to use this mechanism for more types in a backwards compatible way. 🙂

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 4, 2023

@martindurant are you OK for me to merge this? :)

buffers = []
array = ak.Array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])[[0, 2]]
next_array = pickle.loads(
pickle.dumps(array, protocol=5, buffer_callback=buffers.append), buffers=buffers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice concise way to test pickle 5!
Can we add a line that confirms that buffer did in fact get some content?

That's all I have, otherwise good to go.

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 5, 2023

Let me also test records.

@codecov-commenter
Copy link

Codecov Report

Merging #354 (7df393f) into main (d8a916b) will decrease coverage by 0.06%.
The diff coverage is 89.28%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main     #354      +/-   ##
==========================================
- Coverage   94.59%   94.54%   -0.06%     
==========================================
  Files          20       21       +1     
  Lines        2555     2583      +28     
==========================================
+ Hits         2417     2442      +25     
- Misses        138      141       +3     
Files Changed Coverage Δ
src/dask_awkward/pickle.py 89.28% <89.28%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@agoose77 agoose77 merged commit dbe791f into main Sep 6, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Layout changes due to pickling of partitions are not tracked during typetracing
4 participants