Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PackedObjectReader does not implement readline - fails unpicking! #174

Open
edan-bainglass opened this issue Aug 9, 2024 · 2 comments
Open
Labels
bug Something isn't working

Comments

@edan-bainglass
Copy link
Member

edan-bainglass commented Aug 9, 2024

In a use case where an output of a calculation (a GeneralData type of an aiida-workgraph task) is automatically pickled and used as an input by a subsequent calculation, pickle fails to unpickle the pickled output due to a missing readline method on the file handler, which is of type disk-objectstore.utils.PackedObjectReader.

After chatting with @sphuber, it is understood that due to an optimization step by the disk-objectstore, the file handler changed from io.BufferedReader to disk_objectstore.utils.PackedObjectReader. Unfortunately, as I'm unclear as to when this optimization takes place, I am unable to reproduce the issue at the moment, .

In any case, though disk_objectstore.utils.PackedObjectReader implements read, it does not implement readline, both required by pickle for unpickling. As such, pickle fails to unpickle the output.

@sphuber recommends/suggests that disk_objectstore.utils.PackedObjectReader is revisited to ensure it can be treated on an equal footing as an io.BufferedReader.

@edan-bainglass edan-bainglass added the bug Something isn't working label Aug 9, 2024
@sphuber
Copy link
Contributor

sphuber commented Aug 9, 2024

Unfortunately, as I'm unclear as to when this optimization takes place, I am unable to reproduce the issue at the moment, .

The crux is that the bug will only appear when the pickled object is part of a pack. If it is a loose file, the API will return a normal file handle when opening it. However, if it is part of a pack, the PackedObjectReader is used instead, which doesn't implement readline.

The following script provides a minimal working example:

#!/usr/bin/env python
import disk_objectstore
import pickle
import tempfile

with tempfile.TemporaryDirectory() as dirpath:

    # Initialize a disk objectstore container
    container = disk_objectstore.Container(dirpath)
    container.init_container()

    # Pickle an arbitrary object (a random string in this example) and write directly to a pack file
    pickled_container = pickle.dumps('some-string')
    if True:
        key = container.add_objects_to_pack([pickled_container])[0]
    else:
        key = container.add_object(pickled_container)

    # Retrieve the pickled byte stream and try to unpickle it
    with container.get_object_stream(key) as handle:
        pickle.load(handle)

If the conditional is True it writes the pickled object directly to a pack and then the pickle.load call will except

Traceback (most recent call last):
  File "scripts/run_pickle_dos.py", line 21, in <module>
    pickle.load(handle)
TypeError: file must have 'read' and 'readline' attributes

If we switch it to False such that the pickled object is written to a loose file instead, the code will work just fine.

So I think we just need to implement readline on the PackedObjectReader class and we should be golden

@edan-bainglass
Copy link
Member Author

Thanks @sphuber. I'll open a PR draft soon and give it a go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants