Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pickleable and allow data_array to be retrieved and set #44

Closed
wants to merge 4 commits into from

Conversation

stevesimmons
Copy link
Contributor

Here's my first go at making in-memory bloom filters pickleable.

  • Adds a BloomFilter.data_array property to retrieve the bf array data.
  • Adds an optional data_array input to the BloomFilter constructor. This is only valid if no filename is given. If used, the hash_seeds must also be provided.
  • Add a reduce function so that the hash_seeds and data_array are saved when pickling.

I also added cases to add() and contains() for bytes. Without this, the bloomfilter doesn't work for byte inputs when saved/reopened in a new Python process.

Please can you take a look and provide feedback?
I should add tests :), have run out of time today.

@vmizg
Copy link
Contributor

vmizg commented Oct 26, 2020

Hi @stevesimmons, thanks a lot for your contribution! I will see if I have a bit of time during this week to have a look at this.

Comment on lines +34 to +36
if (data) {
memcpy(array->vector, data, num_bits / 8);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to below the if (!array) check?

else:
# Warning! Only works reliably for objects whose hash is based on value not memory address.
Copy link
Owner

@prashnts prashnts Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this basically means:

class KlassX:
    pass

class KlassHashable:
    def __hash__(self):
        return 123

bf.add(KlassX()) # this won't work.
bf.add(KlassHashable()) # this will "work" but not in a way we want?

right?

@prashnts
Copy link
Owner

No problem copying the tests from your numba implementation at
https://github.com/stevesimmons/bloomfilter/blob/main/tests/test_bloomfilter.py#L8 ;)

@kylebd99
Copy link

kylebd99 commented Mar 9, 2022

Hey, I just wanted to second the merge on this pull request. I use pybloom with the multiprocessing library and this branch worked like a charm!

@123epsilon
Copy link

Any updates on getting this merged?

@vmizg
Copy link
Contributor

vmizg commented May 4, 2024

I've addressed comments above and resolved conflicts in a separate PR here: #56

@prashnts
Copy link
Owner

prashnts commented May 6, 2024

Merged in #56, thanks everyone.

@prashnts prashnts closed this May 20, 2024
@prashnts
Copy link
Owner

I've just released this as v0.6.0 on pypi. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants