-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the "Hash" and make it faster #10
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR, I like the direction! It does a lot of things at once and I'd prefere to be more diligent and evaluate each separately.
) | ||
|
||
// randi returns either i1 or i2 randomly. | ||
func randi(i1, i2 uint) uint { | ||
if rand.Int31()%2 == 0 { | ||
// it's faster than mod, but the result is the almost same. | ||
if uint32(uint64(uint32(rng.Uint64()))*uint64(2)>>32) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it much harder to read, is the optimization really worth it? Please do a separate PR/evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After studying the code in rng.Uint64(), I see that there will be a speed up in particular on concurrent use. So moving to the new library looks good to me.
But I don't like the obfuscated code here. Please revert it to
rng.Uint64() % 2 == 0
In my opinion, readability is more important here than the small gain from converting and bit shifting.
} | ||
func init() { | ||
b := make([]byte, 2) | ||
endian := getEndian() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage in deriving endianes? Hard-coding either big or little for all platforms should do the same afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo, I don't see the benefit.
|
||
var ( | ||
// plus 1 to prevent out of range. | ||
altHash = [maxFingerprint + 1]uint{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes with a memory penalty of 2^16 /1024 = 64kb.
Please do a separate PR/evaluation to see if it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I agree with your argument that in most cases, the filter itself will already be quite big, hence the additional 64 kb don't add much. Please send as separate pull request.
Hello, sorry for delay. Here is the evaluation benchmark result. The source code :
Answer:
Yes, it worths. Golang's rand has a mutex lock, which slows down the process, while SRNG doesn't require a lock(replaced by atomical operation) Also, Wyrng is a faster pseudorandom algorithm than the go's.
Maybe no?
Although it takes 64kb to store the hash result, its advantage is that it only calculate for the first time. When someone is quering the cuckoofilter for many times, it's a great advantage without doubt. Because getAltIndex is used for many times, which calculates the repeat Hash value for many times, this repeat calculation can be reduced by this. And I believe that 64kb is a "small" memory block, while cuckoofilter may takes 2MB to store the hash entries. |
And Why I replace the MetroHash with XXH3 . According to the tests, MetroHash 64 has a LongNeighbor problem, while XXH3 hasn't. https://github.com/hmakholm/smhasher/blob/master/src/LongNeighborTest.md As far as i'm concerned, XXH3 is more quality than Metro. A quality Hash algorithm will have a great impact on the correctness of the cuckoofilter. |
Thanks for breaking up the evaluation! I would still prefer merging/discussing this as separate pull requests. Could you please split them up? See also comments above. |
Sorry, i misunderstand your meaning. |
github.com/seiflotfy/cuckoofilter first calculate the hash when it's in initialization.
And I try this to optimize this "better" version of cuckoofilter.
Test is all passed.
Here is the benchmark result:
Before optimization:
After optimization:
github.com/seiflotfy/cuckoofilter(The benchmark file has been replaced by the same.):