-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Hashing #33
Optimize Hashing #33
Conversation
Hey @BrennenMM7! This looks great, thanks for submitting it! I'm going to do some testing myself soon and merge this in. |
Sounds good, to note this will obviously effect anyone's current DB files going forward as this would be a breaking change. Since the computed hash is different they'll never match unless, the DB is opened and rehashed to the new mechanism. |
Yeah I'm wondering if there is an easy way to do that. Try to read the metadata page as an FNV-1a hash, and if it doesn't work try the old one. Then always write FNV-1a metadata, so databases will be migrated automatically. What do you think? |
That is definitely an option, could also split the versioning of the crate to 0.1.x vs 0.2.x keeping the new hash as the latest version, and the old for other who wish to not migrate. Another option, or additionally provide a helper function/toolkit to migrate a DB from one version to the other. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #33 +/- ##
==========================================
- Coverage 94.22% 93.67% -0.56%
==========================================
Files 21 21
Lines 3778 3826 +48
==========================================
+ Hits 3560 3584 +24
- Misses 218 242 +24
☔ View full report in Codecov by Sentry. |
@BrennenMM7 thanks again! I ended up making it backwards compatible for now. Will probably remove that eventually but for now it's not too bad! |
Hello 👋 ,
Stumbled across jammdb as I was in middle of creating my own Rust rendition of BoltDB, deciding to fork/contrib here going forward as this project looks like a great foundation to continue forward.
This PR is to optimize the hashing packages utilized on meta pages for better read/write performance from the db. Utilizing some observability tactics, I found the longest function stalling operations was the hash_self function in meta.rs. This is due to the fact that every time a new TX is created hashing must occur of the meta pages, and the following package "sha3::{Digest, Sha3_256}" creates a deep call stack.
BoltDB implements a FNV-1a hash for its meta pages, which is significantly faster to calculate than utilizing a secure Sha3_256.
I also bumped the crates to newer versions, but no real perf gain was seen from this, fine with reversion if needed.
Looking forward to more contribs 👍
Benchmarking ( Each Get/Set is a new TX, No batching, No concurrency)
Before:
Time taken to set 100000 keys: 179.91754808s
Time taken to get 100000 keys: 155.249245ms
After:
Time taken to set 100000 keys: 151.29345334s
Time taken to get 100000 keys: 44.5641ms