Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Sieve algorithm #220

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kurtextrem
Copy link

As I've prepared the code based on the LRU data structure from mnemonist already for my own purposes, I figured I'd open a draft PR so anyone can take a look or do further improvements/iterations on the code (or tests with it).

As written in #212, in my benchmark, the Map() implementation comes out on top (with strings as keys).

@Yomguithereal
Copy link
Owner

Thanks @kurtextrem. I will merge your PR as soon as I start working on the SIEVE cache. It's just that I am swamped currently.

@kurtextrem
Copy link
Author

no worries, take your time.

@NeoPhi
Copy link

NeoPhi commented Feb 17, 2024

@kurtextrem I'm probably doing something wrong in my benchmark. Running this implementation of SieveMap I'm seeing much worse hit ratio compared to your js-sieve implementation, my own simple Sieve implementation, or mnemonist/lru-map. Output of hit % on a sample K/V trace.

Sieve 94.69188501697717%
js-sieve 94.69199169100739%
SieveMap 86.2063926088917%
LRUMap 94.2229459801187%

Sample code for the above can be found here: https://github.com/NeoPhi/cache-playground

@kurtextrem
Copy link
Author

@NeoPhi Cool stuff, thank you for doing this!
From looking at your implementation, it looks like you're not using a doubly-linked list. From reading the Sieve paper I had the impression it's a must, but apparently it isn't? This will definitely also be much faster in execution than my js-sieve implementation (at least as long as hash collisions in the map stay unexpensive to resolve).

The differences in hit ratio are surprising to me, but I can't really tell you what's off. I took the example code referenced by the sieve paper and fixed some JS errors (#212 (comment)).

@NeoPhi
Copy link

NeoPhi commented Feb 18, 2024

I'm relying heavily on the iterator semantics of the JS Map. which has edge cases compared to a proper doubly linked list implementation, hence the small discrepancy in hit rates and why your implementation is slightly higher. (There was a bug in my hand implementation, hit count is identical).

I think the proposed code changes linked to from #212 (comment) are buggy. I tried a simple use case and it looks like the internal data structures are getting corrupted:

import SieveMap from "../mnemonist/sieve-map.js";
const test = new SieveMap(3);
test.set('A', 'A');
test.set('B', 'B');
test.set('C', 'C');
console.log(Array.from(test.entries()));
// Output
// [ [ 'C', 'C' ], [ undefined, undefined ], [ undefined, undefined ] ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants