Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add other binsizes into hashtable than 1 bit and 8 bits. #51

Open
ctb opened this issue Apr 20, 2013 · 5 comments
Open

Add other binsizes into hashtable than 1 bit and 8 bits. #51

ctb opened this issue Apr 20, 2013 · 5 comments

Comments

@ctb
Copy link
Member

ctb commented Apr 20, 2013

Right now, Hashbits supports 1 bit per hashtable entry and CountingHash supports 8 bits per hashtable entry. 2 and 4 bits/entry would be great extensions that could enable substantially decreased memory usage in some useful circumstances; not sure if its worth doing it for arbitrary numbers of bits, or if there would be performance penalties.

@emcd
Copy link
Contributor

emcd commented Apr 20, 2013

We can exploit some efficiencies in the 1-, 2-, 4-, and 8-bit cases because those uniformly "tile" a byte. The other cases do not uniformly tile a byte (e.g., the first 3-bit value appears at bit 0 of byte 0, the second 3-bit value appears at bit 3 of byte 0, the third at bit 6 of byte 0, but the fourth appears at bit 1 (and not bit 0) of byte 1). Bitmasks are thus position-dependent in the latter cases, whereas they are either position-independent or unneeded in the former cases.

If one wanted a generalized solution (for n-bit counters, where n >= 1), then I would probably create a template with the generalized bitmask calculation and then create specializations of the template to selectively override this in the cases where we know we can do better.

@emcd
Copy link
Contributor

emcd commented Apr 22, 2013

Related to issue #27. As I see it, the choice of hash functions is independent of counter memory model. We may wish to split the hash table design into the hashing method used and accesses to the counter memory, since those are essentially orthogonal operations. Splitting the read parsers into data pumps and parsers helped us; I believe that a similar split for hashing and counters, as proposed above, would yield benefits as well.

@ctb
Copy link
Member Author

ctb commented Apr 22, 2013

Agree


C. Titus Brown, [email protected]

On Apr 22, 2013, at 18:55, Eric McDonald [email protected] wrote:

Related to issue #27. As I see it, the choice of hash functions is independent of counter memory model. We may wish to split the hash table design into the hashing method used and accesses to the counter memory, since those are essentially orthogonal operations. Splitting the read parsers into data pumps and parsers helped us; I believe that a similar split for hashing and counters, as proposed above, would yield benefits as well.


Reply to this email directly or view it on GitHub.

@ctb
Copy link
Member Author

ctb commented Jun 12, 2015

#1035 proposes using the MiniFloat standard to do this.

@betatim
Copy link
Member

betatim commented Dec 13, 2016

#1551 implements a storage class with 4bits per entry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants