Add other binsizes into hashtable than 1 bit and 8 bits. #51

ctb · 2013-04-20T17:51:12Z

Right now, Hashbits supports 1 bit per hashtable entry and CountingHash supports 8 bits per hashtable entry. 2 and 4 bits/entry would be great extensions that could enable substantially decreased memory usage in some useful circumstances; not sure if its worth doing it for arbitrary numbers of bits, or if there would be performance penalties.

emcd · 2013-04-20T19:30:03Z

We can exploit some efficiencies in the 1-, 2-, 4-, and 8-bit cases because those uniformly "tile" a byte. The other cases do not uniformly tile a byte (e.g., the first 3-bit value appears at bit 0 of byte 0, the second 3-bit value appears at bit 3 of byte 0, the third at bit 6 of byte 0, but the fourth appears at bit 1 (and not bit 0) of byte 1). Bitmasks are thus position-dependent in the latter cases, whereas they are either position-independent or unneeded in the former cases.

If one wanted a generalized solution (for n-bit counters, where n >= 1), then I would probably create a template with the generalized bitmask calculation and then create specializations of the template to selectively override this in the cases where we know we can do better.

emcd · 2013-04-22T22:55:57Z

Related to issue #27. As I see it, the choice of hash functions is independent of counter memory model. We may wish to split the hash table design into the hashing method used and accesses to the counter memory, since those are essentially orthogonal operations. Splitting the read parsers into data pumps and parsers helped us; I believe that a similar split for hashing and counters, as proposed above, would yield benefits as well.

ctb · 2013-04-22T23:23:15Z

Agree

C. Titus Brown, [email protected]

On Apr 22, 2013, at 18:55, Eric McDonald [email protected] wrote:

Related to issue #27. As I see it, the choice of hash functions is independent of counter memory model. We may wish to split the hash table design into the hashing method used and accesses to the counter memory, since those are essentially orthogonal operations. Splitting the read parsers into data pumps and parsers helped us; I believe that a similar split for hashing and counters, as proposed above, would yield benefits as well.

—
Reply to this email directly or view it on GitHub.

ctb · 2015-06-12T19:57:51Z

#1035 proposes using the MiniFloat standard to do this.

betatim · 2016-12-13T14:26:19Z

#1551 implements a storage class with 4bits per entry

ctb mentioned this issue Aug 6, 2015

Meta issue for technical upgrades in 2.0+ #1230

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add other binsizes into hashtable than 1 bit and 8 bits. #51

Add other binsizes into hashtable than 1 bit and 8 bits. #51

ctb commented Apr 20, 2013

emcd commented Apr 20, 2013

emcd commented Apr 22, 2013

ctb commented Apr 22, 2013

ctb commented Jun 12, 2015

betatim commented Dec 13, 2016

Add other binsizes into hashtable than 1 bit and 8 bits. #51

Add other binsizes into hashtable than 1 bit and 8 bits. #51

Comments

ctb commented Apr 20, 2013

emcd commented Apr 20, 2013

emcd commented Apr 22, 2013

ctb commented Apr 22, 2013

ctb commented Jun 12, 2015

betatim commented Dec 13, 2016