Skip to content

Latest commit

 

History

History
25 lines (18 loc) · 2.32 KB

ReadMe.md

File metadata and controls

25 lines (18 loc) · 2.32 KB

PerfectPrecomputedHashtable.py Unlicensed work

wheel (GHA via nightly.link) GitHub Actions Libraries.io Status Code style: antiflash

A framework to create scripts to precompute a near-perfect hashtable of small number of objects that can be stored and used in future.

Why not a CLI tool? Because usually a hashtable needs to be matched against objects. So you would still need a bit of own logic.

The hashtable has the following construction:

  1. objects are converted into a byte strings
  2. a tweakable hash function needing a nonce is applied to objects to get 4-byte values. The function should map all the strings into unique 4-byte integers. We currently use blake2s because it is in python standard library.
  3. the 4 byte valuea are converted into 2-byte control value
  4. the 4 byte values are reduced into a single byte value using one of the reducer finctions.
  5. the single byte values are deoffsetted by subtracting the minimum value across the whole dataset.
  6. the single byte values are passed through a perfect hash generator (perfection) to remap them into a nearly-continious range of indices.
  7. the indices are used to populate an array.

pow dir contains C++ program brute-forcing nonces (so it is kinda a proof-of-work) in order to minimize the "span" (the distance between maximum and minimum reduced hash) of reduced values (only the nonces mapping keys to distinct values are considered). strings.cpp should be populated with the byte strings.

After the optimal nonce and reducer have been chosen, it can be used to produce a python source file with the hashtable and the lookup function.