This repository contains the source code for the hash functions and hashing schemes used in our "Can Learned Models Replace Hash Functions?" VLDB submission.
To clone the repository together with its submodules, run the following command: git clone --recurse-submodules https://github.com/DominikHorn/hashing-benchmark.git
- To download the SOSD datasets
  - Run bash download.sh in the data folder
- To run the hash table experiments
  - Change the path of the SOSD datasets in the file src/support/datasets.hpp
  - To build and run the hash table experiments, run the following command: bash benchmark.sh
  - The results of the hash table experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".
- To run the range query experiments
  - Change the path of the SOSD datasets in the file src/support/datasets.hpp
  - To build and run the range query experiments, run the following command: bash benchmark_range.sh
  - The results of the range query experiments are stored in JSON format in "results.json", and other stats are logged in "log_stats.out".
- To run the join experiments
  - Change the path of the SOSD datasets in the file include/join/utils/datasets.hpp
  - Change the path of OUTPUT_FOLDER in the file scripts/evaluation/join_tuner.sh by changing the variable output_folder_path
  - To run the join experiments, run the following command: sh scripts/evaluation/join_tuner.sh
  - The results of the join experiments are stored in CSV format in the OUTPUT_FOLDER.
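
Once the join experiments have written their CSV files, the results can be gathered for inspection. The following is a minimal sketch, not part of the repository. It assumes the CSV files sit directly in the output folder, end in .csv, and start with a header row; the function name load_join_results and the added source_file column are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def load_join_results(output_folder: str) -> List[Dict[str, str]]:
    """Read every *.csv file in output_folder into a list of row dicts.

    Assumption: each file has a header row. Column names are taken from
    that header, so no particular schema is required.
    """
    rows: List[Dict[str, str]] = []
    for csv_path in sorted(Path(output_folder).glob("*.csv")):
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                # Remember which file the row came from (hypothetical extra column).
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows


# Self-contained demo on a temporary folder with made-up columns.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "run1.csv").write_text("threads,time_ms\n1,120\n2,70\n")
    demo_rows = load_join_results(folder)
    print(len(demo_rows), demo_rows[0]["source_file"])
```

In practice the folder passed in would be whatever output_folder_path is set to in scripts/evaluation/join_tuner.sh.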
- Hash table implementation using different combinations of hashing schemes and functions:
  - include/chained.hpp: chained hash table using traditional hash functions
  - include/chained_model.hpp: chained hash table using learned hash functions
  - include/chained_exotic.hpp: chained hash table using perfect hash functions
  - include/probe.hpp: linear probing hash table using traditional hash functions
  - include/probe_model.hpp: linear probing hash table using learned hash functions
  - include/probe_exotic.hpp: linear probing hash table using perfect hash functions
  - include/cuckoo.hpp: cuckoo hash table using traditional hash functions
  - include/cuckoo_model.hpp: cuckoo hash table using learned hash functions
  - include/cuckoo_exotic.hpp: cuckoo hash table using perfect hash functions
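
To illustrate how one hashing scheme can be combined with different hash functions, here is a minimal Python sketch of a chained table whose hash function is passed in by the caller. It is not the repository's C++ implementation: ChainedTable, multiplicative_hash and make_linear_model_hash are hypothetical names, and the "learned" function is only a linear interpolation of the key range standing in for a trained model. Perfect hash functions are left out.

```python
from typing import Callable, List, Optional, Tuple


class ChainedTable:
    """Toy chained hash table; the hash function is supplied by the caller."""

    def __init__(self, num_buckets: int, hash_fn: Callable[[int], int]) -> None:
        self.num_buckets = num_buckets
        self.hash_fn = hash_fn
        self.buckets: List[List[Tuple[int, str]]] = [[] for _ in range(num_buckets)]

    def _slot(self, key: int) -> int:
        # Reduce the hash value to a valid bucket index.
        return self.hash_fn(key) % self.num_buckets

    def insert(self, key: int, value: str) -> None:
        chain = self.buckets[self._slot(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def lookup(self, key: int) -> Optional[str]:
        for existing_key, value in self.buckets[self._slot(key)]:
            if existing_key == key:
                return value
        return None


def multiplicative_hash(key: int) -> int:
    """Traditional hash: 64-bit multiplicative hashing, keeping the upper 32 bits."""
    return ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) >> 32


def make_linear_model_hash(sorted_keys: List[int], num_buckets: int) -> Callable[[int], int]:
    """Stand-in for a learned hash: a linear approximation of the key CDF."""
    lo, hi = sorted_keys[0], sorted_keys[-1]
    span = max(hi - lo, 1)

    def hash_fn(key: int) -> int:
        clamped = min(max(key, lo), hi)  # keys outside the range map to the ends
        return (clamped - lo) * (num_buckets - 1) // span

    return hash_fn


keys = [10, 20, 30, 40, 50]
learned = ChainedTable(5, make_linear_model_hash(keys, 5))
traditional = ChainedTable(5, multiplicative_hash)
for k in keys:
    learned.insert(k, f"v{k}")
    traditional.insert(k, f"v{k}")

print(learned.lookup(30))                      # v30
print([len(b) for b in learned.buckets])       # [1, 1, 1, 1, 1]
```

On these evenly spaced keys the linear model places exactly one key per bucket, which is the kind of behaviour that makes learned functions attractive on suitable key distributions.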
- Non-partitioned hash join implementation using different combinations of hashing schemes and functions:
  - include/join: it has npj_join_runner.cpp, which provides the main implementation, and other helper/configuration files
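
As a reminder of what a non-partitioned hash join does, the sketch below builds one shared hash table over the build relation and then probes it with every tuple of the other relation. It is an illustration under stated assumptions, not the code in npj_join_runner.cpp: Python's built-in dict stands in for the hash tables listed above, and the name hash_join and the (key, payload) row layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (join key, payload); layout chosen for illustration


def hash_join(build_side: Iterable[Row], probe_side: Iterable[Row]) -> List[Tuple[int, str, str]]:
    """Equi-join two relations on their key without partitioning either one."""
    # Build phase: a single hash table over the whole build relation.
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in build_side:
        table[key].append(payload)

    # Probe phase: look up each probe tuple and emit one row per match.
    joined: List[Tuple[int, str, str]] = []
    for key, payload in probe_side:
        for match in table.get(key, ()):
            joined.append((key, match, payload))
    return joined


r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (3, "y"), (1, "z")]
print(hash_join(r, s))  # [(2, 'b', 'x'), (2, 'c', 'x'), (1, 'a', 'z')]
```

The choice of hashing scheme and hash function only affects the build and lookup steps, which is why the join can be evaluated with each of the table variants.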
- Optimization stuff
  - include/convenience/: commonly used cpp macros (e.g., forceinline) and related functionality
  - include/support.hpp: simple tape storage implementation to eliminate small allocs in hashtables
- Testing and benchmarking driver code
  - src/benchmarks/:
    - passive_stats.hpp: benchmark code for collecting passive stats of hash tables
    - template_tables.hpp: benchmark code for collecting insert and probe stats of hash tables
    - tables.hpp: some hashtable benchmark experiments
    - template_tables_range.hpp: benchmark code for collecting range query stats of hash tables
  - src/support/: code shared by different benchmarks and tests for loading datasets and generating probe distributions
  - src/benchmarks.cpp: original entry point for benchmarks target
  - src/tests.cpp: original entry point for tests target
  - cleanup.py: deduplicate and sort measurements json file
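
Deduplicating and sorting a measurements file can be sketched as follows. This is not the contents of cleanup.py. It assumes the file has a top-level "benchmarks" list whose entries carry a unique "name" string, as Google Benchmark's JSON output does; this README does not document the schema, so treat that layout and the function names as assumptions.

```python
import json
from typing import Any, Dict


def dedup_and_sort(results: Dict[str, Any], key: str = "name") -> Dict[str, Any]:
    """Keep the last entry per name and sort the entries by name.

    Assumption: results["benchmarks"] is a list of dicts that each contain `key`.
    """
    latest: Dict[str, Dict[str, Any]] = {}
    for entry in results.get("benchmarks", []):
        latest[entry[key]] = entry  # a later duplicate replaces an earlier one
    cleaned = dict(results)  # keep any other top-level fields untouched
    cleaned["benchmarks"] = sorted(latest.values(), key=lambda e: e[key])
    return cleaned


def clean_file(path: str) -> None:
    """Rewrite a measurements file in place with duplicates removed."""
    with open(path) as f:
        results = json.load(f)
    with open(path, "w") as f:
        json.dump(dedup_and_sort(results), f, indent=2)


sample = {
    "benchmarks": [
        {"name": "probe/b", "cpu_time": 3.0},
        {"name": "probe/a", "cpu_time": 1.0},
        {"name": "probe/b", "cpu_time": 2.5},
    ]
}
print([e["name"] for e in dedup_and_sort(sample)["benchmarks"]])  # ['probe/a', 'probe/b']
```

Keeping the last occurrence is a design choice made for this sketch; keeping the first, or averaging repeated runs, would be equally plausible policies.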
- Building and running scripts
  - setup.sh: original script to set up the repo (submodule checkout, cmake configure etc)
  - requirements.txt: python requirements
  - CMakeLists.txt: cmake target definitions
  - thirdparty/: cmake dependency declarations
  - build-debug.sh: make debug build
  - build.sh: make production build
  - run.sh: original script to build and execute benchmark target
  - perf.sh: like run.sh but with perf instrumentation
  - only_new.py: helper script for run.sh, which extracts all datapoints we already measured from results.json and ensures that we only run new datapoints
  - test.sh: original script to build and execute tests
  - benchmark.sh: script to run probe and insert relevant code for benchmarking
  - scripts/evaluation/join_tuner.sh: script to run the join experiments
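
The idea behind running only new datapoints can be sketched as a set difference between the datapoints that are planned and those already present in results.json. The code below is an illustration, not the contents of only_new.py. It assumes a top-level "benchmarks" list with a "name" field per entry (the layout Google Benchmark writes), and the function names and datapoint names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set


def measured_names(results: Dict[str, Any]) -> Set[str]:
    """Names of all datapoints already present in the results (assumed schema)."""
    return {entry["name"] for entry in results.get("benchmarks", [])}


def only_new(candidates: Iterable[str], results: Dict[str, Any]) -> List[str]:
    """Return the candidate datapoints that have not been measured yet, in order."""
    done = measured_names(results)
    return [name for name in candidates if name not in done]


existing = {"benchmarks": [{"name": "insert/chained"}, {"name": "probe/chained"}]}
planned = ["insert/chained", "insert/cuckoo", "probe/chained", "probe/cuckoo"]
print(only_new(planned, existing))  # ['insert/cuckoo', 'probe/cuckoo']
```

How the remaining names are handed to the benchmark binary is up to run.sh and is not shown here.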
- *results*.json: benchmark results from internal measurements
- README.md: this file