This repository is available for reproducibility of our paper MxTasks: How to Make Efficient Synchronization and Prefetching Easy.
This is a set of micro benchmarks for a BLinkTree using different programming models like Threads and Intel Threading Building Blocks running on top of linux.
Required:
cmake
in version3.10
or higher.clang
in version10
or higher (for supportingc++
with standard17
)libnuma
(orlibnuma-dev
)tcmalloc
(for running the Masstree)
Optional:
perf
for recording performance counters (Please note: Installing perf takes some other packages)
- Use
cmake .
to generate aMakefile
- Use
make
to build the project.
make
will create the following executables:
bin/blinktree_thread_spinlock
: Benchmark for ap_thread
-based parallel B link tree using spinlocks to protect nodes.bin/blinktree_thread_rw_lock
: Benchmark for ap_thread
-based parallel B link tree using reader-writer-locks to protect nodes, allowing parallel reads but synchronizing writes.bin/blinktree_thread_olfit
: Benchmark for ap_thread
-based parallel B link tree using versioning to allow parallel reads without locks.bin/blinktree_tbb_lock
: Benchmark for a task-based parallel B link tree using Intel TBB. Nodes are protected byqueuing_mutex
.bin/blinktree_tbb_rw_lock
: Benchmark for a task-based parallel B link tree using Intel TBB. Nodes are protected byqueuing_rw_mutex
, allowing parallel reading tasks on a node.bin/blinktree_tbb_tsx_lock
: Benchmark for a task-based parallel B link tree using Intel TBB. Nodes are protected byspeculative_spin_rw_mutex
which uses transactional memory to acquire the lock. Nodes can be read parallel.bin/blinktree_tbb_no_lock
: Benchmark for a task-based parallel B link tree using Intel TBB. Nodes are not protected in the second benchmark phase.bin/blinktree_tbb_olfit
: Benchmark for a task-based parallel B link tree using Intel TBB and versioning to allow parallel reads without locks.bin/olc_btree_thread
: Benchmark for thep_threads
-based B tree using optimistic lock coupling (see on Github, read the paper).bin/bwtree_thread
: Benchmark for the open BwTree (see on Github, read the paper).bin/mass_tree_thread
: Benchmark for the Masstree (see on Github, read the paper).
We support the Yahoo! Cloud System Benchmark (YCSB).
The target make ycsb-a
will generate workloada
with random keys, equivalent for make ycsb-c
.
The workloads can be specified in workload_specification/
Use one of the binaries with -h
for help output.
Example for ./bin/blinktree_thread_spinlock -h
:
Usage: ./bin/blinktree_thread_spinlock [options] cores
Positional arguments:
cores Range of the number of cores (1 for using 1 core, 1: for using 1 up to available cores, 1:4 for using cores from 1 to 4).
Optional arguments:
-h --help show this help message and exit
-b --batch-size Number of operations launched at a time (per core).
-s --steps Steps, how number of cores is increased (1,2,4,6,.. for -s 2).
-i --iterations Number of iterations for each workload
-sco --system-core-order Use systems core order. If not, cores are ordered by node id (should be preferred).
-p --perf Use performance counter.
--print-stats Print tree statistics after every iteration.
-f --workload-files Files containing the workloads (workloads/fill workloads/mixed for example).
-o --out Name of the file to log the results.
Examples:
./bin/blinktree_thread_spinlock 1:32 -s 2 for using increasing core count 1,2,4,6,..,32
./bin/blinktree_thread_spinlock 1 for using just a single core
Running one of the executables will generate output to the console:
$ ./bin/blinktree_thread_spinlock 1:4
core configuration:
1: 0
2: 0 1
4: 0 1 2 3
workload: fill: 10000000 / mixed: insert(0) update(0) read(10000000)
1 1 0 4,826 ms 2.07211e+06 op/s
1 1 1 4,491 ms 2.22668e+06 op/s
2 1 0 2,849 ms 3.51e+06 op/s
2 1 1 2,534 ms 3.94633e+06 op/s
4 1 0 1,792 ms 5.58036e+06 op/s
4 1 1 1,520 ms 6.57895e+06 op/s
The first lines will give some statistics about the used cores, loaded workload, and configuration.
The result lines (beginning at line 9
) are specified like follows:
- The first column is the core count (starting with one core, up to four cores).
- The second column is an index of the run within the current core configuration (here: using one run for each phase and each core configuration).
- The third column is an identifier of the phase, where
0
means the fill phase and1
the workload phase. In this example, the fill phase will contain fifty million inserts and the workload phase will contain the same number of reads. - The fourth column is the time for this run in milliseconds.
- The fifth column is the throughput, calculated by the time and given workload.
- BTreeOLC: Optimistic BTree implementation for threads, available on Github
- open BwTree: General purpose, concurrent and lock-free B+-Tree index, available on Github.
- Masstree: A fast, multi-core key-value store, available on Github.
- RWSpinLock:
util/RWSpinLock
is based on the implementation of Facebook'sfolly
library, available on GitHub. - argparse: An argument parsing library, available on view on github.