The following modifications have been made to the base `warpcore` library in this fork:

- Support for aggregations in `SingleValueHashTable` (SQL GROUP BY). Values in the table must be initialized by a call to `init_values()`, where the initial value must be the identity for the aggregate operation (e.g., `0` for `atomicAdd()`, `INT32_MAX` for `atomicMin()`, etc.). The `atomic_aggregator` functor argument has signature `atomic_aggregator(value_type* value_address, value_type value_to_aggregate)`.
- `BloomFilter::retrieve_write()` writes out the filtered input table.
- In general, the `writer` functor argument has signature `writer(int write_index, int read_index, [HashTableValueType hash_table_value, FilterValueType filter_value])`, depending on the hash table type and retrieval type.
- `BloomFilter::insert_if()` inserts into the bloom filter if a corresponding value passes a predicate (SQL WHERE). `BloomFilter::retrieve_write_if()` and `BloomFilter::retrieve_if()` are defined similarly.
- `SingleValueHashTable::insert_if()`, `SingleValueHashTable::retrieve_write()`, and `SingleValueHashTable::retrieve_write_if()` are also implemented.
- `HashSet::insert_if()`, `HashSet::retrieve_write()`, and `HashSet::retrieve_write_if()` are implemented.
The goal is to eliminate the need to write hash table kernel operations customized to specific SQL queries. The exception is hash join pipelining; however, I have found that under high selectivity the thread divergence caused by pipelining is not amortized by the savings from avoiding materialization. The underlying implementation details are unaltered.
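To illustrate the aggregation support, below is a minimal, hypothetical sketch of a SUM aggregation (SQL GROUP BY) over a `SingleValueHashTable`. Only the `init_values()` requirement, the identity-element rule, and the `atomic_aggregator` signature come from the description above; the constructor arguments, the header path, and the exact `insert()` overload that accepts the aggregator are assumptions modeled on the upstream `warpcore` API.

```cuda
#include <cstdint>
#include <warpcore/single_value_hash_table.cuh> // header path as in upstream warpcore

using key_t   = std::uint32_t;
using value_t = std::uint32_t;
using table_t = warpcore::SingleValueHashTable<key_t, value_t>;

// Aggregator functor matching the documented signature
// atomic_aggregator(value_type* value_address, value_type value_to_aggregate).
struct sum_aggregator
{
    __device__ void operator()(value_t* value_address, value_t value_to_aggregate) const
    {
        atomicAdd(value_address, value_to_aggregate); // identity element is 0
    }
};

// keys_d and values_d are device pointers holding n input rows.
void group_by_sum(const key_t* keys_d, const value_t* values_d, std::uint64_t n)
{
    table_t table(2 * n);          // requested capacity with headroom (constructor form assumed)
    table.init_values(value_t{0}); // initialize all slots to the identity of atomicAdd() (argument form assumed)
    // Assumed overload: aggregates values_d[i] into the slot associated with keys_d[i].
    table.insert(keys_d, values_d, n, /*stream=*/0, sum_aggregator{});
}
```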
Below you will find the original README.
NOTE: There is a bug in the test build (which is also present in the original repo).
Hashing at the speed of light on modern CUDA-accelerators
`warpcore` is a framework for creating high-throughput, purpose-built hashing data structures on CUDA-accelerators.
This library provides the following data structures:
- `SingleValueHashTable`: stores a set of key-value pairs
- `HashSet`: stores a set of keys
- `CountingHashTable`: keeps track of the number of occurrences of each inserted key
- `BloomFilter`: pattern-blocked bloom filter for approximate membership queries
- `MultiValueHashTable`: stores a multi-set of key-value pairs
- `BucketListHashTable`: alternative variant of `MultiValueHashTable`
- `MultiBucketHashTable`: alternative variant of `MultiValueHashTable`
Implementations support the key types `std::uint32_t` and `std::uint64_t` together with any trivially copyable value type. In order to be adaptable to a wide range of possible use cases, we provide a multitude of combinable modules such as hash functions, probing schemes, and data layouts (visit the documentation for further information).
`warpcore` won the best paper award at the IEEE HiPC 2020 conference (link to manuscript) (link to preprint) and is based on our previous work on massively parallel GPU hash tables, `warpdrive`, which was published at the prestigious IEEE IPDPS conference (link to manuscript).
This library is still under heavy development. Users should expect breaking changes and refactoring to be common. Development mainly takes place on our in-house GitLab instance. However, we plan to migrate to GitHub in the near future.
- CUDA-capable device with device architecture 6.0 or higher (Pascal+; see link)
- NVIDIA CUDA toolkit/compiler version ≥ v11.2
- C++14 or higher
- hpc_helpers - utils, timers, etc.
- kiss_rng - a fast and lightweight GPU PRNG
- CUB - high-throughput primitives for GPUs (already included in newer versions of the CUDA toolkit, i.e., ≥ v10.2)

Note: Dependencies are automatically managed via CMake.
`warpcore` is header-only and can be incorporated manually into your project by downloading the headers and placing them into your source tree. `warpcore` is designed to make it easy to include within another CMake project.
The `CMakeLists.txt` exports a `warpcore` target that can be linked¹ into a target to set up include directories, dependencies, and compile flags necessary to use `warpcore` in your project.
We recommend using CMake Package Manager (CPM) to fetch `warpcore` into your project. With CPM, getting `warpcore` is easy:
```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

include(path/to/CPM.cmake)

CPMAddPackage(
  NAME warpcore
  GITHUB_REPOSITORY sleeepyjack/warpcore
  GIT_TAG/VERSION XXXXX
)

target_link_libraries(my_target warpcore)
```
This will take care of downloading `warpcore` from GitHub and making the headers available in a location that can be found by CMake. Linking against the `warpcore` target will provide everything needed for `warpcore` to be used by `my_target`.
1: `warpcore` is header-only and therefore there is no binary component to "link" against. The linking terminology comes from CMake's `target_link_libraries`, which is still used even for header-only library targets.
Since `warpcore` is header-only, there is nothing to build to use it.
To build the tests, benchmarks, and examples:
```bash
cd $WARPCORE_ROOT
mkdir -p build
cd build
cmake .. -DWARPCORE_BUILD_TESTS=ON -DWARPCORE_BUILD_BENCHMARKS=ON -DWARPCORE_BUILD_EXAMPLES=ON
make
```
Binaries will be built into:
- `build/tests/`
- `build/benchmarks/`
- `build/examples/`
Take a look at the examples, test your own system performance using the benchmark suite and be sure everything works as expected by running the test suite.
BibTeX:

```bibtex
@inproceedings{DBLP:conf/hipc/JungerKM0XLS20,
author = {Daniel J{\"{u}}nger and
Robin Kobus and
Andr{\'{e}} M{\"{u}}ller and
Christian Hundt and
Kai Xu and
Weiguo Liu and
Bertil Schmidt},
title = {WarpCore: {A} Library for fast Hash Tables on GPUs},
booktitle = {27th {IEEE} International Conference on High Performance Computing,
Data, and Analytics, HiPC 2020, Pune, India, December 16-19, 2020},
pages = {11--20},
publisher = {{IEEE}},
year = {2020},
url = {https://doi.org/10.1109/HiPC50609.2020.00015},
doi = {10.1109/HiPC50609.2020.00015},
timestamp = {Wed, 05 May 2021 09:45:30 +0200},
biburl = {https://dblp.org/rec/conf/hipc/JungerKM0XLS20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:conf/ipps/Junger0S18,
author = {Daniel J{\"{u}}nger and
Christian Hundt and
Bertil Schmidt},
title = {WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes},
booktitle = {2018 {IEEE} International Parallel and Distributed Processing Symposium,
{IPDPS} 2018, Vancouver, BC, Canada, May 21-25, 2018},
pages = {441--450},
publisher = {{IEEE} Computer Society},
year = {2018},
url = {https://doi.org/10.1109/IPDPS.2018.00054},
doi = {10.1109/IPDPS.2018.00054},
timestamp = {Sat, 19 Oct 2019 20:31:38 +0200},
biburl = {https://dblp.org/rec/conf/ipps/Junger0S18.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
warpcore Copyright (C) 2018-2021 Daniel Jünger
This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the file LICENSE for details.