Hash Join

Related PRs/Issues:

PR#52
PR#78

TODOs:

Join hash table
Single-thread join operator for INNER join type
Support of ANTI, SEMI, and OUTER join types
Multiple-threaded hash join operator
Perfect join hash table

Build Hash Table

Storage Layout

The storage of a simple hash table is divided into two parts: tuple blocks (htBlocks) and directory (htDirectory). Tuple blocks hold all materialized tuples, and the directory contains slots with pointers. Inside tuple blocks, each tuple reserves a pointer space pointing to a previous tuple within the same hash slot, in this way, we implement a chain-based collision handling. For each tuple, the storage layout is: |key|payloads|prev|. Inside the tuple payloads, we might have variable-length items, whose content will be stored separately in the overflowBlocks.

The materialization of build side tuples

Hash (based on the join key) can happen on both a flat vector and a list. For the join key, we only consider equality of nodeID for now (no join on property values, and no multiple join conditions).

Query example for join on a flat vector: (a)->(b)->(c)->(d)->(e) WHERE a.id=1 AND e.id=1. this is a bidirectional query, hash join happens on c.id. and the build side is the subquery (a)->(b)->(c). which starts from SCAN(a), then EXTEND(b) and EXTEND(c). thus, c might be a list chunk.

Query example for join on a list: (a)->(b)->(c), (b)->(d)->(e). the hash join happens on b.id, and the build side is the subquery (a)->(b)->(c), which starts from SCAN(a), then EXTEND(b). thus, b might be a flat chunk.

During materialization, there are two cases for a list chunk:

If it's the key, we would duplicate tuples in build chunks;
Else, each list vector is treated as a variable-sized value in the tuple.

Probe Hash Table

Coding Style
- Style Reference
Casting Rules
- Casting Rules
Frontend
- Parser
- Binder
- Optimizer
Processor
- Hash Join
- DataChunksIterator
Storage
Test
- Testing Framework

Provide feedback

Saved searches

Use saved searches to filter your results more quickly