-
Notifications
You must be signed in to change notification settings - Fork 102
Hash Join
The storage of a simple hash table is divided into two parts: tuple blocks (htBlocks
) and directory (htDirectory
).
Tuple blocks hold all materialized tuples, and the directory contains slots with pointers. Inside tuple blocks, each tuple reserves a pointer space pointing to a previous tuple within the same hash slot, in this way, we implement a chain-based collision handling.
For each tuple, the storage layout is: |key|payloads|prev|
.
Inside the tuple payloads, we might have variable-length items, whose content will be stored separately in the overflowBlocks
.
Hash (based on the join key) can happen on both a flat vector and a list. For the join key, we only consider equality of nodeID for now (no join on property values, and no multiple join conditions).
Query example for join on a flat vector: (a)->(b)->(c)->(d)->(e) WHERE a.id=1 AND e.id=1
. this is a bidirectional query, hash join happens on c.id. and the build side is the subquery (a)->(b)->(c)
. which starts from SCAN(a)
, then EXTEND(b)
and EXTEND(c)
. thus, c might be a list chunk.
Query example for join on a list: (a)->(b)->(c), (b)->(d)->(e)
. the hash join happens on b.id
, and the build side is the subquery (a)->(b)->(c)
, which starts from SCAN(a)
, then EXTEND(b)
. thus, b might be a flat chunk.
During materialization, there are two cases for a list chunk:
- If it's the key, we would duplicate tuples in build chunks;
- Else, each list vector is treated as a variable-sized value in the tuple.
Construction of the output dataChunks:
- probe side non-key data chunk stay unchanged. (flat will be flat, unflat will be unflat)
- build side key data chunk and probe side key data chunk are merged. (flat/unflat depends on build side non-key data chunks)
- build side non-key data chunk: if flat, merge into key; else, keep as separate data chunks
- if there is no unflat non-key data chunk from build side, we always keep key data chunk as unflat, because all variables/data chunks from build side are merged into the key data chunk
- Coding Style
- Casting Rules
- Frontend
- Processor
- Storage
- Test