Skip to content

Commit

Permalink
Storage-Partitioned Joins and KeyGroupedPartitioning
Browse files Browse the repository at this point in the history
  • Loading branch information
jaceklaskowski committed May 26, 2024
1 parent 6409885 commit 0a990a4
Show file tree
Hide file tree
Showing 5 changed files with 36 additions and 4 deletions.
15 changes: 15 additions & 0 deletions docs/connector/KeyGroupedPartitioning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# KeyGroupedPartitioning

`KeyGroupedPartitioning` is a [Partitioning](Partitioning.md) where rows are split across partitions based on the [partition transform expressions](#keys).

`KeyGroupedPartitioning` is a key part of [Storage-Partitioned Joins](../storage-partitioned-joins/index.md).

!!! note
Not used in any of the [built-in Spark SQL connectors](../connectors/index.md) yet.

## Creating Instance

`KeyGroupedPartitioning` takes the following to be created:

* <span id="keys"> Partition transform [expression](../expressions/Expression.md)s
* <span id="numPartitions"> Number of partitions
8 changes: 4 additions & 4 deletions docs/connector/Partitioning.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,14 @@ title: Partitioning

# Partitioning

`Partitioning` is an [abstraction](#contract) of [output data partitioning requirements](#implementations) (_data distribution_) of a Spark SQL connector.
`Partitioning` is an [abstraction](#contract) of [output data partitioning requirements](#implementations) (_data distribution_) of a [Spark SQL connector](index.md).

!!! note
This `Partitioning` interface for Spark SQL developers mimics the internal Catalyst [Partitioning](../physical-operators/Partitioning.md) that is converted into with the help of [DataSourcePartitioning](../physical-operators/Partitioning.md#DataSourcePartitioning).

## Contract

### <span id="numPartitions"> Number of Partitions
### Number of Partitions { #numPartitions }

```java
int numPartitions()
Expand All @@ -21,7 +21,7 @@ Used when:

* [DataSourcePartitioning](../physical-operators/Partitioning.md#DataSourcePartitioning) is requested for the [number of partitions](../physical-operators/Partitioning.md#numPartitions)

### <span id="satisfy"> Satisfying Distribution
### Satisfying Distribution { #satisfy }

```java
boolean satisfy(
Expand All @@ -34,5 +34,5 @@ Used when:

## Implementations

* `KeyGroupedPartitioning`
* [KeyGroupedPartitioning](KeyGroupedPartitioning.md)
* `UnknownPartitioning`
4 changes: 4 additions & 0 deletions docs/storage-partitioned-joins/.pages
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
title: Storage-Partitioned Joins
nav:
- index.md
- ...
12 changes: 12 additions & 0 deletions docs/storage-partitioned-joins/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# Storage-Partitioned Joins

**Storage-Partitioned Joins** (_SPJ_) are a new type of [join](../joins.md) in Spark SQL that use the existing storage layout for a partitioned join to avoid expensive shuffles (similarly to [Bucketing](../bucketing/index.md)).

!!! note
Storage-Partitioned Joins feature was added in Apache Spark 3.3.0 ([\[SPARK-37375\] Umbrella: Storage Partitioned Join (SPJ)]({{ spark.jira }}/SPARK-37375)).

Storage-Partitioned Join is meant mainly, if not exclusively, for [Spark SQL connectors](../connector/index.md) (_v2 data sources_).

Storage-Partitioned Join was proposed in this [SPIP](https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE).

Storage-Partitioned Join uses [KeyGroupedPartitioning](../connector/KeyGroupedPartitioning.md) to determine partitions.
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,7 @@ nav:
- ... | bloom-filter-join/**.md
- ... | bucketing/**.md
- ... | cache-serialization/**.md
- ... | storage-partitioned-joins/**.md
- Catalog Plugin API:
- connector/catalog/index.md
- CatalogExtension: connector/catalog/CatalogExtension.md
Expand Down

0 comments on commit 0a990a4

Please sign in to comment.