DistinctKeyVisitor and LogicalPlanDistinctKeys logical operators
jaceklaskowski committed Mar 23, 2024
1 parent c071c93 commit 13c984b
Showing 5 changed files with 107 additions and 1 deletion.
49 changes: 49 additions & 0 deletions docs/DistinctKeyVisitor.md
@@ -0,0 +1,49 @@
# DistinctKeyVisitor

`DistinctKeyVisitor` is a [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md) that produces distinct keys as `Set[ExpressionSet]` (sets of [Expression](expressions/Expression.md)s).

`DistinctKeyVisitor` is used when a [logical operator](logical-operators/LogicalPlanDistinctKeys.md) is requested for the [distinct keys](logical-operators/LogicalPlanDistinctKeys.md#distinctKeys) with the [spark.sql.optimizer.propagateDistinctKeys.enabled](configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled) configuration property enabled.

??? note "Singleton Object"
    `DistinctKeyVisitor` is a Scala **object** which is a class that has exactly one instance. It is created lazily when it is referenced, like a lazy val.

    Learn more in [Tour of Scala](https://docs.scala-lang.org/tour/singleton-objects.html).
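
The lazy, single-instance behaviour described in the note can be observed with a minimal sketch in plain Scala. `InitLog` and `ToyVisitor` are hypothetical names used only for illustration; they are not part of Spark.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper that records when initializers run
object InitLog {
  val events: ListBuffer[String] = ListBuffer.empty[String]
}

// Hypothetical stand-in for a singleton visitor such as DistinctKeyVisitor
object ToyVisitor {
  // The body of an object runs once, the first time the object is referenced
  InitLog.events += "ToyVisitor initialized"

  def visit(x: Int): Int = x + 1
}

object SingletonDemo {
  def main(args: Array[String]): Unit = {
    // ToyVisitor has not been referenced yet, so nothing has been logged
    println(InitLog.events.isEmpty)

    // Both references resolve to the very same instance
    val a = ToyVisitor
    val b = ToyVisitor
    println(a eq b)

    // The initializer ran exactly once, regardless of the number of references
    println(InitLog.events.size)
  }
}
```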

## visitAggregate { #visitAggregate }

??? note "LogicalPlanVisitor"

    ```scala
    visitAggregate(
      p: Aggregate): Set[ExpressionSet]
    ```

    `visitAggregate` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitAggregate) abstraction.

`visitAggregate`...FIXME

## visitJoin { #visitJoin }

??? note "LogicalPlanVisitor"

    ```scala
    visitJoin(
      p: Join): Set[ExpressionSet]
    ```

    `visitJoin` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitJoin) abstraction.

`visitJoin`...FIXME

## visitOffset { #visitOffset }

??? note "LogicalPlanVisitor"

    ```scala
    visitOffset(
      p: Offset): Set[ExpressionSet]
    ```

    `visitOffset` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitOffset) abstraction.

`visitOffset`...FIXME
4 changes: 4 additions & 0 deletions docs/SQLConf.md
@@ -926,6 +926,10 @@ Used when:

[spark.sql.legacy.ctePrecedencePolicy](configuration-properties.md#spark.sql.legacy.ctePrecedencePolicy)

## PROPAGATE_DISTINCT_KEYS_ENABLED { #PROPAGATE_DISTINCT_KEYS_ENABLED }

[spark.sql.optimizer.propagateDistinctKeys.enabled](configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled)

## <span id="replaceDatabricksSparkAvroEnabled"><span id="LEGACY_REPLACE_DATABRICKS_SPARK_AVRO_ENABLED"> replaceDatabricksSparkAvroEnabled

[spark.sql.legacy.replaceDatabricksSparkAvro.enabled](configuration-properties.md#spark.sql.legacy.replaceDatabricksSparkAvro.enabled)
20 changes: 19 additions & 1 deletion docs/configuration-properties.md
@@ -833,11 +833,29 @@ Default: `TRACE`

Default: `(undefined)`

### propagateDistinctKeys.enabled { #spark.sql.optimizer.propagateDistinctKeys.enabled }

**spark.sql.optimizer.propagateDistinctKeys.enabled**

**(internal)** Controls whether the [Logical Query Optimizer](catalyst/Optimizer.md) propagates the distinct attributes of logical operators for query optimization.

Default: `true`

Used when:

* `LogicalPlanDistinctKeys` logical operator is requested for the [distinct keys](logical-operators/LogicalPlanDistinctKeys.md#distinctKeys)
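
A minimal sketch of toggling the property at runtime and inspecting the distinct keys of an optimized plan. It assumes Spark SQL 3.5 on the classpath and a local session; the object name, application name, and sample query are illustrative assumptions, and the printed values are not shown since they depend on the Spark version.

```scala
import org.apache.spark.sql.SparkSession

object PropagateDistinctKeysDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is acceptable for the experiment
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("propagate-distinct-keys-demo")
      .getOrCreate()

    val key = "spark.sql.optimizer.propagateDistinctKeys.enabled"

    // Builds a fresh aggregate query on every call, because distinctKeys
    // is computed once per logical operator instance
    def distinctKeysOfAggregate() =
      spark.range(5).groupBy("id").count().queryExecution.optimizedPlan.distinctKeys

    spark.conf.set(key, true)
    println(distinctKeysOfAggregate())

    spark.conf.set(key, false)
    println(distinctKeysOfAggregate())

    spark.stop()
  }
}
```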

### replaceExceptWithFilter { #spark.sql.optimizer.replaceExceptWithFilter }

**spark.sql.optimizer.replaceExceptWithFilter**

**(internal)** When `true`, the `apply` function of the rule verifies whether the right node of the `except` operation is of type `Filter` or `Project` followed by `Filter`. If so, the rule further verifies the following conditions:

1. Both nodes evaluate to the same result once the top-level filter operations are excluded from the right node (and from the left node, if any)
1. The left and right nodes don't contain any [SubqueryExpression](expressions/SubqueryExpression.md)s
1. The output column names of the left node are distinct

If all the conditions are met, the rule will replace the `except` operation with a `Filter` by flipping the filter condition(s) of the right node.

Default: `true`
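
The idea behind the rewrite can be illustrated with plain Scala collections. This is a toy sketch under simplifying assumptions, not the rule's implementation: `Row`, `tab1`, and the sample predicates are hypothetical, both sides read the same data and differ only in their top-level filters, and the values are non-null (the real rule also has to account for `NULL`s).

```scala
object ExceptAsFilterDemo {
  // Hypothetical two-column row type
  final case class Row(a1: Int, a2: Int)

  // Hypothetical shared input of both sides of the except operation
  val tab1: Seq[Row] =
    Seq(Row(1, 12), Row(5, 12), Row(5, 7), Row(3, 12), Row(3, 12))

  // Left and right nodes differ only in their top-level filters
  val left: Seq[Row] = tab1.filter(_.a2 == 12)
  val right: Seq[Row] = tab1.filter(_.a1 == 5)

  // except with set semantics: distinct rows of left that are not in right
  val viaExcept: Set[Row] = left.toSet -- right.toSet

  // The same result using a filter with the right-hand condition flipped
  val viaFilter: Set[Row] = left.filter(r => !(r.a1 == 5)).toSet

  def main(args: Array[String]): Unit = {
    println(viaExcept == viaFilter)
  }
}
```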

34 changes: 34 additions & 0 deletions docs/logical-operators/LogicalPlanDistinctKeys.md
@@ -0,0 +1,34 @@
---
title: LogicalPlanDistinctKeys
---

# LogicalPlanDistinctKeys Logical Operators

`LogicalPlanDistinctKeys` is an extension of the [LogicalPlan](LogicalPlan.md) abstraction for logical operators that know their [distinct keys](#distinctKeys).

All [logical operators](LogicalPlan.md) are `LogicalPlanDistinctKeys`.

## Distinct Keys { #distinctKeys }

```scala
distinctKeys: Set[ExpressionSet]
```

??? note "Lazy Value"
    `distinctKeys` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).

`distinctKeys` uses [DistinctKeyVisitor](../DistinctKeyVisitor.md) to [determine the distinct keys](../cost-based-optimization/LogicalPlanVisitor.md#visit) of this [logical operator](LogicalPlan.md) when the [spark.sql.optimizer.propagateDistinctKeys.enabled](../configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled) configuration property is enabled.

Otherwise, `distinctKeys` is always empty.
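
The gating described above can be modelled in a few lines of plain Scala. Everything in this sketch is a hypothetical stand-in (`ToyConf`, `ToyPlan`, `ToyDistinctKeyVisitor`, column names as strings), and the visit rules are deliberately simplified for illustration; they are not Spark's actual logic.

```scala
// Hypothetical configuration holder standing in for the SQL configuration
final case class ToyConf(propagateDistinctKeysEnabled: Boolean)

// Hypothetical logical operator that knows its distinct keys
sealed trait ToyPlan {
  def conf: ToyConf

  // Computed at most once, on first access; empty when the property is disabled
  lazy val distinctKeys: Set[Set[String]] =
    if (conf.propagateDistinctKeysEnabled) ToyDistinctKeyVisitor.visit(this)
    else Set.empty
}

final case class ToyRelation(columns: Seq[String], conf: ToyConf) extends ToyPlan
final case class ToyAggregate(grouping: Seq[String], child: ToyPlan, conf: ToyConf) extends ToyPlan
final case class ToyOffset(n: Int, child: ToyPlan, conf: ToyConf) extends ToyPlan

// Hypothetical visitor that dispatches on the operator type
object ToyDistinctKeyVisitor {
  def visit(p: ToyPlan): Set[Set[String]] = p match {
    case a: ToyAggregate => visitAggregate(a)
    case o: ToyOffset    => visitOffset(o)
    case _               => Set.empty
  }

  // Toy rule: the grouping columns identify the rows of an aggregate
  def visitAggregate(a: ToyAggregate): Set[Set[String]] =
    a.child.distinctKeys + a.grouping.toSet

  // Toy rule: skipping rows keeps the distinct keys of the child
  def visitOffset(o: ToyOffset): Set[Set[String]] =
    o.child.distinctKeys
}

object ToyDistinctKeysDemo {
  def main(args: Array[String]): Unit = {
    val on = ToyConf(propagateDistinctKeysEnabled = true)
    val off = ToyConf(propagateDistinctKeysEnabled = false)

    println(ToyAggregate(Seq("id"), ToyRelation(Seq("id", "v"), on), on).distinctKeys)
    println(ToyAggregate(Seq("id"), ToyRelation(Seq("id", "v"), off), off).distinctKeys)
  }
}
```

Keeping `distinctKeys` a lazy value on the operator means the visitor runs only for operators whose keys are actually requested.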

---

`distinctKeys` is used when:

* `EliminateOuterJoin` logical optimization is executed
* `EliminateDistinct` logical optimization is executed
* `RemoveRedundantAggregates` logical optimization is executed
* `JoinEstimation` is requested to [estimateInnerOuterJoin](../cost-based-optimization/JoinEstimation.md#estimateInnerOuterJoin)
* `SizeInBytesOnlyStatsPlanVisitor` is requested to [visitJoin](../cost-based-optimization/SizeInBytesOnlyStatsPlanVisitor.md#visitJoin)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -714,6 +714,7 @@ nav:
- ... | demo/**.md
- Misc:
- AggregatingAccumulator: AggregatingAccumulator.md
- DistinctKeyVisitor: DistinctKeyVisitor.md
- JoinSelectionHelper.md
- PushDownUtils: PushDownUtils.md
- UnsafeExternalRowSorter: UnsafeExternalRowSorter.md