diff --git a/docs/DistinctKeyVisitor.md b/docs/DistinctKeyVisitor.md
new file mode 100644
index 00000000000..0e3247bea03
--- /dev/null
+++ b/docs/DistinctKeyVisitor.md
@@ -0,0 +1,49 @@
+# DistinctKeyVisitor
+
+`DistinctKeyVisitor` is a [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md) (of [Expression](expressions/Expression.md)s).
+
+`DistinctKeyVisitor` is used when a [logical operator](logical-operators/LogicalPlanDistinctKeys.md) is requested for the [distinct keys](#distinctKeys) with [spark.sql.optimizer.propagateDistinctKeys.enabled](configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled) configuration property enabled.
+
+??? note "Singleton Object"
+    `DistinctKeyVisitor` is a Scala **object** which is a class that has exactly one instance. It is created lazily when it is referenced, like a lazy val.
+
+    Learn more in [Tour of Scala](https://docs.scala-lang.org/tour/singleton-objects.html).
+
+## visitAggregate { #visitAggregate }
+
+??? note "LogicalPlanVisitor"
+
+    ```scala
+    visitAggregate(
+      p: Aggregate): Set[ExpressionSet]
+    ```
+
+    `visitAggregate` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitAggregate) abstraction.
+
+`visitAggregate`...FIXME
+
+## visitJoin { #visitJoin }
+
+??? note "LogicalPlanVisitor"
+
+    ```scala
+    visitJoin(
+      p: Join): Set[ExpressionSet]
+    ```
+
+    `visitJoin` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitJoin) abstraction.
+
+`visitJoin`...FIXME
+
+## visitOffset { #visitOffset }
+
+??? note "LogicalPlanVisitor"
+
+    ```scala
+    visitOffset(
+      p: Offset): Set[ExpressionSet]
+    ```
+
+    `visitOffset` is part of the [LogicalPlanVisitor](cost-based-optimization/LogicalPlanVisitor.md#visitOffset) abstraction.
+
+`visitOffset`...FIXME
diff --git a/docs/SQLConf.md b/docs/SQLConf.md
index 7997bba95da..42a9297cbbf 100644
--- a/docs/SQLConf.md
+++ b/docs/SQLConf.md
@@ -926,6 +926,10 @@ Used when:
 
 [spark.sql.legacy.ctePrecedencePolicy](configuration-properties.md#spark.sql.legacy.ctePrecedencePolicy)
 
+## PROPAGATE_DISTINCT_KEYS_ENABLED { #PROPAGATE_DISTINCT_KEYS_ENABLED }
+
+[spark.sql.optimizer.propagateDistinctKeys.enabled](configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled)
+
 ## replaceDatabricksSparkAvroEnabled
 
 [spark.sql.legacy.replaceDatabricksSparkAvro.enabled](configuration-properties.md#spark.sql.legacy.replaceDatabricksSparkAvro.enabled)
diff --git a/docs/configuration-properties.md b/docs/configuration-properties.md
index b651e18cded..b46b52b6b2f 100644
--- a/docs/configuration-properties.md
+++ b/docs/configuration-properties.md
@@ -833,11 +833,29 @@ Default: `TRACE`
 
 Default: `(undefined)`
 
+### propagateDistinctKeys.enabled { #spark.sql.optimizer.propagateDistinctKeys.enabled }
+
+**spark.sql.optimizer.propagateDistinctKeys.enabled**
+
+**(internal)** Controls whether the [Logical Query Optimizer](catalyst/Optimizer.md) propagates the distinct attributes of logical operators for query optimization
+
+Default: `true`
+
+Used when:
+
+* `LogicalPlanDistinctKeys` logical operator is requested for the [distinct keys](logical-operators/LogicalPlanDistinctKeys.md#distinctKeys)
+
 ### replaceExceptWithFilter { #spark.sql.optimizer.replaceExceptWithFilter }
 
 **spark.sql.optimizer.replaceExceptWithFilter**
 
-**(internal)** When `true`, the apply function of the rule verifies whether the right node of the except operation is of type Filter or Project followed by Filter. If yes, the rule further verifies 1) Excluding the filter operations from the right (as well as the left node, if any) on the top, whether both the nodes evaluates to a same result. 2) The left and right nodes don't contain any SubqueryExpressions. 3) The output column names of the left node are distinct. If all the conditions are met, the rule will replace the except operation with a Filter by flipping the filter condition(s) of the right node.
+**(internal)** When `true`, the `apply` function of the rule verifies whether the right node of the `except` operation is of type `Filter` or `Project` followed by `Filter`. If so, the rule further verifies the following conditions:
+
+1. Excluding the filter operations on top of the right node (and of the left node, if any), both nodes evaluate to the same result
+1. The left and right nodes don't contain any [SubqueryExpression](expressions/SubqueryExpression.md)s
+1. The output column names of the left node are distinct
+
+If all the conditions are met, the rule will replace the `except` operation with a `Filter` by flipping the filter condition(s) of the right node.
 
 Default: `true`
 
diff --git a/docs/logical-operators/LogicalPlanDistinctKeys.md b/docs/logical-operators/LogicalPlanDistinctKeys.md
new file mode 100644
index 00000000000..2de0acec1b2
--- /dev/null
+++ b/docs/logical-operators/LogicalPlanDistinctKeys.md
@@ -0,0 +1,34 @@
+---
+title: LogicalPlanDistinctKeys
+---
+
+# LogicalPlanDistinctKeys Logical Operators
+
+`LogicalPlanDistinctKeys` is an extension of the [LogicalPlan](LogicalPlan.md) abstraction for logical operators that know their [distinct keys](#distinctKeys).
+
+All [logical operators](LogicalPlan.md) are `LogicalPlanDistinctKeys`.
+
+## Distinct Keys { #distinctKeys }
+
+```scala
+distinctKeys: Set[ExpressionSet]
+```
+
+??? note "Lazy Value"
+    `distinctKeys` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
+
+    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).
+
+`distinctKeys` uses [DistinctKeyVisitor](../DistinctKeyVisitor.md) to [determine the distinct keys](../cost-based-optimization/LogicalPlanVisitor.md#visit) of this [logical operator](LogicalPlan.md) when [spark.sql.optimizer.propagateDistinctKeys.enabled](../configuration-properties.md#spark.sql.optimizer.propagateDistinctKeys.enabled) configuration property is enabled.
+
+Otherwise, `distinctKeys` is always empty.
+
+---
+
+`distinctKeys` is used when:
+
+* `EliminateOuterJoin` logical optimization is executed
+* `EliminateDistinct` logical optimization is executed
+* `RemoveRedundantAggregates` logical optimization is executed
+* `JoinEstimation` is requested to [estimateInnerOuterJoin](../cost-based-optimization/JoinEstimation.md#estimateInnerOuterJoin)
+* `SizeInBytesOnlyStatsPlanVisitor` is requested to [visitJoin](../cost-based-optimization/SizeInBytesOnlyStatsPlanVisitor.md#visitJoin)
diff --git a/mkdocs.yml b/mkdocs.yml
index 701e0bfa9da..92c310e06ca 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -714,6 +714,7 @@ nav:
     - ... | demo/**.md
   - Misc:
     - AggregatingAccumulator: AggregatingAccumulator.md
+    - DistinctKeyVisitor: DistinctKeyVisitor.md
     - JoinSelectionHelper.md
     - PushDownUtils: PushDownUtils.md
     - UnsafeExternalRowSorter: UnsafeExternalRowSorter.md
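
The `distinctKeys` description in `LogicalPlanDistinctKeys.md` (a lazy value that delegates to a visitor object only when the configuration property is enabled, and is empty otherwise) can be pictured with a small toy model. This is a sketch under stated assumptions, not Spark code: `Plan`, `Relation`, `Conf`, `Key` and the simplified `Aggregate`/`Offset` case classes are hypothetical stand-ins, and the bodies of `visitAggregate` and `visitOffset` are illustrative guesses (the pages above still mark them as FIXME). `visitJoin` is left out on purpose.

```scala
// Toy model only: none of these types are Spark classes.
object DistinctKeysSketch {
  // Hypothetical stand-in for ExpressionSet: a set of column names
  type Key = Set[String]

  // Hypothetical stand-in for spark.sql.optimizer.propagateDistinctKeys.enabled
  final case class Conf(propagateDistinctKeys: Boolean)

  sealed trait Plan {
    def conf: Conf
    // Lazy value: computed at most once, on first access.
    // Empty when propagation is disabled, otherwise delegated to the visitor.
    lazy val distinctKeys: Set[Key] =
      if (conf.propagateDistinctKeys) Visitor.visit(this) else Set.empty
  }
  final case class Relation(columns: Seq[String], conf: Conf) extends Plan
  final case class Aggregate(groupBy: Seq[String], child: Plan, conf: Conf) extends Plan
  final case class Offset(n: Int, child: Plan, conf: Conf) extends Plan

  // Singleton visitor that dispatches on the operator type
  object Visitor {
    def visit(p: Plan): Set[Key] = p match {
      case a: Aggregate => visitAggregate(a)
      case o: Offset    => visitOffset(o)
      case _: Relation  => Set.empty // nothing is known about a bare relation
    }

    // Assumption: a grouped aggregate emits one row per grouping key
    def visitAggregate(a: Aggregate): Set[Key] = Set(a.groupBy.toSet)

    // Assumption: skipping rows cannot introduce duplicates
    def visitOffset(o: Offset): Set[Key] = o.child.distinctKeys
  }

  def main(args: Array[String]): Unit = {
    val on = Conf(propagateDistinctKeys = true)
    val off = Conf(propagateDistinctKeys = false)
    val enabled = Offset(5, Aggregate(Seq("a", "b"), Relation(Seq("a", "b", "c"), on), on), on)
    val disabled = Aggregate(Seq("a"), Relation(Seq("a"), off), off)
    println(enabled.distinctKeys)
    println(disabled.distinctKeys)
  }
}
```

In this model the flag is carried on every node only to keep the example self-contained; it says nothing about how Spark reads the property at runtime.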