diff --git a/docs/expressions/Generator.md b/docs/expressions/Generator.md
index 26f6a360948..f6e83d0a6d8 100644
--- a/docs/expressions/Generator.md
+++ b/docs/expressions/Generator.md
@@ -4,59 +4,76 @@ title: Generator
 
 # Generator Expressions
 
-`Generator` is a <> for [Catalyst expressions](Expression.md) that can <> zero or more rows given a single input row.
+`Generator` is an [extension](#contract) of the [Expression](Expression.md) abstraction for [generator expressions](#implementations) that [can produce multiple rows for a single input row](#eval).
 
-NOTE: `Generator` corresponds to SQL's sql/AstBuilder.md#withGenerate[LATERAL VIEW].
+The execution of `Generator` is managed by [GenerateExec](../physical-operators/GenerateExec.md) unary physical operator.
 
-[[dataType]]
-`dataType` in `Generator` is simply an [ArrayType](../types/ArrayType.md) of <>.
+!!! note
+    `Generator` corresponds to [LATERAL VIEW](../sql/AstBuilder.md#withGenerate) in SQL.
 
-[[foldable]]
-[[nullable]]
-`Generator` is not Expression.md#foldable[foldable] and not Expression.md#nullable[nullable] by default.
+## Contract (Subset)
 
-[[supportCodegen]]
-`Generator` supports [Java code generation](../whole-stage-code-generation/index.md) conditionally, i.e. only when a physical operator is not marked as [CodegenFallback](Expression.md#CodegenFallback).
+### Interpreted Expression Evaluation { #eval }
 
-[[terminate]]
-`Generator` uses `terminate` to inform that there are no more rows to process, clean up code, and additional rows can be made here.
+```scala
+eval(
+  input: InternalRow): TraversableOnce[InternalRow]
+```
 
-[source, scala]
-----
-terminate(): TraversableOnce[InternalRow] = Nil
-----
+Evaluates the given [InternalRow](../InternalRow.md) to produce zero, one or more [InternalRow](../InternalRow.md)s
 
-[[generator-implementations]]
-.Generators
-[width="100%",cols="1,2",options="header"]
-|===
-| Name
-| Description
+!!! note "Return Type"
+    `eval` is part of the [Expression](Expression.md#eval) abstraction and this `eval` enforces that `Generator`s produce a collection of [InternalRow](../InternalRow.md)s (not any other value as by non-generator expressions).
+
+## Implementations
+
+* `CollectionGenerator`
+* `GeneratorOuter`
+* `HiveGenericUDTF`
+* `JsonTuple`
+* `ReplicateRows`
+* `SQLKeywords`
+* `Stack`
+* `UnevaluableGenerator`
+* [UnresolvedGenerator](UnresolvedGenerator.md)
+* `UserDefinedGenerator`
 
-| [[ExplodeBase]] spark-sql-Expression-ExplodeBase.md[ExplodeBase]
-|
+## Data Type { #dataType }
 
-| [[Explode]] spark-sql-Expression-ExplodeBase.md#Explode[Explode]
-|
+??? note "Expression"
 
-| [[GeneratorOuter]] `GeneratorOuter`
-|
+    ```scala
+    dataType: DataType
+    ```
 
-| [[HiveGenericUDTF]] `HiveGenericUDTF`
-|
+    `dataType` is part of the [Expression](Expression.md#dataType) abstraction.
 
-| [[Inline]] spark-sql-Expression-Inline.md[Inline]
-| Corresponds to `inline` and `inline_outer` functions.
+`dataType` is an [ArrayType](../types/ArrayType.md) of the [elementSchema](#elementSchema).
 
-| JsonTuple
-|
+## supportCodegen { #supportCodegen }
 
-| [[PosExplode]] spark-sql-Expression-ExplodeBase.md#PosExplode[PosExplode]
-|
+```scala
+supportCodegen: Boolean
+```
+
+`supportCodegen` is enabled (`true`) when this `Generator` is not [CodegenFallback](../expressions/CodegenFallback.md)
+
+`supportCodegen` is used when:
+
+* `GenerateExec` physical operator is requested for [supportCodegen](../physical-operators/GenerateExec.md#supportCodegen)
+
+
diff --git a/docs/physical-operators/GenerateExec.md b/docs/physical-operators/GenerateExec.md
index bf1a68a8aff..60a401a6332 100644
--- a/docs/physical-operators/GenerateExec.md
+++ b/docs/physical-operators/GenerateExec.md
@@ -4,12 +4,10 @@ title: GenerateExec
 
 # GenerateExec Unary Physical Operator
 
-`GenerateExec` is a [unary physical operator](UnaryExecNode.md) with [CodegenSupport](CodegenSupport.md).
+`GenerateExec` is a [unary physical operator](UnaryExecNode.md) to manage execution of a [Generator](#generator) expression.
 
 `GenerateExec` represents [Generate](../logical-operators/Generate.md) unary logical operator at execution time.
 
-`GenerateExec` is an executon environment for the [Generator](#generator) expression.
-
 When [executed](#doExecute), `GenerateExec` [executes](../expressions/Generator.md#eval) (aka _evaluates_) the [Generator](#boundGenerator) expression on every row in a RDD partition.
 
 ![GenerateExec's Execution -- `doExecute` Method](../images/spark-sql-GenerateExec-doExecute.png)