Standard Functions
jaceklaskowski committed Apr 7, 2024
1 parent bfebf1a commit 43c3665
Showing 14 changed files with 201 additions and 74 deletions.
2 changes: 1 addition & 1 deletion docs/Column.md
@@ -244,7 +244,7 @@ over(window: WindowSpec): Column
`over` creates a _windowing column_ (_aka_ _analytic clause_) that allows executing an aggregate function over a [window](window-functions/WindowSpec.md) (i.e. a group of records that are in _some_ relation to the current record).
TIP: Read up on windowed aggregation in Spark SQL in spark-sql-functions-windows.md[Window Aggregate Functions].
TIP: Read up on windowed aggregation in Spark SQL in functions/windows-functions.md[Window Aggregate Functions].
[source, scala]
----
4 changes: 3 additions & 1 deletion docs/expressions/Generator.md
@@ -23,7 +23,9 @@ eval(
Evaluates the given [InternalRow](../InternalRow.md) to produce zero, one or more [InternalRow](../InternalRow.md)s

!!! note "Return Type"
`eval` is part of the [Expression](Expression.md#eval) abstraction and this `eval` enforces that `Generator`s produce a collection of [InternalRow](../InternalRow.md)s (not any other value as by non-generator expressions).
`eval` is part of the [Expression](Expression.md#eval) abstraction.

This `eval` enforces that `Generator`s produce a collection of [InternalRow](../InternalRow.md)s (unlike non-generator expressions, which produce single values).

## Implementations

7 changes: 7 additions & 0 deletions docs/expressions/MaxBy.md
@@ -0,0 +1,7 @@
---
title: MaxBy
---

# MaxBy Expression

`MaxBy` is a `MaxMinBy` aggregate function expression.
2 changes: 1 addition & 1 deletion docs/expressions/ParseToDate.md
@@ -1,6 +1,6 @@
# ParseToDate

`ParseToDate` is a [RuntimeReplaceable](RuntimeReplaceable.md) expression to represent [to_date](../spark-sql-functions-datetime.md#to_date) function (in logical query plans).
`ParseToDate` is a [RuntimeReplaceable](RuntimeReplaceable.md) expression to represent [to_date](../functions/datetime.md#to_date) function (in logical query plans).

As a `RuntimeReplaceable` expression, `ParseToDate` is replaced by [Logical Query Optimizer](../catalyst/Optimizer.md#ReplaceExpressions) with the [child](#child) expression:

2 changes: 1 addition & 1 deletion docs/expressions/ParseToTimestamp.md
@@ -1,6 +1,6 @@
# ParseToTimestamp

`ParseToTimestamp` is a [RuntimeReplaceable](RuntimeReplaceable.md) expression to represent [to_timestamp](../spark-sql-functions-datetime.md#to_timestamp) standard function (in logical query plans).
`ParseToTimestamp` is a [RuntimeReplaceable](RuntimeReplaceable.md) expression to represent [to_timestamp](../functions/datetime.md#to_timestamp) standard function (in logical query plans).

As a `RuntimeReplaceable` expression, `ParseToTimestamp` is replaced by [Logical Optimizer](../catalyst/Optimizer.md#ReplaceExpressions) with the [child](#child) expression:

8 changes: 6 additions & 2 deletions docs/expressions/UnixTimestamp.md
@@ -1,6 +1,6 @@
# UnixTimestamp

`UnixTimestamp` is a [binary](Expression.md#BinaryExpression) expression with [timezone](Expression.md#TimeZoneAwareExpression) support that represents [unix_timestamp](../spark-sql-functions-datetime.md#unix_timestamp) function (and indirectly [to_date](../spark-sql-functions-datetime.md#to_date) and [to_timestamp](../spark-sql-functions-datetime.md#to_timestamp)).
`UnixTimestamp` is a [binary](Expression.md#BinaryExpression) expression with [timezone](Expression.md#TimeZoneAwareExpression) support that represents [unix_timestamp](../functions/datetime.md#unix_timestamp) function (and indirectly [to_date](../functions/datetime.md#to_date) and [to_timestamp](../functions/datetime.md#to_timestamp)).

```text
import org.apache.spark.sql.functions.unix_timestamp
@@ -19,15 +19,19 @@ scala> c1.expr.isInstanceOf[UnixTimestamp]
res0: Boolean = true
```

<!---
## Review Me
NOTE: `UnixTimestamp` is `UnixTime` expression internally (as is `ToUnixTimestamp` expression).
[[inputTypes]][[dataType]]
`UnixTimestamp` supports `StringType`, [DateType](../types/DataType.md#DateType) and `TimestampType` as input types for a time expression and returns `LongType`.
```text
scala> c1.expr.eval()
res1: Any = 1493354303
```
[[formatter]]
`UnixTimestamp` uses `DateTimeUtils.newDateFormat` for date/time format (as Java's [java.text.DateFormat]({{ java.api }}/java/text/DateFormat.html)).
-->
78 changes: 77 additions & 1 deletion docs/functions/aggregate-functions.md
@@ -1,6 +1,45 @@
# Standard Aggregate Functions

## <span id="collect_set"> collect_set
## any { #any }

```scala
any(
e: Column): Column
```

`any` returns `true` if at least one value of `e` in a group is `true` (equivalent to [bool_or](#bool_or)).

## any_value { #any_value }

```scala
any_value(
e: Column): Column
any_value(
e: Column,
ignoreNulls: Column): Column
```

`any_value` returns an arbitrary (non-deterministic) value of `e` for a group of rows (skipping `null`s when `ignoreNulls` is enabled).
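
A quick `spark-shell` sketch (sample data made up for illustration; assumes Spark 3.5+, where `any_value` is available in the Scala API):

```scala
import org.apache.spark.sql.functions.{any_value, col, lit}

// Hypothetical (group, value) rows with a null to show ignoreNulls
val df = Seq(("a", null), ("a", "x"), ("b", "y")).toDF("g", "v")

// Some value of v per group; ignoreNulls = true skips nulls when picking
df.groupBy("g")
  .agg(any_value(col("v"), lit(true)))
  .show()
```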

## bool_and { #bool_and }

```scala
bool_and(
e: Column): Column
```

`bool_and` returns `true` if all values of `e` in a group are `true`.

## bool_or { #bool_or }

```scala
bool_or(
e: Column): Column
```

`bool_or` returns `true` if at least one value of `e` in a group is `true`.
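
A quick `spark-shell` sketch of `bool_and` and `bool_or` side by side (sample data made up for illustration; assumes Spark 3.5+):

```scala
import org.apache.spark.sql.functions.{bool_and, bool_or, col}

val checks = Seq((1, true), (1, false), (2, true)).toDF("id", "passed")

// id=1 yields bool_and=false and bool_or=true; id=2 yields true for both
checks
  .groupBy("id")
  .agg(bool_and(col("passed")), bool_or(col("passed")))
  .show()
```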

## collect_set { #collect_set }

```scala
collect_set(
@@ -13,6 +52,43 @@ collect_set(

In the end, `collect_set` wraps the [AggregateExpression](../expressions/AggregateExpression.md) up in a [Column](../Column.md).
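
A quick `spark-shell` sketch (sample data made up for illustration):

```scala
import org.apache.spark.sql.functions.collect_set

val events = Seq(("alice", "login"), ("alice", "login"), ("alice", "logout"))
  .toDF("user", "event")

// Duplicates are removed (unlike collect_list)
events.groupBy("user").agg(collect_set("event")).show(truncate = false)
```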

## count_if { #count_if }

```scala
count_if(
e: Column): Column
```

`count_if` returns the number of rows in a group for which the boolean column `e` is `true`.
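
A quick `spark-shell` sketch (sample data made up for illustration; assumes Spark 3.5+, where `count_if` is available in the Scala API):

```scala
import org.apache.spark.sql.functions.{count_if, col}

val nums = Seq(1, 2, 3, 4, 5).toDF("n")

// Counts the rows where the boolean column is true (here: the even values)
nums.agg(count_if(col("n") % 2 === 0)).show()
```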

## every { #every }

```scala
every(
e: Column): Column
```

`every` returns `true` if all values of `e` in a group are `true` (equivalent to [bool_and](#bool_and)).

## max_by { #max_by }

```scala
max_by(
e: Column,
ord: Column): Column
```

`max_by` creates a [MaxBy](../expressions/MaxBy.md) aggregate function that is then [wrapped into a Column](../functions/index.md#withAggregateFunction) (as an [AggregateExpression](../expressions/AggregateExpression.md)).
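
A quick `spark-shell` sketch (sample data made up for illustration) that picks the value of one column on the row with the maximum of another:

```scala
import org.apache.spark.sql.functions.{max_by, col}

val sales = Seq(("Alice", 10), ("Bob", 30), ("Cara", 20)).toDF("name", "amount")

// The name on the row with the highest amount (i.e. "Bob")
sales.agg(max_by(col("name"), col("amount"))).show()
```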

## some { #some }

```scala
some(
e: Column): Column
```

`some` returns `true` if at least one value of `e` in a group is `true` (equivalent to [any](#any)).

<!---
## Review Me
18 changes: 17 additions & 1 deletion docs/functions/collection-functions.md
@@ -1,6 +1,6 @@
# Standard Collection Functions

## <span id="filter"> filter
## filter { #filter }

```scala
filter(
@@ -15,6 +15,22 @@ filter(

In the end, `filter` wraps the `ArrayFilter` up in a [Column](../Column.md).
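
A quick `spark-shell` sketch (sample data made up for illustration):

```scala
import org.apache.spark.sql.functions.{filter, col}

val df = Seq(Seq(1, 2, 3, 4)).toDF("xs")

// Keep only the even array elements; the lambda operates on Column values
df.select(filter(col("xs"), x => x % 2 === 0)).show()
```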

## str_to_map { #str_to_map }

```scala
str_to_map(
text: Column): Column
str_to_map(
text: Column,
pairDelim: Column): Column
str_to_map(
text: Column,
pairDelim: Column,
keyValueDelim: Column): Column
```

`str_to_map` creates a map column by splitting `text` into key-value pairs using `pairDelim` and then splitting every pair into key and value using `keyValueDelim` (`,` and `:` by default, respectively).
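
A quick `spark-shell` sketch (sample data made up for illustration; assumes Spark 3.5+, where `str_to_map` is available in the Scala API):

```scala
import org.apache.spark.sql.functions.{str_to_map, col, lit}

val df = Seq("a:1,b:2,c:3").toDF("s")

// Split pairs on "," and each pair into key and value on ":"
df.select(str_to_map(col("s"), lit(","), lit(":"))).show(truncate = false)
```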

<!---
## Review Me
96 changes: 50 additions & 46 deletions docs/spark-sql-functions-datetime.md → docs/functions/datetime.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,48 @@
# Date and Time Functions

## to_date { #to_date }

```scala
to_date(
e: Column): Column
to_date(
e: Column,
fmt: String): Column
```

`to_date` converts the column into [DateType](../types/DataType.md#DateType) (by casting to `DateType`).

!!! note
`fmt` follows [the formatting styles](http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Internally, `to_date` creates a [Column](../Column.md) with [ParseToDate](../expressions/ParseToDate.md) expression (and `Literal` expression for `fmt`).

!!! tip
Use [ParseToDate](../expressions/ParseToDate.md) expression to use a column for the values of `fmt`.
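
A quick `spark-shell` sketch (sample data made up for illustration):

```scala
import org.apache.spark.sql.functions.{to_date, col}

val df = Seq("07/04/2024").toDF("s")

// The default format would not parse this string, so fmt is given explicitly
df.select(to_date(col("s"), "dd/MM/yyyy")).show()
```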

## to_timestamp { #to_timestamp }

```scala
to_timestamp(
s: Column): Column
to_timestamp(
s: Column,
fmt: String): Column
```

`to_timestamp` converts the column into [TimestampType](../types/DataType.md#TimestampType) (by casting to `TimestampType`).

!!! note
`fmt` follows [the formatting styles](http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Internally, `to_timestamp` creates a [Column](../Column.md) with [ParseToTimestamp](../expressions/ParseToTimestamp.md) expression (and `Literal` expression for `fmt`).

!!! tip
Use [ParseToTimestamp](../expressions/ParseToTimestamp.md) expression to use a column for the values of `fmt`.
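
A quick `spark-shell` sketch (sample data made up for illustration):

```scala
import org.apache.spark.sql.functions.{to_timestamp, col}

val df = Seq("07/04/2024 12:30:00").toDF("s")

// An explicit fmt to match the non-default date/time layout
df.select(to_timestamp(col("s"), "dd/MM/yyyy HH:mm:ss")).show(truncate = false)
```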

<!---
## Review Me
[[functions]]
.(Subset of) Standard Functions for Date and Time
[align="center",cols="1,2",width="100%",options="header"]
@@ -35,7 +78,7 @@
current_date(): Column
```
`current_date` function gives the current date as a [date](types/DataType.md#DateType) column.
`current_date` function gives the current date as a [date](../types/DataType.md#DateType) column.
```text
val df = spark.range(1).select(current_date)
@@ -51,7 +94,7 @@ root
|-- current_date(): date (nullable = false)
```
Internally, `current_date` creates a [Column](Column.md) with `CurrentDate` Catalyst leaf expression.
Internally, `current_date` creates a [Column](../Column.md) with `CurrentDate` Catalyst leaf expression.
```text
val c = current_date()
@@ -70,7 +113,7 @@ scala> println(cd.numberedTreeString)
date_format(dateExpr: Column, format: String): Column
```
Internally, `date_format` creates a [Column](Column.md) with `DateFormatClass` binary expression. `DateFormatClass` takes the expression from `dateExpr` column and `format`.
Internally, `date_format` creates a [Column](../Column.md) with `DateFormatClass` binary expression. `DateFormatClass` takes the expression from `dateExpr` column and `format`.
```text
val c = date_format($"date", "dd/MM/yyyy")
@@ -161,7 +204,7 @@ scala> spark.sql("SELECT unix_timestamp() as unix_timestamp").show
+--------------+
```
Internally, `unix_timestamp` creates a [Column](Column.md) with [UnixTimestamp](expressions/UnixTimestamp.md) binary expression (possibly with `CurrentTimestamp`).
Internally, `unix_timestamp` creates a [Column](../Column.md) with [UnixTimestamp](../expressions/UnixTimestamp.md) binary expression (possibly with `CurrentTimestamp`).
=== [[window]] Generating Time Windows -- `window` Function
@@ -208,7 +251,7 @@ scala> val timeColumn = window('time, "5 seconds")
timeColumn: org.apache.spark.sql.Column = timewindow(time, 5000000, 5000000, 0) AS `window`
----
`timeColumn` should be of [TimestampType](types/DataType.md#TimestampType), i.e. with [java.sql.Timestamp]({{ java.api }}/java/sql/Timestamp.html) values.
`timeColumn` should be of [TimestampType](../types/DataType.md#TimestampType), i.e. with [java.sql.Timestamp]({{ java.api }}/java/sql/Timestamp.html) values.
!!! tip
Use [java.sql.Timestamp.from]({{ java.api }}/java/sql/Timestamp.html#from-java.time.Instant-) or [java.sql.Timestamp.valueOf]({{ java.api }}/java/sql/Timestamp.html#valueOf-java.time.LocalDateTime-) factory methods to create `Timestamp` instances.
@@ -279,7 +322,7 @@ scala> sums.show
!!! TIP
Use `CalendarInterval` for valid window identifiers.
Internally, `window` creates a [Column](Column.md) (with [TimeWindow](expressions/TimeWindow.md) expression) available as `window` alias.
Internally, `window` creates a [Column](../Column.md) (with [TimeWindow](../expressions/TimeWindow.md) expression) available as `window` alias.
```text
// q is the query defined earlier
@@ -305,43 +348,4 @@ scala> println(timeColumn.expr.numberedTreeString)
NOTE: The example is borrowed from https://flink.apache.org/news/2015/12/04/Introducing-windows.html[Introducing Stream Windows in Apache Flink].
The example shows how to use `window` function to model a traffic sensor that counts every 15 seconds the number of vehicles passing a certain location.

## <span id="to_date"> to_date

```scala
to_date(
e: Column): Column
to_date(
e: Column,
fmt: String): Column
```

`to_date` converts the column into [DateType](types/DataType.md#DateType) (by casting to `DateType`).

!!! note
`fmt` follows [the formatting styles](http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Internally, `to_date` creates a [Column](Column.md) with [ParseToDate](expressions/ParseToDate.md) expression (and `Literal` expression for `fmt`).

!!! tip
Use [ParseToDate](expressions/ParseToDate.md) expression to use a column for the values of `fmt`.

## <span id="to_timestamp"> to_timestamp

```scala
to_timestamp(
s: Column): Column
to_timestamp(
s: Column,
fmt: String): Column
```

`to_timestamp` converts the column into [TimestampType](types/DataType.md#TimestampType) (by casting to `TimestampType`).

!!! note
`fmt` follows [the formatting styles](http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Internally, `to_timestamp` creates a [Column](Column.md) with [ParseToTimestamp](expressions/ParseToTimestamp.md) expression (and `Literal` expression for `fmt`).

!!! tip
Use [ParseToTimestamp](expressions/ParseToTimestamp.md) expression to use a column for the values of `fmt`.
-->
