[SPARK-49353][SQL] Update docs related to UTF-32 encoding/decoding
### What changes were proposed in this pull request?
This PR updates the docs related to `encoding` and `decoding` now that they support `UTF-32`, including:
- the doc of the SQL config `spark.sql.legacy.javaCharsets`
- connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
- sql/core/src/main/scala/org/apache/spark/sql/functions.scala
- python/pyspark/sql/functions/builtin.py

### Why are the changes needed?
After #46469, `UTF-32` is already supported for string encoding and decoding, but some related docs were not updated in sync.
Let's update them to avoid misunderstandings for end users and developers (see the usage sketch after the permalink below).

https://github.com/apache/spark/blob/e93c5fbe81d21f8bf2ce52867013d06a63c7956e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala#L26
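
For illustration only (not part of this PR), a minimal sketch of the documented behavior, assuming a Spark build that already contains #46469; the app name and sample string are arbitrary:

```scala
import org.apache.spark.sql.SparkSession

object Utf32RoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("utf32-doc-demo").getOrCreate()

    // encode() now accepts 'UTF-32' without any legacy flag; decode() reverses it.
    spark.sql("SELECT decode(encode('Apache Spark', 'UTF-32'), 'UTF-32') AS roundtrip").show()

    spark.stop()
  }
}
```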

### Does this PR introduce _any_ user-facing change?
Yes, it only fixes docs.

### How was this patch tested?
No tests needed; only docs were changed.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47844 from panbingkun/SPARK-49353.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
panbingkun authored and MaxGekk committed Aug 22, 2024
1 parent 3e8cd99 commit b57d863
Showing 4 changed files with 10 additions and 9 deletions.
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3840,8 +3840,8 @@ object functions {

/**
* Computes the first argument into a string from a binary using the provided character set (one
- * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
- * is null, the result will also be null.
+ * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either
+ * argument is null, the result will also be null.
*
* @group string_funcs
* @since 3.4.0
@@ -3851,8 +3851,8 @@

/**
* Computes the first argument into a binary from a string using the provided character set (one
- * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
- * is null, the result will also be null.
+ * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either
+ * argument is null, the result will also be null.
*
* @group string_funcs
* @since 3.4.0
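As a usage illustration of the doc text above (not part of this change), a spark-shell style sketch using the DataFrame API; the column name and sample values are made up:

```scala
// Assumes a running spark-shell / SparkSession named `spark` with implicits imported.
import spark.implicits._
import org.apache.spark.sql.functions.{decode, encode}

val df = Seq(Some("Spark"), None).toDF("s")
df.select(
  encode($"s", "UTF-32").as("utf32_bytes"),                  // binary column
  decode(encode($"s", "UTF-32"), "UTF-32").as("roundtrip")   // null input stays null
).show()
```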
4 changes: 2 additions & 2 deletions python/pyspark/sql/functions/builtin.py
@@ -10989,7 +10989,7 @@ def concat_ws(sep: str, *cols: "ColumnOrName") -> Column:
def decode(col: "ColumnOrName", charset: str) -> Column:
"""
Computes the first argument into a string from a binary using the provided character set
- (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').

.. versionadded:: 1.5.0

@@ -11027,7 +11027,7 @@ def decode(col: "ColumnOrName", charset: str) -> Column:
def encode(col: "ColumnOrName", charset: str) -> Column:
"""
Computes the first argument into a binary from a string using the provided character set
- (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').

.. versionadded:: 1.5.0

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -5090,7 +5090,8 @@ object SQLConf {
.internal()
.doc("When set to true, the functions like `encode()` can use charsets from JDK while " +
"encoding or decoding string values. If it is false, such functions support only one of " +
"the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'.")
"the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', " +
"'UTF-32'.")
.version("4.0.0")
.booleanConf
.createWithDefault(false)
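To show what the config doc above means in practice, a sketch (not from the PR; note the flag is internal, and 'windows-1252' is just an example of a JDK charset outside the built-in list):

```scala
// With the default (false), encode()/decode() accept only the listed charsets,
// so a JDK-only charset is rejected with an error.
spark.conf.set("spark.sql.legacy.javaCharsets", false)
// spark.sql("SELECT encode('façade', 'windows-1252')").show()   // fails: unsupported charset

// Opting back into the legacy behavior allows any charset the JDK provides.
spark.conf.set("spark.sql.legacy.javaCharsets", true)
spark.sql("SELECT encode('façade', 'windows-1252')").show()
```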
4 changes: 2 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3752,7 +3752,7 @@ object functions {

/**
* Computes the first argument into a string from a binary using the provided character set
- * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
* If either argument is null, the result will also be null.
*
* @group string_funcs
@@ -3763,7 +3763,7 @@

/**
* Computes the first argument into a binary from a string using the provided character set
- * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
* If either argument is null, the result will also be null.
*
* @group string_funcs
