[SPARK-45223][PYTHON][DOCS] Refine docstring of Column.when
### What changes were proposed in this pull request?

This PR proposes to improve the docstring of `Column.when`.

### Why are the changes needed?

To help end users and improve the usability of PySpark.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the user-facing documentation.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43688 from HyukjinKwon/SPARK-45223.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
HyukjinKwon committed Nov 7, 2023
1 parent 57fc1ab commit b0791b5
Showing 1 changed file with 37 additions and 3 deletions.
40 changes: 37 additions & 3 deletions python/pyspark/sql/column.py
@@ -1388,17 +1388,51 @@ def when(self, condition: "Column", value: Any) -> "Column":
Examples
--------
Example 1: Using :func:`when` with conditions and values to create a new Column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
... [(2, "Alice"), (5, "Bob")], ["age", "name"])
>>> df.select(df.name, sf.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0)).show()
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
>>> result = df.select(df.name, sf.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0))
>>> result.show()
+-----+------------------------------------------------------------+
| name|CASE WHEN (age > 4) THEN 1 WHEN (age < 3) THEN -1 ELSE 0 END|
+-----+------------------------------------------------------------+
|Alice| -1|
| Bob| 1|
+-----+------------------------------------------------------------+
Example 2: Chaining multiple :func:`when` conditions
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, "Alice"), (4, "Bob"), (6, "Charlie")], ["age", "name"])
>>> result = df.select(
... df.name,
... sf.when(df.age < 3, "Young").when(df.age < 5, "Middle-aged").otherwise("Old")
... )
>>> result.show()
+-------+---------------------------------------------------------------------------+
| name|CASE WHEN (age < 3) THEN Young WHEN (age < 5) THEN Middle-aged ELSE Old END|
+-------+---------------------------------------------------------------------------+
| Alice| Young|
| Bob| Middle-aged|
|Charlie| Old|
+-------+---------------------------------------------------------------------------+
Example 3: Using literal values as conditions
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
>>> result = df.select(
... df.name, sf.when(sf.lit(True), 1).otherwise(
... sf.raise_error("unreachable")).alias("when"))
>>> result.show()
+-----+----+
| name|when|
+-----+----+
|Alice| 1|
| Bob| 1|
+-----+----+
See Also
--------
pyspark.sql.functions.when
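
The docstring examples above rely on two properties of chained `when` calls: branches are checked in order and the first matching one wins (in Example 2, age 1 satisfies both `age < 3` and `age < 5` but is labelled "Young"), and a row that matches no branch gets the `otherwise` value, or null if `otherwise` is never called. The sketch below models that evaluation order in plain Python without Spark. It is only an illustration: `case_when`, `branches`, and `rows` are hypothetical names, not part of the PySpark API or of this commit.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# A branch pairs a condition on a row with the value to return when it holds.
Branch = Tuple[Callable[[Row], Optional[bool]], Any]


def case_when(row: Row, branches: List[Branch], otherwise: Any = None) -> Any:
    """Hypothetical model of CASE WHEN ... THEN ... ELSE ... END for one row."""
    for condition, value in branches:
        # Only a condition that is exactly True matches; False and None
        # (standing in for SQL null) fall through to the next branch.
        if condition(row) is True:
            return value
    # No branch matched: return the fallback, which defaults to None,
    # mirroring the null produced when otherwise() is not called.
    return otherwise


# Same conditions and data as Example 2 in the docstring.
branches: List[Branch] = [
    (lambda r: r["age"] < 3, "Young"),
    (lambda r: r["age"] < 5, "Middle-aged"),
]
rows = [
    {"age": 1, "name": "Alice"},
    {"age": 4, "name": "Bob"},
    {"age": 6, "name": "Charlie"},
]

labels = [case_when(r, branches, otherwise="Old") for r in rows]
print(labels)  # ['Young', 'Middle-aged', 'Old']

# Without a fallback, the unmatched row yields None.
print(case_when(rows[2], branches))  # None
```

Reordering `branches` so that `age < 5` comes first would label Alice "Middle-aged", which is why the narrower condition is listed first in the docstring example.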