[SPARK-49692][PYTHON][CONNECT] Refine the string representation of literal date and datetime

### What changes were proposed in this pull request?
Refine the string representation of literal date and datetime

### Why are the changes needed?
1. We should not represent these literals with their internal values;
2. The string representation should be consistent with PySpark Classic where possible (we cannot guarantee the representations are always identical, because Spark Connect only holds an unresolved expression, but we can come close).

### Does this PR introduce _any_ user-facing change?
yes

before:
```
In [3]: lit(datetime.date(2024, 7, 10))
Out[3]: Column<'19914'>

In [4]: lit(datetime.datetime(2024, 7, 10, 1, 2, 3, 456))
Out[4]: Column<'1720544523000456'>
```

after:
```
In [3]: lit(datetime.date(2024, 7, 10))
Out[3]: Column<'2024-07-10'>

In [4]: lit(datetime.datetime(2024, 7, 10, 1, 2, 3, 456))
Out[4]: Column<'2024-07-10 01:02:03.000456'>
```
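
For context, the old output exposes the literal's internal storage: `DateType` stores a date as days since the Unix epoch, and `TimestampType` stores a timestamp as microseconds since the epoch in UTC. A minimal stdlib-only sketch (not part of the patch) decoding the values above:

```python
import datetime

# 19914 is days since 1970-01-01.
print(datetime.date(1970, 1, 1) + datetime.timedelta(days=19914))
# 2024-07-10

# 1720544523000456 is microseconds since the epoch in UTC; the output above
# was apparently produced in a UTC+8 session, hence the local value 01:02:03.
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
print(epoch + datetime.timedelta(microseconds=1720544523000456))
# 2024-07-09 17:02:03.000456+00:00
```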

### How was this patch tested?
added tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48137 from zhengruifeng/py_connect_lit_dt.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
zhengruifeng committed Sep 18, 2024
1 parent a6f6e07 commit 25d6b7a
Showing 2 changed files with 23 additions and 2 deletions.
16 changes: 14 additions & 2 deletions python/pyspark/sql/connect/expressions.py
```
@@ -477,8 +477,20 @@ def to_plan(self, session: "SparkConnectClient") -> "proto.Expression":
     def __repr__(self) -> str:
         if self._value is None:
             return "NULL"
-        else:
-            return f"{self._value}"
+        elif isinstance(self._dataType, DateType):
+            dt = DateType().fromInternal(self._value)
+            if dt is not None and isinstance(dt, datetime.date):
+                return dt.strftime("%Y-%m-%d")
+        elif isinstance(self._dataType, TimestampType):
+            ts = TimestampType().fromInternal(self._value)
+            if ts is not None and isinstance(ts, datetime.datetime):
+                return ts.strftime("%Y-%m-%d %H:%M:%S.%f")
+        elif isinstance(self._dataType, TimestampNTZType):
+            ts = TimestampNTZType().fromInternal(self._value)
+            if ts is not None and isinstance(ts, datetime.datetime):
+                return ts.strftime("%Y-%m-%d %H:%M:%S.%f")
+        # TODO(SPARK-49693): Refine the string representation of timedelta
+        return f"{self._value}"
 
 
 class ColumnReference(Expression):
```
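
For illustration, the conversion path used by the new `__repr__` can be exercised directly with the stock type converters. This is a sketch assuming a local PySpark installation; the microsecond value below is computed by hand for the example and is not taken from the patch:

```python
import datetime
from pyspark.sql.types import DateType, TimestampNTZType

# fromInternal turns the stored integer back into a date/datetime,
# and strftime renders it the way the repr now does.
print(DateType().fromInternal(19914).strftime("%Y-%m-%d"))
# 2024-07-10

# Days * microseconds-per-day + 01:02:03 in microseconds + 456 microseconds.
micros = 19914 * 86_400_000_000 + (1 * 3600 + 2 * 60 + 3) * 1_000_000 + 456
print(TimestampNTZType().fromInternal(micros).strftime("%Y-%m-%d %H:%M:%S.%f"))
# 2024-07-10 01:02:03.000456
```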
9 changes: 9 additions & 0 deletions python/pyspark/sql/tests/test_column.py
```
@@ -18,6 +18,8 @@
 
 from enum import Enum
 from itertools import chain
+import datetime
+
 from pyspark.sql import Column, Row
 from pyspark.sql import functions as sf
 from pyspark.sql.types import StructType, StructField, IntegerType, LongType
```
```
@@ -280,6 +282,13 @@ def test_expr_str_representation(self):
         when_cond = sf.when(expression, sf.lit(None))
         self.assertEqual(str(when_cond), "Column<'CASE WHEN foo THEN NULL END'>")
 
+    def test_lit_time_representation(self):
+        dt = datetime.date(2021, 3, 4)
+        self.assertEqual(str(sf.lit(dt)), "Column<'2021-03-04'>")
+
+        ts = datetime.datetime(2021, 3, 4, 12, 34, 56, 1234)
+        self.assertEqual(str(sf.lit(ts)), "Column<'2021-03-04 12:34:56.001234'>")
+
     def test_enum_literals(self):
         class IntEnum(Enum):
             X = 1
```
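
One detail behind the expected strings in the new test: `strftime("%f")` is always zero-padded to six digits, so 1234 microseconds renders as `.001234`. A stdlib-only check, independent of Spark:

```python
import datetime

ts = datetime.datetime(2021, 3, 4, 12, 34, 56, 1234)
assert ts.strftime("%Y-%m-%d %H:%M:%S.%f") == "2021-03-04 12:34:56.001234"
```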
