Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-UTC timezone for casting from date type to timestamp type [databricks] #11462

Merged
merged 5 commits into from
Sep 23, 2024

Conversation

res-life
Copy link
Collaborator

closes #11461

Only support non-DST(daylight saving time) time zones.

Signed-off-by: Chong Gao [email protected]

@res-life res-life changed the title Support non-UTC timezone for casting from date type to timestamp type Support non-UTC timezone for casting from date type to timestamp type [databricks] Sep 11, 2024
@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

non-DST timezone

passed:

TZ=Asia/Shanghai ./integration_tests/run_pyspark_from_build.sh -s -k test_date_to_timestamp

For DST timezone:

Report not supporting prompt:

TZ=America/Los_Angeles ./integration_tests/run_pyspark_from_build.sh -s -k test_date_to_timestamp
!Expression <Cast> cast(a#4 as timestamp) 
cannot run on GPU because Timezone America/Los_Angeles is not supported yet. 
Only Non DST (daylight saving time) timezone is supported.

firestarman
firestarman previously approved these changes Sep 11, 2024
Copy link
Collaborator

@firestarman firestarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but a NIT.


@pytest.mark.parametrize('parser_policy', ["LEGACY", "CORRECTED", "EXCEPTION"], ids=idfn)
def test_date_to_timestamp(parser_policy):
parser_policy_dic = {"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT

Suggested change
parser_policy_dic = {"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy)}
all_confs = {
"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy),
"spark.rapids.sql.incompatibleDateFormats.enabled": True}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@res-life
Copy link
Collaborator Author

build

@sameerz sameerz added the feature request New feature or request label Sep 14, 2024
@firestarman
Copy link
Collaborator

premerge failed

"SELECT cast(a as timestamp) from tab",
conf = {
"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy),
"spark.rapids.sql.incompatibleDateFormats.enabled": True})
Copy link
Collaborator

@firestarman firestarman Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we only need spark.rapids.sql.incompatibleDateFormats.enabled being true for LEGACY policy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@res-life
Copy link
Collaborator Author

build

@@ -671,3 +671,14 @@ def test_timestamp_millis_long_overflow():
def test_timestamp_micros(data_gen):
assert_gpu_and_cpu_are_equal_collect(
lambda spark : unary_op_df(spark, data_gen).selectExpr("timestamp_micros(a)"))


@pytest.mark.parametrize('parser_policy', ["LEGACY", "CORRECTED", "EXCEPTION"], ids=idfn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use different quote marks?

Suggested change
@pytest.mark.parametrize('parser_policy', ["LEGACY", "CORRECTED", "EXCEPTION"], ids=idfn)
@pytest.mark.parametrize('parser_policy', ['LEGACY', 'CORRECTED', 'EXCEPTION'], ids=idfn)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"tab",
"SELECT cast(a as timestamp) from tab",
conf = {
"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format redundant?

Suggested change
"spark.sql.legacy.timeParserPolicy": "{}".format(parser_policy),
"spark.sql.legacy.timeParserPolicy": parser_policy,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@res-life
Copy link
Collaborator Author

Spark 400 building failure is related to #11479

@res-life
Copy link
Collaborator Author

build

firestarman
firestarman previously approved these changes Sep 20, 2024
@res-life
Copy link
Collaborator Author

build

gerashegalov
gerashegalov previously approved these changes Sep 20, 2024
@res-life
Copy link
Collaborator Author

build

@res-life res-life merged commit ebcc146 into NVIDIA:branch-24.10 Sep 23, 2024
43 of 45 checks passed
@res-life res-life deleted the cast-date-as-timestamp-non-UTC branch September 23, 2024 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support non-UTC timezone for casting from date to timestamp
4 participants