Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Filter on timestamp fields compares date only instead of datetime #560

Open
engechas opened this issue Aug 12, 2024 · 1 comment
Open
Labels
bug Something isn't working DataSource:Iceberg

Comments

@engechas
Copy link
Contributor

What is the bug?
This is a bug with direct query + iceberg.

When adding a filter to a query based on a timestamp field compared to a datetime, the filter is only considering the date from the datetime.

Example:

SELECT * FROM validation.amazon_security_lake_glue_db_us_west_2.amazon_security_lake_table_us_west_2_route53_2_0 WHERE time_dt > '2024-08-09T00:00:00Z' ORDER BY time_dt ASC LIMIT 10

The above query is expected to return all results from 2024-08-09T00:00:01Z to the current time. Instead it only returns results for entries 2024-08-10T00:00:00Z to the current time.

Changing the operator from > to >= returns the expected results.

It looks like Iceberg/Spark are comparing only the dates from the datetime filter.

Explained query for the above example looks like it has converted the datetime to an epoch microsecond correctly, but the query results do not align with that.

== Physical Plan ==
TakeOrderedAndProject(limit=10, orderBy=[time_dt#2099 ASC NULLS FIRST], output=[metadata#2095,cloud#2096,src_endpoint#2097,time#2098L,time_dt#2099,query#2100,answers#2101,connection_info#2102,dst_endpoint#2103,firewall_rule#2104,severity_id#2105,severity#2106,class_name#2107,class_uid#2108,category_name#2109,category_uid#2110,activity_id#2111,activity_name#2112,type_uid#2113L,type_name#2114,rcode_id#2115,rcode#2116,disposition#2117,action#2118,... 6 more fields])
+- *(1) Project [metadata#2095, cloud#2096, src_endpoint#2097, time#2098L, time_dt#2099, query#2100, answers#2101, connection_info#2102, dst_endpoint#2103, firewall_rule#2104, severity_id#2105, severity#2106, class_name#2107, class_uid#2108, category_name#2109, category_uid#2110, activity_id#2111, activity_name#2112, type_uid#2113L, type_name#2114, rcode_id#2115, rcode#2116, disposition#2117, action#2118, ... 6 more fields]
   +- *(1) Filter (time_dt#2099 > 2024-08-09 00:00:00)
      +- BatchScan[metadata#2095, cloud#2096, src_endpoint#2097, time#2098L, time_dt#2099, query#2100, answers#2101, connection_info#2102, dst_endpoint#2103, firewall_rule#2104, severity_id#2105, severity#2106, class_name#2107, class_uid#2108, category_name#2109, category_uid#2110, activity_id#2111, activity_name#2112, type_uid#2113L, type_name#2114, rcode_id#2115, rcode#2116, disposition#2117, action#2118, ... 6 more fields] spark_catalog.amazon_security_lake_glue_db_us_west_2.amazon_security_lake_table_us_west_2_route53_2_0 (branch=null) [filters=time_dt IS NOT NULL, time_dt > 1723161600000000, groupedBy=] RuntimeFilters: []

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create an Iceberg table with a timestamp field
  2. Filter on this field using a datetime

What is the expected behavior?
Filters based on datetimes should return results after the datetime rather than the date within the datetime.

What is your host/environment?

  • Version 0.4.0

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@engechas engechas added bug Something isn't working untriaged labels Aug 12, 2024
@dblock dblock removed the untriaged label Sep 2, 2024
@dblock
Copy link
Member

dblock commented Sep 2, 2024

[Weekly Catch All Triage - 1]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working DataSource:Iceberg
Projects
None yet
Development

No branches or pull requests

3 participants