Is your feature request related to a problem?
A ClassCastException is thrown when the TUMBLE function is given an expression instead of a plain column in a CREATE MATERIALIZED VIEW statement.
For example:
CREATE MATERIALIZED VIEW test_day AS
SELECT
COUNT(1),
window.start
FROM
test
GROUP BY
TUMBLE(CAST(FROM_UNIXTIME(time) AS TIMESTAMP), '1 Hour')
ORDER BY
window.start;
...
java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.Cast cannot be cast to class org.apache.spark.sql.catalyst.expressions.Attribute (org.apache.spark.sql.catalyst.expressions.Cast and org.apache.spark.sql.catalyst.expressions.Attribute are in unnamed module of loader 'app')
at org.opensearch.flint.spark.mv.FlintSparkMaterializedView$WindowingAggregate$.unapply(FlintSparkMaterializedView.scala:132)
at org.opensearch.flint.spark.mv.FlintSparkMaterializedView$$anonfun$1.applyOrElse(FlintSparkMaterializedView.scala:87)
at org.opensearch.flint.spark.mv.FlintSparkMaterializedView$$anonfun$1.applyOrElse(FlintSparkMaterializedView.scala:86)
...
What solution would you like?
Support expressions in the TUMBLE function. This is especially useful when the time column in the source dataset is not of timestamp type.
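The stack trace suggests that `WindowingAggregate.unapply` expects the first TUMBLE argument to be a plain column (an `Attribute`), so a `Cast` fails there. One possible approach is to detect a non-column time argument and move it into an aliased projected column before windowing, which is what the subquery workaround does by hand. The Python sketch below models that rewrite on SQL text only. `Column`, `Call`, `extract_event_time`, `rewrite_tumble_query` and the `__event_time` alias are all hypothetical names for illustration; they are not Spark or Flint APIs, and the actual fix may work differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Hypothetical, simplified stand-ins for expression nodes (NOT Spark's Catalyst classes).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    template: str  # SQL template with one "{}" slot, e.g. "FROM_UNIXTIME({})"
    arg: "Expr"


Expr = Union[Column, Call]


def to_sql(expr: Expr) -> str:
    """Render an expression tree back to SQL text."""
    if isinstance(expr, Column):
        return expr.name
    return expr.template.format(to_sql(expr.arg))


def extract_event_time(expr: Expr, alias: str = "__event_time") -> Tuple[str, Optional[str]]:
    """Return (column name to pass to TUMBLE, projection to add or None)."""
    if isinstance(expr, Column):
        return expr.name, None  # already a plain column: nothing to rewrite
    # An expression: project it under an alias so windowing only sees a plain column.
    return alias, f"{to_sql(expr)} AS {alias}"


def rewrite_tumble_query(source: str, time_expr: Expr, interval: str) -> str:
    """Build the COUNT-per-window query, adding a subquery only when needed."""
    time_col, projection = extract_event_time(time_expr)
    relation = source if projection is None else f"(SELECT *, {projection} FROM {source})"
    return (
        f"SELECT COUNT(1), window.start FROM {relation} "
        f"GROUP BY TUMBLE({time_col}, '{interval}')"
    )
```

For the query in this issue, `rewrite_tumble_query("test", Call("CAST({} AS TIMESTAMP)", Call("FROM_UNIXTIME({})", Column("time"))), "1 Hour")` yields a query with the same shape as the subquery workaround. The fixed `__event_time` alias is only illustrative; a real rewrite would need a name that cannot collide with existing source columns.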
What alternatives have you considered?
Alternatively, a subquery can be used as a workaround:
CREATE MATERIALIZED VIEW test_day AS
SELECT
COUNT(1),
window.start
FROM (
SELECT CAST(FROM_UNIXTIME(time) AS TIMESTAMP) AS startTime
FROM test
)
GROUP BY
TUMBLE(startTime, '1 Hour')
ORDER BY
window.start
...
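For reference, both queries are meant to compute the same thing: `CAST(FROM_UNIXTIME(time) AS TIMESTAMP)` turns epoch seconds into a timestamp, and `TUMBLE(..., '1 Hour')` assigns each row to the non-overlapping one-hour window that contains it. The Python sketch below models that on plain epoch-second integers. It assumes UTC and ignores Spark's session time zone; `window_start` and `count_per_window` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple

HOUR_SECONDS = 3600  # the '1 Hour' window size


def window_start(epoch_seconds: int, size_seconds: int = HOUR_SECONDS) -> datetime:
    """Start of the tumbling window containing epoch_seconds (aligned to UTC)."""
    start = epoch_seconds - epoch_seconds % size_seconds  # floor to a window boundary
    return datetime.fromtimestamp(start, tz=timezone.utc)


def count_per_window(times: Iterable[int]) -> List[Tuple[datetime, int]]:
    """Roughly COUNT(1) ... GROUP BY TUMBLE(time, '1 Hour'), ordered by window start."""
    return sorted(Counter(window_start(t) for t in times).items())
```

For example, `count_per_window([0, 59, 3599, 3600, 7205])` returns three windows starting at 00:00, 01:00 and 02:00 UTC on 1970-01-01, with counts 3, 1 and 1.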
Do you have any additional context?
The first step is to confirm whether Spark supports an event time defined by an expression.
Actually, the EventTimeWatermark operator only accepts a column, not an arbitrary expression. In this case, the workaround above seems to be the right approach. I verified its correctness by inspecting the query plan:
Aggregate [window#132-T1000ms], [window#132-T1000ms.start AS startTime#107, count(1) AS count#108L]
+- Project [named_struct(...) AS window#132-T1000ms]
   +- Filter isnotnull(timestamp2#106-T1000ms)
      +- EventTimeWatermark timestamp2#106: timestamp, 1 seconds
         +- Project [cast(timestamp#130 as timestamp) AS timestamp2#106]
            +- StreamingRelation DataSource(org.apache.spark.sql.test.TestSparkSession@4bf9f44b,CSV,List(),
               Some(StructType(StructField(id,IntegerType,true),StructField(status_code,IntegerType,true),
               StructField(request_path,StringType,true),StructField(timestamp,StringType,true))),List(),None,
               Map(header -> false, delimiter -> , path -> file:/...),Some(CatalogTable(...
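In this plan the cast is materialized by a `Project` as the named column `timestamp2` below `EventTimeWatermark`, so the watermark is defined on a plain column. The Python sketch below is a simplified model of that pipeline: the watermark is the maximum value seen in one named column minus the delay, and in this model a window counts as complete once its end is at or before the watermark. The 1-second delay comes from the plan; the 1-hour window size is assumed from the queries in this issue. All names are hypothetical, timestamps are assumed to be UTC strings in `YYYY-MM-DD HH:MM:SS` form, and Spark's exact eviction rule and per-micro-batch watermark updates are not reproduced.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

DELAY_SECONDS = 1      # from "EventTimeWatermark timestamp2#106: timestamp, 1 seconds"
WINDOW_SECONDS = 3600  # assumed '1 Hour' window


def project_timestamp2(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Like the Project under EventTimeWatermark: materialize the cast as a named column."""
    out = []
    for row in rows:
        # Assumes UTC timestamps such as "1970-01-01 00:10:00".
        ts = datetime.fromisoformat(row["timestamp"]).replace(tzinfo=timezone.utc)
        out.append({**row, "timestamp2": int(ts.timestamp())})
    return out


class WatermarkTracker:
    """Tracks the maximum of ONE named event-time column, minus a delay."""

    def __init__(self, column: str, delay_seconds: int) -> None:
        self.column = column
        self.delay_seconds = delay_seconds
        self.max_event_time: Optional[int] = None

    def observe(self, rows: List[Dict[str, object]]) -> None:
        for row in rows:
            t = row[self.column]  # a column lookup, not an arbitrary expression
            if self.max_event_time is None or t > self.max_event_time:
                self.max_event_time = t

    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay_seconds


def completed_windows(rows: List[Dict[str, object]], column: str,
                      watermark: Optional[int]) -> List[int]:
    """Window starts (epoch seconds) whose end is at or before the watermark."""
    if watermark is None:
        return []
    starts = {row[column] - row[column] % WINDOW_SECONDS for row in rows}
    return sorted(s for s in starts if s + WINDOW_SECONDS <= watermark)
```

With rows at 00:10:00, 01:00:00 and 01:00:01, the maximum `timestamp2` is 3601, the watermark is 3600, and only the window starting at 0 (00:00 to 01:00) is complete. The tracker needs a column name to look up, which mirrors why the cast has to be projected into `timestamp2` first.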