What kind of an issue is this?
- [x] Bug report. If you've found a bug, please provide a code snippet or test to reproduce it below. The easier it is to track down the bug, the faster it is solved.
- [ ] Feature Request. Start by telling us what problem you're trying to solve. Often a solution already exists! Don't send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.
Issue description
When reading a Hive table stored by EsStorageHandler with Spark SQL, a java.lang.ClassCastException is thrown. Stack trace:
22/08/30 21:36:16 ERROR SparkSQLDriver: Failed in [select * from test_column]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (10.219.184.72 executor driver): java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to org.elasticsearch.hadoop.mr.EsInputFormat$EsInputSplit
at org.elasticsearch.hadoop.mr.EsInputFormat$EsInputRecordReader.initialize(EsInputFormat.java:144)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:220)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:217)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:172)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1469)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Version Info
Spark: 3.1.2
ES-Hadoop: 8.3.3
ES: 8.4.0
We don't have any test coverage of reading from Hive with the Spark adapter, so I suspect you're the first to try this. As a workaround, can you use Spark SQL directly and avoid Hive altogether?
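If that path works for your setup, a minimal sketch of reading the index through the es-hadoop Spark data source (bypassing EsHiveInputFormat entirely) could look like the following; the view name, index name, and node address are illustrative, not taken from your environment:

```sql
-- Spark SQL: map the Elasticsearch index directly via the Spark connector,
-- so no Hive storage handler (and no EsHiveSplit) is involved.
CREATE TEMPORARY VIEW test_column_es
USING org.elasticsearch.spark.sql
OPTIONS (
  resource 'test_column',    -- Elasticsearch index (illustrative name)
  nodes    'localhost:9200'  -- cluster address (illustrative)
);

SELECT * FROM test_column_es;
```

This keeps the whole read on the org.elasticsearch.spark.sql code path instead of routing splits through the Hive input format.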
Steps to reproduce
1. Create an index test_column in Elasticsearch.
2. Create a Hive table test_column stored by EsStorageHandler in hive-cli.
3. Run select * from test_column using Spark SQL.
Stack trace: see above.
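The steps above can be sketched as follows; the column definitions and the es.nodes address are illustrative assumptions, not taken from the report:

```sql
-- hive-cli: create an external table backed by the Elasticsearch index.
-- The storage handler is org.elasticsearch.hadoop.hive.EsStorageHandler.
CREATE EXTERNAL TABLE test_column (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'test_column',    -- target index (illustrative mapping)
  'es.nodes'    = 'localhost:9200'  -- Elasticsearch host (illustrative)
);

-- spark-sql: reading the Hive table triggers the ClassCastException.
SELECT * FROM test_column;
```

Reading the same table from hive-cli works; the failure appears only when Spark's NewHadoopRDD feeds the Hive-produced split to EsInputFormat's record reader.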