Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
The easier it is to track down the bug, the faster it is solved.
Issue description
When I negate an .isin() filter in PySpark, the generated Elasticsearch query is malformed and the job fails with the following error:
23/08/08 21:28:09 ERROR Executor: Exception in task 0.0 in stage 7.0 (TID 19)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: query_shard_exception: failed to create query: For input string: "0 1 2 3"
{"query":{"bool":{"must":[{"match_all":{}}],"filter":[{"bool":{"must_not":{"bool":{"should":[{"match":{"group_id":"0 1 2 3"}}]}}}},{"bool":{"should":[{"match":{"status":"verified sent"}}]}}]}},"_source":["id","status","group_id"]}
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:487)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:444)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:438)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:318)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:94)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:66)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
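To make the problem easier to see, here is a small sketch that parses the query from the error message and compares it with one possible well-formed alternative. It assumes group_id is mapped as a numeric field (the "For input string" message suggests a number parse failure) and that the original filter was roughly "group_id NOT IN (0, 1, 2, 3) AND status IN ('verified', 'sent')". The helpers in_as_terms and negate are hypothetical names used only for illustration; this is not the connector's actual translation code.

```python
import json

# Query copied verbatim from the error message in this report.
reported = json.loads(
    '{"query":{"bool":{"must":[{"match_all":{}}],"filter":['
    '{"bool":{"must_not":{"bool":{"should":[{"match":{"group_id":"0 1 2 3"}}]}}}},'
    '{"bool":{"should":[{"match":{"status":"verified sent"}}]}}]}},'
    '"_source":["id","status","group_id"]}'
)

# The IN values were joined into a single space-separated string, which a
# numeric field cannot parse.
bad_value = (
    reported["query"]["bool"]["filter"][0]["bool"]["must_not"]
    ["bool"]["should"][0]["match"]["group_id"]
)


def in_as_terms(field, values):
    """Hypothetical helper: express `field IN values` as a terms query."""
    return {"terms": {field: list(values)}}


def negate(clause):
    """Hypothetical helper: wrap a clause in bool/must_not."""
    return {"bool": {"must_not": clause}}


# One shape that keeps each value separate and typed (an assumption about
# what a working pushdown could look like, not what ES-Hadoop emits).
alternative = {
    "query": {
        "bool": {
            "must": [{"match_all": {}}],
            "filter": [
                negate(in_as_terms("group_id", [0, 1, 2, 3])),
                in_as_terms("status", ["verified", "sent"]),
            ],
        }
    },
    "_source": ["id", "status", "group_id"],
}

print("reported group_id value:", repr(bad_value))
print(json.dumps(alternative, indent=2))
```

The point of the comparison is only that the values should reach Elasticsearch as individual numbers rather than as the single string "0 1 2 3".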
Steps to reproduce
Code:
Stack trace:
Version Info
OS : Ubuntu 22.04
JVM :
Hadoop/Spark: PySpark 3.3.1
ES-Hadoop : elasticsearch-spark-30_2.12-8.9.0.jar
ES : 8.9.0