Spark connector's implementation of "explode" does not work on nested fields #2051

ThibSCH · 2022-12-14T14:56:59Z

Hi everyone,

What kind of issue is this?

Bug report.
Feature Request

Issue description

We use Spark to manipulate an array of distinct objects stored in an Elasticsearch index.
The field is mapped in the index as:

"array_field": {
  "type": "nested",
  "properties": {
    "property1": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "property2": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "property3": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "property4": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "property5": {
      "type": "date"
    }
  }
}

When we apply Spark's explode function to a dataset read from Elasticsearch, the connector generates the following query:

"query": {
  "bool": {
    "must": [
      {
        "match_all": {}
      }
    ],
    "filter": [
      {
        "exists": {
          "field": "array_field"
        }
      }
    ]
  }
}

The connector generates the "exists" filter to distinguish explode from explode_outer: explode drops rows whose array is null or empty, whereas explode_outer keeps them.
But because the field is mapped as nested and the exists filter is not wrapped in a nested query, it never matches any document, so the dataset is always empty.
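A toy model in plain Python (not Elasticsearch's implementation) illustrates the mismatch. It assumes only that each object of a nested field is indexed as a separate hidden document, so the parent document carries no `array_field` fields of its own. The wrapped filter uses the shape of Elasticsearch's `nested` query (`path` plus an inner `query`) as a possible fix; the helper names and the claim that the connector could emit this shape are assumptions for illustration.

```python
# Toy model only: nested objects live in hidden docs, not in the root doc.
NESTED_PATH = "array_field"


def index_document(source, nested_path):
    """Split a source document into a root doc and hidden nested docs (hypothetical helper)."""
    root = {k: v for k, v in source.items() if k != nested_path}
    nested = [
        {f"{nested_path}.{k}": v for k, v in obj.items()}
        for obj in source.get(nested_path) or []
    ]
    return root, nested


def field_exists(flat_doc, field):
    """True if `field` or any sub-field under it holds a non-null value."""
    return any(
        value is not None and (key == field or key.startswith(field + "."))
        for key, value in flat_doc.items()
    )


def matches(query, root, nested):
    """Evaluate the two query shapes used here against one indexed document."""
    if "exists" in query:
        # A plain exists query only sees the root document's fields.
        return field_exists(root, query["exists"]["field"])
    if "nested" in query:
        inner = query["nested"]["query"]
        # A nested query runs its inner query against each hidden nested doc.
        return any(matches(inner, doc, []) for doc in nested)
    raise ValueError(f"unsupported query: {query}")


source = {
    "id": 1,
    "array_field": [
        {"property1": "a", "property5": "2022-12-14"},
        {"property1": "b", "property5": "2022-12-15"},
    ],
}
root, nested = index_document(source, NESTED_PATH)

# Filter shape the connector pushes down (from the query above).
plain = {"exists": {"field": NESTED_PATH}}
# Hypothetical nested-aware shape: the same exists, wrapped in a nested query.
wrapped = {"nested": {"path": NESTED_PATH, "query": plain}}

print(matches(plain, root, nested))    # False
print(matches(wrapped, root, nested))  # True
```

In this model the plain filter rejects the document even though it holds two nested objects, which matches the empty dataset described above.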

Steps to reproduce

1. Create an index with a field mapped as nested.
2. Index a document whose nested field has a value.
3. Read the index from Spark into a dataset.
4. Call Spark's explode(field) on the nested field of the dataset.
5. The dataset is empty because the generated query does not match any document.
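For comparison with the observed result, here is a pure-Python sketch of the row semantics of Spark's explode and explode_outer (plain dictionaries stand in for rows; this illustrates the semantics and is not Spark code). explode emits one row per array element and drops rows whose array is null or empty; explode_outer emits a single row with a null value for them. With the reproduction document, explode should therefore yield one row per nested object rather than an empty dataset.

```python
def explode(rows, column):
    """One output row per array element; rows with a null or empty array vanish."""
    out = []
    for row in rows:
        for element in row.get(column) or []:
            out.append({**row, column: element})
    return out


def explode_outer(rows, column):
    """Like explode, but a null or empty array yields one row holding None."""
    out = []
    for row in rows:
        for element in row.get(column) or [None]:
            out.append({**row, column: element})
    return out


rows = [
    {"id": 1, "array_field": [{"property1": "a"}, {"property1": "b"}]},
    {"id": 2, "array_field": None},
    {"id": 3, "array_field": []},
]
print(len(explode(rows, "array_field")))        # 2
print(len(explode_outer(rows, "array_field")))  # 4
# If the pushed-down filter returns no documents, explode has nothing to emit.
print(explode([], "array_field"))               # []
```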

Version Info

OS: Linux
JVM: 1.8
Hadoop/Spark: Spark 3.3.0
ES-Hadoop: elasticsearch-spark-30_2.12:8.2.2

jbaiera added the bug and :Spark labels on Jan 9, 2023
