[BUG] Flint data source cannot read nested field value #233

Open
dai-chen opened this issue Jan 23, 2024 · 1 comment
Labels
bug Something isn't working DataSource:OpenSearch

Comments

@dai-chen
Collaborator

What is the bug?

The Flint data source always returns NULL for nested field values.

How can one reproduce the bug?

Create an OpenSearch index with a nested field by indexing a document that uses a dotted field name:

POST nested_index/_doc
{
  "a.b.c": 123
}

GET nested_index/_search
    "hits": [
      {
        "_index": "nested_index",
        "_id": "GDV4OI0BWdGHpYUCQ66r",
        "_score": 1,
        "_source": {
          "a.b.c": 123
        }
      }
    ]

Read it using the Flint data source in spark-shell:

spark.read.format("flint").load("nested_index").schema
res6: org.apache.spark.sql.types.StructType = 
StructType(StructField(a,StructType(StructField(b,StructType(StructField(c,LongType,true)),true)),true))

spark.read.format("flint").load("nested_index").show
+----+
|   a|
+----+
|null|
+----+

spark.read.format("flint").load("nested_index").select("a.b.c").show
+----+
|   c|
+----+
|null|
+----+

What is the expected behavior?

The Flint data source should return the nested field value (123 in this example) instead of NULL.

@dai-chen
Collaborator Author

dai-chen commented Jan 24, 2024

Did more testing. In OpenSearch, the index mapping is the same whether the document is indexed with a dotted field name or with nested objects.

POST nested_index/_doc
{
  "a.b.c": 123
}

POST nested_index_2/_doc
{
  "a": {
    "b": {
      "c": 123
    }
  }
}

# Both indexes have the same mapping, shown below
GET nested_index/_mapping
{
  "nested_index": {
    "mappings": {
      "properties": {
        "a": {
          "properties": {
            "b": {
              "properties": {
                "c": {
                  "type": "long"
                }
              }
            }
          }
        }
      }
    }
  }
}

However, the field value in _source differs between the two indexes, because _source keeps the document exactly as it was given at indexing time:

GET nested_index/_search
    ...
    "hits": [
      {
        "_index": "nested_index",
        "_id": "GDV4OI0BWdGHpYUCQ66r",
        "_score": 1,
        "_source": {
          "a.b.c": 123
        }
      }
    ]

GET nested_index_2/_search
    ...
    "hits": [
      {
        "_index": "nested_index_2",
        "_id": "04jYPI0BZG4KSy0O6VBi",
        "_score": 1,
        "_source": {
          "a": {
            "b": {
              "c": 123
            }
          }
        }
      }
    ]
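
To make the difference between the two _source shapes concrete, here is a minimal Scala sketch that expands dotted keys such as "a.b.c" into nested maps, so both documents end up with the structure the mapping describes. It is an illustration only, not Flint's actual implementation or a proposed fix: `DottedKeys`, `expand` and `put` are hypothetical names, and the sketch assumes _source has already been parsed into a plain `Map[String, Any]`.

```scala
object DottedKeys {
  // Expand keys such as "a.b.c" into nested maps:
  //   Map("a.b.c" -> 123)  becomes  Map("a" -> Map("b" -> Map("c" -> 123)))
  // Values that are already maps are expanded recursively.
  def expand(source: Map[String, Any]): Map[String, Any] =
    source.foldLeft(Map.empty[String, Any]) { case (acc, (key, value)) =>
      val leaf = value match {
        case m: Map[_, _] => expand(m.asInstanceOf[Map[String, Any]])
        case other        => other
      }
      put(acc, key.split('.').toList, leaf)
    }

  // Insert a value at the given path, creating intermediate maps as needed.
  // Simplification: if two keys resolve to the same path, the later one wins.
  private def put(target: Map[String, Any], path: List[String], value: Any): Map[String, Any] =
    path match {
      case Nil         => target
      case head :: Nil => target.updated(head, value)
      case head :: rest =>
        val child = target.get(head) match {
          case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, Any]]
          case _                  => Map.empty[String, Any]
        }
        target.updated(head, put(child, rest, value))
    }
}

// The two _source shapes from nested_index and nested_index_2
val dottedSource = Map[String, Any]("a.b.c" -> 123)
val nestedSource = Map[String, Any]("a" -> Map("b" -> Map("c" -> 123)))

// After expansion both have the same nested structure
assert(DottedKeys.expand(dottedSource) == nestedSource)
assert(DottedKeys.expand(nestedSource) == nestedSource)
```

With the dotted form left as-is, a lookup of the top-level key "a" (the first step implied by the inferred schema) finds nothing in the first document, which is consistent with the NULL shown above.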
