[BUG] The aggs result of NestedAggregator with sub NestedAggregator may not be accurate #13303

Closed
kkewwei opened this issue Apr 19, 2024 · 2 comments · Fixed by #13324
Labels
bug Something isn't working Search:Aggregations

Comments

@kkewwei
Contributor

kkewwei commented Apr 19, 2024

Describe the bug

The result of a NestedAggregator with a sub NestedAggregator is not accurate. In the response shown here, both doc_count values should be 4.
image

Related component

Search:Aggregations

To Reproduce

  1. Create the index.
PUT index1_nest111
{
   "settings": {
      "index.refresh_interval": "30s"
   },
   "mappings": {
      "properties": {
         "nested1": {
            "type": "nested",
            "properties": {
               "name": {
                  "type": "keyword"
               }
            }
         },
         "nested2": {
            "type": "nested",
            "properties": {
               "age": {
                  "type": "long"
               }
            }
         }
      }
   }
}
  2. Index the data.
    The 4 documents are identical except for the _id:
POST _bulk?refresh=true
{ "index": { "_index": "index1_nest111", "_id": "1" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
{ "index": { "_index": "index1_nest111", "_id": "2" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }


POST _bulk?refresh=true
{ "index": { "_index": "index1_nest111", "_id": "3" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
{ "index": { "_index": "index1_nest111", "_id": "4" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
  3. Run the aggregation.
POST index1_nest111/_search
{
  "aggregations": {
    "out_nested": {
      "aggregations": {
        "out_terms": {
          "aggregations": {
            "inner_nested": {
              "aggregations": {
                "inner_terms": {
                  "terms": {
                    "field": "nested1.name"
                  }
                }
              },
              "nested": {
                "path": "nested1"
              }
            }
          },
          "terms": {
            "field": "nested2.age"
          }
        }
      },
      "nested": {
        "path": "nested2"
      }
    }
  },
  "size": 0
}
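The request body above lists its keys alphabetically, which hides the nesting order. As a rough sketch, the same query can be built with two small helpers so the chain nested(nested2) > terms(nested2.age) > nested(nested1) > terms(nested1.name) is easier to follow. The helper names `nested_agg` and `terms_agg` are hypothetical and are not part of any OpenSearch client.

```python
import json


def nested_agg(path, sub_aggs):
    """Build a `nested` aggregation on `path` wrapping the given sub-aggregations."""
    return {"nested": {"path": path}, "aggregations": sub_aggs}


def terms_agg(field, sub_aggs=None):
    """Build a `terms` aggregation on `field`, optionally with sub-aggregations."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggregations"] = sub_aggs
    return agg


# Same structure as the request body in the reproduction steps.
query = {
    "size": 0,
    "aggregations": {
        "out_nested": nested_agg("nested2", {
            "out_terms": terms_agg("nested2.age", {
                "inner_nested": nested_agg("nested1", {
                    "inner_terms": terms_agg("nested1.name"),
                }),
            }),
        }),
    },
}

# Serialize for use as the body of POST index1_nest111/_search.
body = json.dumps(query, indent=2)
print(body)
```

Note that the inner nested path (nested1) is a sibling of the outer path (nested2), not a child of it, which is the combination this report is about.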

Expected behavior

The inner_nested.doc_count should also be 4.
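A minimal sketch of how this expectation could be checked against a search response. The helper `nested_doc_counts` is hypothetical, and the response below is hand-written from the expected values in this report, trimmed to the relevant fields; it is not output captured from a cluster.

```python
def nested_doc_counts(response):
    """Return (outer doc_count, list of inner_nested doc_counts per out_terms bucket)."""
    out_nested = response["aggregations"]["out_nested"]
    buckets = out_nested["out_terms"]["buckets"]
    inner = [bucket["inner_nested"]["doc_count"] for bucket in buckets]
    return out_nested["doc_count"], inner


# Hand-written EXPECTED response shape (assumption: only the fields needed here).
expected_response = {
    "aggregations": {
        "out_nested": {
            "doc_count": 4,
            "out_terms": {
                "buckets": [
                    {"key": 1, "doc_count": 4, "inner_nested": {"doc_count": 4}},
                ]
            },
        }
    }
}

outer, inner = nested_doc_counts(expected_response)
# With one nested1 and one nested2 object per document, every inner count
# should match the outer count.
assert all(count == outer for count in inner)
```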

If this is a bug, I'd be pleased to fix it.

Additional Details

Host/Environment (please complete the following information):

  • OS: os2.9
@kkewwei
Contributor Author

kkewwei commented Apr 22, 2024

The nested2 child documents belong to the outer nested aggregation, and the nested1 child documents belong to the inner nested aggregation.

To help explain the description above:
image

When the inner nested aggregation executes, parentDoc=0 (the first Lucene document ID) is discarded:
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/nested/NestedAggregator.java#L196

We can see that parentDoc is not always greater than childDoc, which means the logic in processBufferedChildBuckets is wrong: it can aggregate unrelated documents.
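The effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual Java code: it assumes a per-segment layout in which each nested2 child is immediately followed by its nested1 sibling and then the root document, and it models only the idea that children are looked up between the previous parent and the current parent. The names `prev_parent` and `children_collected` are hypothetical.

```python
from bisect import bisect_left

# ASSUMED Lucene doc ids for two of the identical documents in one segment:
#   0: nested2 child, 1: nested1 child, 2: root
#   3: nested2 child, 4: nested1 child, 5: root
PARENT_DOCS = [0, 3]  # docs the inner aggregator treats as parents (nested2)
CHILD_DOCS = [1, 4]   # docs the inner aggregator treats as children (nested1)


def prev_parent(parent_docs, doc):
    """Largest parent doc id strictly below `doc`, or -1 if there is none."""
    i = bisect_left(parent_docs, doc)
    return parent_docs[i - 1] if i > 0 else -1


def children_collected(parent_docs, child_docs, parent_doc):
    """Children picked up for `parent_doc` when children are assumed to precede it."""
    if parent_doc == 0:
        # parentDoc == 0 is skipped, since no child could precede doc 0.
        return []
    lower = prev_parent(parent_docs, parent_doc)
    # Only children strictly between the previous parent and this parent are seen.
    return [c for c in child_docs if lower < c < parent_doc]


collected = {p: children_collected(PARENT_DOCS, CHILD_DOCS, p) for p in PARENT_DOCS}
print(collected)  # {0: [], 3: [1]}
```

Under this assumed layout, parent 0 collects nothing, parent 3 collects child 1 (which belongs to the first root document, not its own), and child 4 is never collected at all.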

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@kkewwei Thanks for creating this issue, and thanks for the pull request to address it!

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jun 21, 2024