
Extend cases for #11643 - optimization for string terms aggregation #13119

Open
sandeshkr419 opened this issue Apr 8, 2024 · 1 comment

Comments

@sandeshkr419
Contributor

#11643

With respect to the above PR: when optimizing the string terms aggregation for single terms, we omitted some cases:

  • Deleted documents in a segment
  • Doc frequency exists in any document of a segment

With the optimization, it should be possible to handle deleted documents in a segment gracefully: compute the aggregation using the same optimization, then traverse only the deleted documents to correct the result.
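A minimal sketch of that idea, written as a self-contained Python simulation rather than against the actual Lucene/OpenSearch APIs. It assumes a single-valued field, and that the per-term document frequencies still include deleted documents. The function name `term_counts_with_deletes` and the data layout (`doc_freq`, `doc_terms`, `live_docs`) are hypothetical and only illustrate the approach:

```python
from collections import Counter


def term_counts_with_deletes(doc_freq, doc_terms, live_docs):
    """Compute per-term doc counts for one segment.

    doc_freq:  term -> number of docs containing it (deleted docs included).
    doc_terms: list where doc_terms[doc_id] is the term of that doc
               (single-valued field; deleted docs are still present).
    live_docs: set of live doc ids, or None when the segment has no deletions.
    """
    # Step 1: start from the precomputed per-term frequencies (the fast path).
    counts = Counter(doc_freq)

    # Step 2: visit only the deleted docs and subtract their contribution.
    if live_docs is not None:
        for doc_id, term in enumerate(doc_terms):
            if doc_id not in live_docs:
                counts[term] -= 1

    # Terms whose docs were all deleted should not produce a bucket.
    return {term: count for term, count in counts.items() if count > 0}


# Toy segment: doc 3 (term "c") has been deleted.
doc_terms = ["a", "b", "a", "c", "a"]
doc_freq = {"a": 3, "b": 1, "c": 1}
live_docs = {0, 1, 2, 4}

result = term_counts_with_deletes(doc_freq, doc_terms, live_docs)
```

In this toy segment the result matches what a full scan over the live documents would produce, while only the single deleted document is visited in step 2.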

Need to figure out:

  • What the ideal threshold is for the ratio of deleted documents to the total number of documents in a segment. This will require collecting data by experimenting with different percentages of deleted documents.

Note that we state this on a per-segment basis because the decision whether to apply the optimization is also made at the segment level.
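The per-segment decision could be sketched roughly as follows. The function name `use_term_frequency_path` and the parameter `max_deleted_ratio` are hypothetical, and the default value of 0.1 is only a placeholder: the right threshold is exactly what the experiments mentioned above would need to determine.

```python
def use_term_frequency_path(max_doc, num_deleted, max_deleted_ratio=0.1):
    """Decide, for a single segment, whether to use the optimized path.

    max_doc:           total number of docs in the segment (deleted included).
    num_deleted:       number of deleted docs in the segment.
    max_deleted_ratio: PLACEHOLDER threshold, not a measured value.
    """
    if max_doc == 0:
        return False  # empty segment: nothing to optimize
    return num_deleted / max_doc <= max_deleted_ratio


# Each segment is evaluated independently: (max_doc, num_deleted).
segments = [(1000, 0), (1000, 50), (1000, 500)]
decisions = [use_term_frequency_path(m, d) for m, d in segments]
```

With the placeholder threshold, the first two segments would take the optimized path and the third would fall back to the default collection.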

@peternied
Member

[Triage - attendees 1 2 3 4 5 6]
@sandeshkr419 Thanks for creating this issue; we look forward to seeing a pull request to resolve it.

From @peternied: please use the template for issue creation.

Projects
Status: Untriaged
Status: Now(This Quarter)