[BUG] Performance Regression in 2.14 and 3.0 hourly_aggs in http_logs workload #13345

Closed
mgodwan opened this issue Apr 23, 2024 · 9 comments
Labels
bug Something isn't working engine performance Performance This is for any performance related enhancements or bugs Search:Aggregations Search:Performance v2.14.0

Comments

@mgodwan
Member

mgodwan commented Apr 23, 2024

Describe the bug

https://opensearch.org/benchmarks

[Screenshot 2024-04-23 at 2:10:27 PM]

The P90 latency observed for the hourly_aggs query has regressed over the last week.

Related component

Search:Aggregations

Expected behavior

Latency should not increase

Additional Details

No response

@mgodwan
Member Author

mgodwan commented Apr 23, 2024

Tagging @getsaurabh02 @msfroh @bbarani to see if they are aware of any changes. In parallel, I'm looking through the commit history to see if I can find a commit that could have caused this.

@mgodwan
Member Author

mgodwan commented Apr 23, 2024

One of the commits (from the same day the regression started) that slightly touches the aggregation path: 8332859 [can be evaluated to see whether it could have had some impact]

@getsaurabh02
Member

@mgodwan This looks related to #13179, where @bowenlan-amzn added a cluster setting to dynamically disable the filter rewrite optimization.

Based on the description, it reduces the deciding threshold for rewrite filters from 1024 to 24, meaning that if the date histogram aggregation includes more than 24 buckets (e.g. hourly aggregation of 1 day), we won't use the optimization. After this change, we will probably see a regression for date_histogram_hourly_agg of the big5 workload. That will be handled after the long-term solution is merged next.
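
To make the threshold reasoning above concrete, here is a minimal sketch in Python. It is not the OpenSearch implementation: the function names are hypothetical, and the exact comparison the engine uses is an assumption based only on the "more than 24 buckets" wording in this comment.

```python
from datetime import datetime, timedelta


def hourly_bucket_count(start: datetime, end: datetime) -> int:
    """Number of hourly buckets needed to cover the range [start, end)."""
    hours, remainder = divmod(end - start, timedelta(hours=1))
    # A partial trailing hour still needs its own bucket.
    return hours + (1 if remainder else 0)


def uses_filter_rewrite(bucket_count: int, max_filters: int) -> bool:
    """Hypothetical check: rewrite only if the bucket count is within the threshold."""
    return 0 < bucket_count <= max_filters


one_week = hourly_bucket_count(datetime(2024, 4, 1), datetime(2024, 4, 8))
print(one_week)                             # 168
print(uses_filter_rewrite(one_week, 1024))  # True: old threshold, optimization applies
print(uses_filter_rewrite(one_week, 24))    # False: new threshold, optimization skipped
```

Under these assumptions, any hourly aggregation spanning more than one day falls back to the unoptimized path once the threshold drops to 24, which is consistent with the regression reported here.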

@bowenlan-amzn
Member

bowenlan-amzn commented Apr 23, 2024

The change causing this adds a dynamic cluster setting that lowers the threshold for applying our optimization on date histogram. The threshold is the number of filters rewritten from the date histogram. The previous value of 1024 was reported to cause a regression on the pmc workload.

Since it's a dynamic setting, it won't actually cause a regression for users; instead, it gives them the ability to tune it for their workload.

The PR for long term fix: #13317
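
As a rough illustration of tuning a dynamic setting, the sketch below builds the request for the cluster settings API (`PUT /_cluster/settings`) without sending it. The setting name `search.max_aggregation_rewrite_filters` is an assumption, since this thread does not name the setting; check #13179 or the OpenSearch documentation before relying on it. The helper name is hypothetical.

```python
import json

# Assumption: the setting name is not given in this thread; verify it before use.
ASSUMED_SETTING = "search.max_aggregation_rewrite_filters"


def build_settings_request(max_filters: int) -> dict:
    """Describe a cluster settings update that raises or lowers the threshold."""
    if max_filters < 0:
        raise ValueError("max_filters must be non-negative")
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        # "persistent" settings survive a full cluster restart.
        "body": {"persistent": {ASSUMED_SETTING: max_filters}},
    }


request = build_settings_request(1024)
print(request["method"], request["path"])
print(json.dumps(request["body"], indent=2))
```

The printed body can be sent with any HTTP client. Because the setting is dynamic, no node restart should be needed for a change to take effect.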

@mgodwan
Member Author

mgodwan commented Apr 24, 2024

Thanks @bowenlan-amzn

> Since it's a dynamic setting, it won't actually cause a regression for users; instead, it gives them the ability to tune it for their workload.

Is this setting enabled for the benchmark setup where we are seeing regression?

@bowenlan-amzn
Member

The setting is a threshold. This operation in the http_logs workload currently exceeds the threshold, so our previous optimization is disabled, hence the regression.

@peternied
Member

[Triage - attendees 1 2 3 4 5 6]
@mgodwan Thanks for creating this issue. This looks like a potential release blocker for v2.14. Please let me know if you need any help getting eyes on this issue.

@mgodwan
Member Author

mgodwan commented Apr 29, 2024

> The setting is a threshold. This operation in the http_logs workload currently exceeds the threshold, so our previous optimization is disabled, hence the regression.

@bowenlan-amzn Do we need to revisit the threshold defaults in that case, since the current ones have been shown to cause a regression?

@getsaurabh02 getsaurabh02 moved this from 🆕 New to Now(This Quarter) in Search Project Board May 1, 2024
@bowenlan-amzn
Member

Fix/Improvements merged in

@github-project-automation github-project-automation bot moved this from Now(This Quarter) to ✅ Done in Search Project Board May 3, 2024