[BUG] Indexing throughput regression in OpenSearch main branch #12991

Closed
shwetathareja opened this issue Apr 1, 2024 · 9 comments
Labels: bug (Something isn't working), Indexing:Performance, Indexing (Indexing, Bulk Indexing and anything related to indexing)

Comments

@shwetathareja
Member

Describe the bug

An indexing performance regression of around 10-15% has been observed in the nightly runs on the main branch, starting March 22, 2024.
[Screenshot 2024-03-29 at 5:44:30 PM]

Pending further details.

Related component

Indexing:Performance

Expected behavior

No regression.

shwetathareja added the bug and Indexing labels Apr 1, 2024
@khushbr

khushbr commented Apr 1, 2024

The indexing throughput degradation is seen only on 3.0.0; refer to the screenshot below, taken from https://opensearch.org/benchmarks/

The performance numbers for versions 2.12.0 and 2.13.0 are close.

[Screenshot 2024-04-01 at 10:35:39 AM]

@khushbr

khushbr commented Apr 1, 2024

I looked at the delta commits, and one suspect is https://github.com/opensearch-project/OpenSearch/pull/12494/files

@bbarani Do we have support for running the Nightly Benchmarks against a specific commit? Can we run the HTTP corpus against the above commit?

@bbarani
Member

bbarani commented Apr 1, 2024

@rishabh6788, can you help @khushbr?

@rishabh6788
Contributor

Synced up with @khushbr and provided her with the details.
We don't see the regression on 2.13 because the commit Khushboo mentioned has not been merged into the 2.13 branch. It is on the 2.x branch, and once we schedule nightlies for 2.14 after the 2.13 release tomorrow, we should see a pattern on the 2.x line similar to 3.0, if that is in fact the offending commit.

@bbarani
Member

bbarani commented Apr 2, 2024

> The Indexing throughput degradation is seen only on 3.0.0, refer below screenshot taken from https://opensearch.org/benchmarks/
>
> The version 2.12.0 and 2.13.0 performance numbers are close.

I still see the regression in the main branch (especially when security is disabled).

[Screenshot 2024-04-02 at 10:10:10 AM]

@khushbr

khushbr commented Apr 4, 2024

Baseline with (Light weight Transport action to verify local term before fetching cluster-state from remote) OSB runs (in docs/s):

Run  Min Throughput  Mean Throughput  Median Throughput  Max Throughput
1    207733          220321           219700             232087
2    206696          215810           212435             228198
3    219180          228342           225731             240500

Up to ([Remote Migration] Changes for Primary Relocation during migration) OSB runs (in docs/s):

Run  Min Throughput  Mean Throughput  Median Throughput  Max Throughput
1    191314          200558           199832             210299
2    183816          197962           194975             216139
3    185448          193235           192683             200703

Up to (Update supported version for the wait_for_completion parameter in open&clone&shrink&split APIs) OSB runs (in docs/s):

Run  Min Throughput  Mean Throughput  Median Throughput  Max Throughput
1    187156          193726           193174             200601
2    188290          196304           195373             205012
3    191328          200612           199125             210423

Full set of commits taken on 03/22 (Catch task description error) OSB runs (in docs/s):

Run  Min Throughput  Mean Throughput  Median Throughput  Max Throughput
1    192710          201019           199823             210869
2    191049          201927           199053             216989
3    193736          205920           203444             221371

@khushbr

khushbr commented Apr 4, 2024

The OSB dashboard plots the Max of Mean Throughput (which, for a single run, is simply that run's Mean value). For runs on and before 03/21, the Mean Throughput is above 200K docs/s.
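As a minimal sketch of the dashboard metric described above (our reading of it, not OSB code), the Python below takes each run's Mean Throughput and reports the maximum; with one run it reduces to that run's mean. The `max_of_mean` helper is hypothetical, and the inputs are the Mean Throughput columns of the Baseline and Remote Migration tables above.

```python
def max_of_mean(run_means):
    """Dashboard-style metric: the largest per-run Mean Throughput.

    With a single run this is just that run's mean.
    """
    if not run_means:
        raise ValueError("need at least one run")
    return max(run_means)


# Mean Throughput (docs/s) for runs 1-3, copied from the tables above.
baseline = [220321, 215810, 228342]
remote_migration = [200558, 197962, 193235]

if __name__ == "__main__":
    print(max_of_mean(baseline))          # best baseline run
    print(max_of_mean(remote_migration))  # best run after the suspect change
    print(max_of_mean([201019]))          # single run: equals its own mean
```

Because the maximum reports the best run, it can hide part of a regression that an average across runs would show.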

Running the same http_logs workload in my setup, I see the Mean Throughput decline with the change ([Remote Migration] Changes for Primary Relocation during migration), dropping to roughly 193K to 201K docs/s (see table above).
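To put a number on the drop, the sketch below averages the three Mean Throughput values from the Baseline table and the [Remote Migration] table above and computes the relative decrease. It is plain arithmetic on the posted numbers (the helper names are illustrative), not an OSB feature.

```python
# Mean Throughput (docs/s) for runs 1-3, copied from the tables above.
baseline_means = [220321, 215810, 228342]
remote_migration_means = [200558, 197962, 193235]


def average(values):
    """Arithmetic mean of a list of throughput samples."""
    return sum(values) / len(values)


def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


base = average(baseline_means)
regressed = average(remote_migration_means)
drop = percent_drop(base, regressed)

if __name__ == "__main__":
    print(f"baseline avg:  {base:,.0f} docs/s")
    print(f"regressed avg: {regressed:,.0f} docs/s")
    print(f"drop:          {drop:.1f}%")
```

On these numbers the average falls from 221,491 to about 197,252 docs/s, a drop of roughly 10.9%, which sits within the 10-15% range reported at the top of the issue.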

@gbbafna
Collaborator

gbbafna commented Apr 4, 2024

I see that throughput has recovered for all of the benchmarks. We added a change that memoizes a value instead of looking it up in the index settings every time (#12994) and backported it to 2.x (2.14) as well. The 2.14 and 3.0 results are now on par with 2.13.

[Screenshot 2024-04-04 at 2:23:30 PM]
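The fix in #12994 is described only as memoizing a value instead of looking it up in the index settings each time. The sketch below illustrates that general pattern in Python. It is not the OpenSearch change itself: `IndexSettings`, `PerCallIndexer`, `MemoizedIndexer` and the `index.example.flag` key are all hypothetical names. It counts settings lookups so you can see how the hot path stops paying for them.

```python
class IndexSettings:
    """Hypothetical stand-in for per-index settings that counts lookups."""

    def __init__(self, values):
        self._values = dict(values)
        self.lookups = 0

    def get(self, key, default=None):
        self.lookups += 1  # models a per-call settings lookup on the hot path
        return self._values.get(key, default)


FLAG_KEY = "index.example.flag"  # hypothetical setting key


class PerCallIndexer:
    """Consults the settings on every indexing operation."""

    def __init__(self, settings):
        self._settings = settings

    def index(self, doc):
        flag = self._settings.get(FLAG_KEY, False)
        return doc, flag


class MemoizedIndexer:
    """Reads the setting once and reuses the cached value."""

    def __init__(self, settings):
        # Assumes the value cannot change while this object is alive.
        self._flag = settings.get(FLAG_KEY, False)

    def index(self, doc):
        return doc, self._flag


if __name__ == "__main__":
    for cls in (PerCallIndexer, MemoizedIndexer):
        settings = IndexSettings({FLAG_KEY: True})
        indexer = cls(settings)
        for i in range(10_000):
            indexer.index({"id": i})
        print(f"{cls.__name__}: {settings.lookups} settings lookups")
```

Caching only works if the memoized value is fixed for the lifetime of the holder. If the setting can change at runtime, the cached copy would also need to be refreshed when the setting is updated.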

@khushbr khushbr removed their assignment Apr 4, 2024
@gbbafna gbbafna closed this as completed Apr 4, 2024

5 participants