[BUG] Regression in nyc_taxis desc_sort_tip_amount between 2.16 and 2.17 #16220

Open
peteralfonsi opened this issue Oct 7, 2024 · 5 comments
Labels
bug Something isn't working Search:Performance

Comments

@peteralfonsi
Contributor

Describe the bug

desc_sort_tip_amount in nyc_taxis is this query:

"body": {
  "query": {
    "match_all": {}
  },
  "sort": [
    {"tip_amount": "desc"}
  ]
}

Using OSB, its p50 latency was 9.6 ms in 2.16 but 36.8 ms in 2.17. Other percentiles are similarly affected. This happens consistently across different runs. It looks like the nightly benchmarks don't run this operation.

asc_sort_tip_amount is the same query with "asc" sort order. It's not affected. Its p50 went from 7.7 to 7.4 ms.
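
For anyone who wants to compare the two sort orders outside of OSB, here is a minimal sketch. It assumes the index is named `nyc_taxis` and the cluster is reachable at `http://localhost:9200` without authentication; the helper names `sort_query`, `median_took_ms`, and `compare` are hypothetical. Note that it reads the server-side `took` value from the search response, which is not the same measurement as the client-side latency OSB reports, so treat the numbers as a rough cross-check only.

```python
import json
import statistics
import urllib.request


def sort_query(order: str) -> dict:
    """Build the match_all query from the issue, sorted on tip_amount."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    return {"query": {"match_all": {}}, "sort": [{"tip_amount": order}]}


def median_took_ms(host: str, index: str, body: dict, repeats: int = 20) -> float:
    """POST the query `repeats` times and return the median 'took' (ms)."""
    url = f"{host}/{index}/_search"
    payload = json.dumps(body).encode("utf-8")
    took = []
    for _ in range(repeats):
        req = urllib.request.Request(
            url,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            took.append(json.load(resp)["took"])
    return statistics.median(took)


def compare(host: str = "http://localhost:9200", index: str = "nyc_taxis") -> dict:
    """Return the median 'took' for both sort orders (host/index are assumptions)."""
    return {o: median_took_ms(host, index, sort_query(o)) for o in ("asc", "desc")}
```

Calling `compare()` against a running cluster returns a dict keyed by `"asc"` and `"desc"`; nothing is sent over the network until it is called.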

Related component

Search:Performance

To Reproduce

  • Create tar install on 2.16 or 2.17 branch using ./gradlew assemble
  • Run OpenSearch in a c5.xl instance with 4 GB heap size
  • Run nyc_taxis against the cluster using OSB and compare the desc_sort_tip_amount latencies

Expected behavior

The latencies in 2.16 and 2.17 should be on par.

Additional Details

Plugins
No plugins

Host/Environment (please complete the following information):
AL2 on c5.xl instance type, using tar install of OpenSearch built from 2.16 or 2.17 branches

@peteralfonsi
Contributor Author

After more testing, I found that some runs of the workload don't show the regression.

When a run doesn't show it and we do more runs without recreating the index (by skipping those operations in OSB with a command like opensearch-benchmark execute-test --workload-path=/home/ec2-user/osb/opensearch-benchmark-workloads/nyc_taxis --workload-params='{"bulk_indexing_clients":16}' --target-host=http://localhost:9200/ --exclude-tasks=delete-index,create-index,check-cluster-health,index), those subsequent runs also don't show the regression. But when we recreate the index, the regression reappears. So it seems to have something to do with indexing?
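
To make the two run modes easier to switch between, the command above can be assembled programmatically. This is only a sketch: it uses the same flags and values quoted in the comment, and the function name `osb_command` and its `reuse_index` parameter are hypothetical.

```python
import json
import shlex

# Tasks skipped when re-running against an existing index (from the command above).
SKIPPED_TASKS = ["delete-index", "create-index", "check-cluster-health", "index"]


def osb_command(workload_path: str, target_host: str,
                bulk_indexing_clients: int, reuse_index: bool) -> list:
    """Build the OSB argument list; add --exclude-tasks only when reusing the index."""
    params = json.dumps({"bulk_indexing_clients": bulk_indexing_clients},
                        separators=(",", ":"))
    cmd = [
        "opensearch-benchmark",
        "execute-test",
        f"--workload-path={workload_path}",
        f"--workload-params={params}",
        f"--target-host={target_host}",
    ]
    if reuse_index:
        cmd.append("--exclude-tasks=" + ",".join(SKIPPED_TASKS))
    return cmd


rerun = osb_command(
    "/home/ec2-user/osb/opensearch-benchmark-workloads/nyc_taxis",
    "http://localhost:9200/",
    bulk_indexing_clients=16,
    reuse_index=True,
)
# shlex.join quotes the JSON parameter so the line can be pasted into a shell.
print(shlex.join(rerun))
```

With `reuse_index=False` the `--exclude-tasks` flag is omitted, so the run recreates and reindexes the data, which is the case where the regression tends to reappear.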

So far I've seen the regression about 4 out of 5 times that I've run OSB with a new index.

@sandeshkr419
Contributor

@rishabh6788 Is this the same query shape that you have been investigating?

If yes, can you share your experimentation details here?

@getsaurabh02
Member

@peteralfonsi can we try running with one client and see if this is consistently reproducible? The ordering of data with multiple clients can introduce another variable here, so eliminating that would be great.

@peteralfonsi
Contributor Author

I did 4 runs with "bulk_indexing_clients":1. 3 of them had the regression (2 in the ~35 ms range, 1 in the ~25 ms range) and 1 didn't (~9 ms).
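
A small sketch of how these runs could be tallied. The p50 values are the approximate figures quoted above, not exact measurements, and the cutoff of twice the 2.16 baseline (9.6 ms) is an arbitrary assumption chosen only to separate the ~9 ms runs from the ~25-35 ms ones.

```python
BASELINE_P50_MS = 9.6  # 2.16 p50 reported at the top of the issue

# Approximate p50s of the four single-client runs (rounded, from the comment above).
single_client_p50s = [35.0, 35.0, 25.0, 9.0]


def is_regressed(p50_ms: float, baseline_ms: float = BASELINE_P50_MS,
                 factor: float = 2.0) -> bool:
    """Flag a run whose p50 exceeds `factor` times the baseline (assumed cutoff)."""
    return p50_ms > factor * baseline_ms


def summarize(p50s: list) -> tuple:
    """Return (number of regressed runs, total runs)."""
    return sum(is_regressed(p) for p in p50s), len(p50s)


regressed, total = summarize(single_client_p50s)
print(f"{regressed} of {total} runs regressed")
```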

Will post flamegraphs for 2.16 vs 2.17 tomorrow.

@peteralfonsi
Contributor Author

Ok, I've now seen the regression happen on 2.16 as well. Probably this is unrelated to the version change, and has something to do with how it's indexed.
