Top N Queries by latency implementation #11904

Merged: 5 commits merged into opensearch-project:main on Feb 6, 2024

ansjcy (Member) commented Jan 17, 2024

Description

(parent RFC: #11186)
As a follow-up to #11903, this PR implements the Top N Queries by latency feature (#11186).
More specifically, this PR includes:

  • The Top N queries service, listener, and related transport and REST endpoints.
  • Unit tests for the new features and APIs.
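
To illustrate the general idea of keeping only the N slowest queries, here is a minimal Python sketch built on a bounded min-heap. It is an illustration under stated assumptions, not the Java service added by this PR: the class name `TopNLatencyTracker` and its methods are hypothetical, and window expiry and thread safety are left out.

```python
import heapq
from typing import Any, Dict, List, Tuple


class TopNLatencyTracker:
    """Hypothetical helper: keeps the N slowest records seen so far."""

    def __init__(self, top_n_size: int) -> None:
        if top_n_size <= 0:
            raise ValueError("top_n_size must be positive")
        self.top_n_size = top_n_size
        # Min-heap of (latency, sequence, record); the fastest kept record sits at index 0.
        self._heap: List[Tuple[int, int, Dict[str, Any]]] = []
        self._seq = 0  # tie-breaker so the record dicts are never compared

    def add(self, record: Dict[str, Any]) -> None:
        entry = (record["latency"], self._seq, record)
        self._seq += 1
        if len(self._heap) < self.top_n_size:
            heapq.heappush(self._heap, entry)
        elif entry[0] > self._heap[0][0]:
            # The new record is slower than the fastest kept one, so swap it in.
            heapq.heapreplace(self._heap, entry)

    def top_queries(self) -> List[Dict[str, Any]]:
        # Slowest first; earlier records win ties.
        ordered = sorted(self._heap, key=lambda e: (-e[0], e[1]))
        return [rec for _, _, rec in ordered]
```

Each `add` costs O(log N), so the cost per search stays small even for larger N values.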

Related Issues

#11186

How to use the API:

  1. Enable the top N queries insight feature:
curl -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{
    "persistent" : {
        "search.insights.top_queries.latency.enabled" : "true",
        "search.insights.top_queries.latency.window_size" : "60s",
        "search.insights.top_queries.latency.top_n_size" : 5
    }
}'
  2. Insert documents for searching:
curl -X POST "localhost:9200/my-index-0/_doc/?pretty" -H 'Content-Type: application/json' -d'
{
  "@timestamp": "2023-12-01T13:12:00",
  "message": "this is my document",
  "user": {
    "id": "ansjcy"
  }
}'
  3. Run some search operations:
curl -X GET "localhost:9200/my-index-0/_search?size=20&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "document 2"
          }
        },
        {
          "match": {
            "user.id": "cyji"
          }
        }
      ]
    }
  }
}'
curl -X GET "localhost:9200/my-index-0/_search?size=20&pretty" -H 'Content-Type: application/json' -d '{}'
...
  4. Get the top N queries by latency for the last minute:
curl -X GET "localhost:9200/_insights/top_queries?type=latency&pretty"

This returns:

{
  "top_queries" : [
    {
      "timestamp" : 1706746069075,
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 36,
        "fetch" : 2
      },
      "node_id" : "PsQkEubhT9S-ePsh906t-w",
      "total_shards" : 1,
      "search_type" : "query_then_fetch",
      "source" : "{\"size\":20,\"query\":{\"bool\":{\"must\":[{\"match_phrase\":{\"message\":{\"query\":\"document 2\",\"slop\":0,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"match\":{\"user.id\":{\"query\":\"cyji\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"auto_generate_synonyms_phrase_query\":true,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}}}",
      "indices" : [
        "my-index-0"
      ],
      "latency" : 45
    },
    {
      "timestamp" : 1706746069271,
      "total_shards" : 1,
      "search_type" : "query_then_fetch",
      "source" : "{\"size\":20}",
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 19,
        "fetch" : 0
      },
      "indices" : [
        "my-index-0"
      ],
      "node_id" : "IITrLUUXROCQehphz75Jsw",
      "latency" : 20
    },
    {
      "timestamp" : 1706746069135,
      "total_shards" : 1,
      "search_type" : "query_then_fetch",
      "source" : "{\"size\":20}",
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 10,
        "fetch" : 2
      },
      "indices" : [
        "my-index-0"
      ],
      "node_id" : "IITrLUUXROCQehphz75Jsw",
      "latency" : 18
    },
    {
      "timestamp" : 1706746069351,
      "total_shards" : 1,
      "search_type" : "query_then_fetch",
      "source" : "{\"size\":20}",
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 2,
        "fetch" : 1
      },
      "indices" : [
        "my-index-0"
      ],
      "node_id" : "_2E2035ZQvmEM9GMADl9Bw",
      "latency" : 9
    },
    {
      "timestamp" : 1706746069380,
      "total_shards" : 1,
      "search_type" : "query_then_fetch",
      "source" : "{\"size\":20}",
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 5,
        "fetch" : 0
      },
      "indices" : [
        "my-index-0"
      ],
      "node_id" : "_2E2035ZQvmEM9GMADl9Bw",
      "latency" : 6
    }
  ]
}
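
As a rough sketch of how a client might post-process this response, the Python below ranks the records by latency and reports the slowest phase for each. The function name `summarize` and the `time_outside_phases` field are hypothetical, and the subtraction assumes that `latency` and the `phase_latency_map` values share the same unit, which the response itself does not state. The sample is a trimmed copy of the response above (two records, `source` omitted).

```python
import json
from typing import Any, Dict, List

# Trimmed copy of the example response: first two records, "source" field omitted.
SAMPLE_RESPONSE = """
{
  "top_queries": [
    {
      "timestamp": 1706746069075,
      "phase_latency_map": {"expand": 0, "query": 36, "fetch": 2},
      "node_id": "PsQkEubhT9S-ePsh906t-w",
      "total_shards": 1,
      "search_type": "query_then_fetch",
      "indices": ["my-index-0"],
      "latency": 45
    },
    {
      "timestamp": 1706746069271,
      "phase_latency_map": {"expand": 0, "query": 19, "fetch": 0},
      "node_id": "IITrLUUXROCQehphz75Jsw",
      "total_shards": 1,
      "search_type": "query_then_fetch",
      "indices": ["my-index-0"],
      "latency": 20
    }
  ]
}
"""


def summarize(response_text: str) -> List[Dict[str, Any]]:
    """Return one summary row per record, slowest query first."""
    records = json.loads(response_text)["top_queries"]
    rows = []
    for rec in sorted(records, key=lambda r: r["latency"], reverse=True):
        phases = rec["phase_latency_map"]
        rows.append({
            "node_id": rec["node_id"],
            "indices": rec["indices"],
            "latency": rec["latency"],
            # Phase with the largest recorded latency.
            "slowest_phase": max(phases, key=phases.get),
            # Assumption: same unit for total and per-phase latency.
            "time_outside_phases": rec["latency"] - sum(phases.values()),
        })
    return rows


for row in summarize(SAMPLE_RESPONSE):
    print(row)
```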

Load Tests

Roughly 70 load tests were run with the nyc_taxis workload across different combinations of window size and top N value. No performance impact was identified. Detailed benchmark results follow.

Feature off (Baseline)

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.68534 | 6.27264 | 6.73714 | 8.4323 |
| 2 (bc5e) | 5.36776 | 5.80834 | 6.50946 | 28.6724 |
| 3 (bcdb) | 5.18429 | 5.60393 | 9.2652 | 30.4254 |
| 4 (d74f) | 5.02313 | 5.74386 | 6.7693 | 9.38909 |
| 5 (9244) | 5.10541 | 5.47246 | 7.84308 | 8.63438 |
| 6 (b1de) | 5.14018 | 5.49457 | 6.75883 | 9.80746 |
| 7 (217e) | 5.09886 | 5.56152 | 8.21575 | 18.4278 |
| 8 (57d3) | 5.26441 | 5.83722 | 9.92809 | 15.4894 |
| 9 (78f3) | 5.30425 | 5.76678 | 9.24641 | 30.7725 |
| 10 (a2d6) | 5.30458 | 5.82973 | 8.86554 | 13.9551 |
| Median | 5.22435 | 5.75532 | 8.02942 | 14.72225 |
| Mean | 5.24782 | 5.73911 | 8.01388 | 17.40058 |
| St dev | 0.18888 | 0.23304 | 1.272 | 9.25685 |
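
For readers who want to check the summary rows, the sketch below recomputes them for the baseline 50th percentile column with Python's standard `statistics` module. The helper name `summarize_runs` is hypothetical. For this column, the St dev row appears to match the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
import statistics
from typing import Dict, List

# 50th percentile values of the ten baseline (feature off) runs, copied from the table above.
baseline_p50: List[float] = [
    5.68534, 5.36776, 5.18429, 5.02313, 5.10541,
    5.14018, 5.09886, 5.26441, 5.30425, 5.30458,
]


def summarize_runs(values: List[float]) -> Dict[str, float]:
    """Recompute the Median, Mean, and St dev rows for one column."""
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "st_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


summary = summarize_runs(baseline_p50)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

The same helper can be applied to any other column or configuration by swapping in that column's ten values.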

n=10, window size = 10 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.48015 | 5.93038 | 6.35181 | 8.1288 |
| 2 (bc5e) | 5.12966 | 5.52368 | 6.05926 | 7.25804 |
| 3 (bcdb) | 5.17215 | 5.66219 | 6.65964 | 7.45862 |
| 4 (d74f) | 4.90608 | 5.57437 | 6.04869 | 7.64221 |
| 5 (9244) | 5.49047 | 5.89037 | 6.41805 | 7.67218 |
| 6 (b1de) | 5.06197 | 5.42041 | 6.89302 | 16.4436 |
| 7 (217e) | 5.27588 | 5.63697 | 6.48004 | 8.57925 |
| 8 (57d3) | 4.85925 | 5.20557 | 6.20037 | 8.81262 |
| 9 (78f3) | 5.3572 | 5.81061 | 8.37155 | 14.7513 |
| 10 (a2d6) | 5.53084 | 6.26242 | 8.44923 | 17.5828 |
| Median | 5.22401 | 5.64958 | 6.44905 | 8.35403 |
| Mean | 5.22636 | 5.6917 | 6.79317 | 10.43294 |
| St dev | 0.24109 | 0.29666 | 0.89066 | 4.10425 |

n=50, window size = 10 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.3265 | 5.76359 | 6.50887 | 8.87379 |
| 2 (bc5e) | 5.15597 | 5.76559 | 6.26947 | 7.86013 |
| 3 (bcdb) | 5.58544 | 6.05801 | 9.65796 | 15.5788 |
| 4 (d74f) | 5.00877 | 5.44595 | 6.14539 | 9.64201 |
| 5 (9244) | 5.39437 | 5.7242 | 6.42808 | 9.3807 |
| 6 (b1de) | 4.99536 | 5.23857 | 5.80347 | 8.70772 |
| 7 (217e) | 5.26149 | 5.76495 | 9.92437 | 18.695 |
| 8 (57d3) | 5.19225 | 5.59769 | 5.82104 | 6.70231 |
| 9 (78f3) | 5.28367 | 5.7321 | 6.18476 | 8.3956 |
| 10 (a2d6) | 5.38787 | 5.97278 | 6.90825 | 7.99288 |
| Median | 5.27258 | 5.74785 | 6.34878 | 8.79076 |
| Mean | 5.25917 | 5.70634 | 6.96517 | 10.18289 |
| St dev | 0.1807 | 0.23671 | 1.52503 | 3.82823 |

n=100, window size = 10 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.42979 | 5.88745 | 8.21798 | 13.2944 |
| 2 (bc5e) | 5.05335 | 6.07895 | 7.3364 | 9.17266 |
| 3 (bcdb) | 5.41622 | 5.76504 | 6.77248 | 7.78023 |
| 4 (d74f) | 4.79577 | 5.26539 | 5.69491 | 7.45989 |
| 5 (9244) | 5.20676 | 5.57206 | 8.1644 | 18.6386 |
| 6 (b1de) | 4.43616 | 5.02934 | 16.3021 | 18.167 |
| 7 (217e) | 5.30738 | 5.75768 | 6.37761 | 8.36801 |
| 8 (57d3) | 4.93365 | 5.59796 | 7.23298 | 30.6488 |
| 9 (78f3) | 5.38238 | 5.8045 | 6.63352 | 9.21911 |
| 10 (a2d6) | 5.314 | 5.84568 | 8.70836 | 28.0179 |
| Median | 5.25707 | 5.76136 | 7.28469 | 11.25676 |
| Mean | 5.12755 | 5.66041 | 8.14407 | 15.07666 |
| St dev | 0.3239 | 0.31059 | 3.01186 | 8.56873 |

n=10, window size = 60 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.48994 | 5.82642 | 9.2272 | 14.1245 |
| 2 (bc5e) | 5.10504 | 5.70799 | 7.26535 | 10.31651 |
| 3 (bcdb) | 5.20354 | 5.79872 | 8.79294 | 16.5674 |
| 4 (d74f) | 5.34996 | 5.96373 | 6.80488 | 10.5772 |
| 5 (9244) | 5.05346 | 5.62933 | 6.0651 | 8.00097 |
| 6 (b1de) | 4.92265 | 5.42617 | 5.97952 | 7.94773 |
| 7 (217e) | 5.20606 | 5.68424 | 8.25867 | 12.8365 |
| 8 (57d3) | 5.04032 | 5.7306 | 6.50287 | 9.94117 |
| 9 (78f3) | 5.22669 | 5.77208 | 6.68624 | 8.57578 |
| 10 (a2d6) | 5.3514 | 6.00356 | 6.46163 | 8.47628 |
| Median | 5.2048 | 5.75134 | 6.74556 | 10.12884 |
| Mean | 5.19491 | 5.75428 | 7.20444 | 10.7364 |
| St dev | 0.17091 | 0.16478 | 1.15473 | 2.90134 |

n=50, window size = 60 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.66053 | 6.09658 | 10.03 | 16.7627 |
| 2 (bc5e) | 5.17521 | 5.65421 | 8.03381 | 35.3631 |
| 3 (bcdb) | 5.18126 | 5.70204 | 8.7235 | 31.0266 |
| 4 (d74f) | 4.8835 | 5.13923 | 5.81303 | 9.60856 |
| 5 (9244) | 5.38711 | 5.87948 | 6.43249 | 8.73293 |
| 6 (b1de) | 4.73117 | 5.16525 | 5.63382 | 6.48858 |
| 7 (217e) | 5.22232 | 5.66381 | 6.4244 | 6.79487 |
| 8 (57d3) | 4.86784 | 5.3733 | 6.11748 | 8.90142 |
| 9 (78f3) | 5.38244 | 5.89673 | 9.13197 | 28.2333 |
| 10 (a2d6) | 5.53817 | 5.93063 | 8.72938 | 31.9991 |
| Median | 5.20179 | 5.68293 | 7.23315 | 13.18563 |
| Mean | 5.20296 | 5.65013 | 7.50699 | 18.39112 |
| St dev | 0.30303 | 0.3278 | 1.5949 | 11.8744 |

n=100, window size = 60 minutes

| Run | 50th percentile latency | 90th percentile latency | 99th percentile latency | 100th percentile latency |
|---|---|---|---|---|
| 1 (2110) | 5.61482 | 6.16978 | 9.96055 | 26.959 |
| 2 (bc5e) | 4.80904 | 5.14862 | 6.18893 | 16.3675 |
| 3 (bcdb) | 5.23928 | 5.77383 | 6.16525 | 8.2151 |
| 4 (d74f) | 4.91612 | 5.31431 | 6.05033 | 9.32068 |
| 5 (9244) | 5.43572 | 5.90742 | 8.29484 | 14.3912 |
| 6 (b1de) | 4.8768 | 5.26618 | 5.81323 | 7.71826 |
| 7 (217e) | 5.31596 | 5.80771 | 8.0518 | 33.2195 |
| 8 (57d3) | 4.91712 | 5.31075 | 8.97082 | 28.1725 |
| 9 (78f3) | 5.28335 | 5.62963 | 9.44923 | 12.0019 |
| 10 (a2d6) | 5.48454 | 6.06844 | 6.63467 | 8.84661 |
| Median | 5.26132 | 5.70173 | 7.34324 | 13.19655 |
| Mean | 5.18928 | 5.63967 | 7.55797 | 16.52123 |
| St dev | 0.28816 | 0.36172 | 1.56758 | 9.4619 |
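
One way to read the tables against the baseline is to compare the Median rows directly. The Python sketch below does that for the 50th and 90th percentile columns. It assumes that "n" in the section headings is the top N size; the names `FEATURE_ON_MEDIANS` and `percent_change` are hypothetical.

```python
from typing import Dict, Tuple

# Median row of the baseline (feature off) table.
BASELINE_MEDIAN_P50 = 5.22435
BASELINE_MEDIAN_P90 = 5.75532

# (top N size, window size in minutes) -> (median p50, median p90), from the Median rows above.
FEATURE_ON_MEDIANS: Dict[Tuple[int, int], Tuple[float, float]] = {
    (10, 10): (5.22401, 5.64958),
    (50, 10): (5.27258, 5.74785),
    (100, 10): (5.25707, 5.76136),
    (10, 60): (5.2048, 5.75134),
    (50, 60): (5.20179, 5.68293),
    (100, 60): (5.26132, 5.70173),
}


def percent_change(value: float, baseline: float) -> float:
    """Relative difference from the baseline, in percent."""
    return (value - baseline) / baseline * 100.0


changes = {
    config: (
        percent_change(p50, BASELINE_MEDIAN_P50),
        percent_change(p90, BASELINE_MEDIAN_P90),
    )
    for config, (p50, p90) in FEATURE_ON_MEDIANS.items()
}

for (top_n, window), (d50, d90) in changes.items():
    print(f"n={top_n:>3}, window={window:>2} min: p50 {d50:+.2f}%, p90 {d90:+.2f}%")
```

On these numbers the median 50th percentile stays within about 1% of the baseline in every configuration, and the median 90th percentile within about 2%, which is small next to the run-to-run standard deviations reported in the tables.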

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot commented Jan 17, 2024

Compatibility status:

Checks if related components are compatible with change c7c7ef6

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git]

❌ Gradle check result for 894452e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

ansjcy force-pushed the top-n-queries-feature branch from 894452e to e08b250 on January 17, 2024 06:28

❌ Gradle check result for e08b250: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

ansjcy force-pushed the top-n-queries-feature branch from e08b250 to e59b28c on January 18, 2024 01:48
ansjcy force-pushed the top-n-queries-feature branch 2 times, most recently from 0967071 to 1895133, on February 6, 2024 03:53

github-actions bot commented Feb 6, 2024

❕ Gradle check result for 0967071: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions bot commented Feb 6, 2024

❌ Gradle check result for 1895133: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

ansjcy force-pushed the top-n-queries-feature branch from 1895133 to c572870 on February 6, 2024 04:44
ansjcy force-pushed the top-n-queries-feature branch from c572870 to c7c7ef6 on February 6, 2024 04:48

github-actions bot commented Feb 6, 2024

❌ Gradle check result for c572870: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Feb 6, 2024

❕ Gradle check result for c7c7ef6: UNSTABLE

  • TEST FAILURES:
      3 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.search.SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

msfroh merged commit 554cbf7 into opensearch-project:main on Feb 6, 2024 (30 checks passed)
msfroh added the backport 2.x label on Feb 6, 2024
opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11904-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 554cbf7bfd83d94e1b1b69528ce0d128454754f4
# Push it to GitHub
git push --set-upstream origin backport/backport-11904-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-11904-to-2.x.

ansjcy added the backport 2.x label and removed the backport 2.x and backport-failed labels on Feb 6, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 6, 2024
* Top N Queries by latency implementation
* Increase JavaDoc coverage and update PR based comments
* Refactor record and service to make them generic
* refactor service for improving multithreading efficiency
* rebase from master to pick up query insights plugin changes

---------

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit 554cbf7)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Commits carrying the same commit message also referenced this pull request from other repositories:

  • ansjcy added a commit to ansjcy/OpenSearch that referenced this pull request Feb 6, 2024 (cherry picked from commit 554cbf7)
  • peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
  • rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
  • shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024 (additionally signed off by Shivansh Arora <[email protected]>)

Labels: backport 2.x, Search:Query Insights, v2.12.0
Projects: Status: Done; Status: No status
6 participants