Added search backpressure stats API #4932

Merged

Conversation

ketanv3
Contributor

@ketanv3 ketanv3 commented Oct 26, 2022

Description

Added search backpressure stats to the existing node stats API (_nodes/stats) to describe:

  1. the number of cancellations (currently for SearchShardTask only)
  2. the current state of TaskResourceUsageTracker

Sample request: GET /_nodes/stats/search_backpressure?human
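
For scripted access, the same request can be issued from any HTTP client. Below is a minimal Python sketch using only the standard library. It assumes a node reachable at http://localhost:9200 without authentication; that address and the helper names stats_url and fetch_search_backpressure are illustrative assumptions, not part of this PR.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def stats_url(base_url: str, human: bool = True) -> str:
    """Build the node stats URL restricted to the search_backpressure metric."""
    url = base_url.rstrip("/") + "/_nodes/stats/search_backpressure"
    if human:
        # human=true adds readable fields such as "1.1mb" next to the raw values.
        url += "?" + urlencode({"human": "true"})
    return url


def fetch_search_backpressure(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the stats and return {node_id: search_backpressure section}."""
    with urlopen(stats_url(base_url)) as response:
        body = json.load(response)
    return {
        node_id: node.get("search_backpressure", {})
        for node_id, node in body.get("nodes", {}).items()
    }
```

Only stats_url is pure; fetch_search_backpressure performs a network call and will raise if no node is listening at the assumed address.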

Sample response:

{
    "_nodes": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "cluster_name": "runTask",
    "nodes": {
        "T7aqO6zaQX-lt8XBWBYLsA": {
            "timestamp": 1667409521070,
            "name": "runTask-0",
            "transport_address": "127.0.0.1:9300",
            "host": "127.0.0.1",
            "ip": "127.0.0.1:9300",
            "roles": [
                "cluster_manager",
                "data",
                "ingest",
                "remote_cluster_client"
            ],
            "attributes": {
                "testattr": "test",
                "shard_indexing_pressure_enabled": "true"
            },
            "search_backpressure": {
                "search_shard_task": {
                    "resource_tracker_stats": {
                        "heap_usage_tracker": {
                            "cancellation_count": 34,
                            "current_max": "1.1mb",
                            "current_max_bytes": 1203272,
                            "current_avg": "683.8kb",
                            "current_avg_bytes": 700267,
                            "rolling_avg": "1.1mb",
                            "rolling_avg_bytes": 1156270
                        },
                        "cpu_usage_tracker": {
                            "cancellation_count": 318,
                            "current_max": "731.3ms",
                            "current_max_millis": 731,
                            "current_avg": "303.6ms",
                            "current_avg_millis": 303
                        },
                        "elapsed_time_tracker": {
                            "cancellation_count": 310,
                            "current_max": "1.3s",
                            "current_max_millis": 1305,
                            "current_avg": "649.3ms",
                            "current_avg_millis": 649
                        }
                    },
                    "cancellation_stats": {
                        "cancellation_count": 318,
                        "cancellation_limit_reached_count": 97,
                        "last_cancelled_task": {
                            "cpu_usage": "759.8ms",
                            "cpu_usage_millis": 759,
                            "heap_usage": "1.1mb",
                            "heap_usage_bytes": 1211240,
                            "elapsed_time": "1.2s",
                            "elapsed_time_millis": 1207
                        }
                    }
                },
                "mode": "enforced"
            }
        }
    }
}
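
As a rough illustration of how a client might consume this response, the Python sketch below pulls the task-level and per-tracker cancellation counts out of a parsed response. The field names are taken from the sample above; the function name cancellation_summary and the trimmed input dictionary are hypothetical and only serve the example.

```python
def cancellation_summary(stats: dict) -> dict:
    """Map each node ID to its search shard task cancellation counts."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        backpressure = node.get("search_backpressure", {})
        task_stats = backpressure.get("search_shard_task", {})
        trackers = task_stats.get("resource_tracker_stats", {})
        cancellations = task_stats.get("cancellation_stats", {})
        summary[node_id] = {
            "mode": backpressure.get("mode"),
            "cancelled": cancellations.get("cancellation_count", 0),
            "limit_reached": cancellations.get("cancellation_limit_reached_count", 0),
            "by_tracker": {
                name: tracker.get("cancellation_count", 0)
                for name, tracker in trackers.items()
            },
        }
    return summary


# Trimmed copy of the sample response (only the fields read above, values unchanged).
sample = {
    "nodes": {
        "T7aqO6zaQX-lt8XBWBYLsA": {
            "search_backpressure": {
                "search_shard_task": {
                    "resource_tracker_stats": {
                        "heap_usage_tracker": {"cancellation_count": 34},
                        "cpu_usage_tracker": {"cancellation_count": 318},
                        "elapsed_time_tracker": {"cancellation_count": 310},
                    },
                    "cancellation_stats": {
                        "cancellation_count": 318,
                        "cancellation_limit_reached_count": 97,
                    },
                },
                "mode": "enforced",
            }
        }
    }
}

summary = cancellation_summary(sample)
```

Missing sections fall back to empty dictionaries, so the function should also tolerate nodes that report no search backpressure stats.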

Issues Resolved

#1181

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ketan Verma [email protected]

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@ketanv3 ketanv3 marked this pull request as ready for review October 26, 2022 09:53
@ketanv3 ketanv3 requested review from a team and reta as code owners October 26, 2022 09:53

@nssuresh2007 nssuresh2007 left a comment

LGTM!

@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from e8e5118 to 89549f1 Compare October 31, 2022 17:20
@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from 89549f1 to 46d138d Compare October 31, 2022 18:34
@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from 46d138d to b17b52b Compare November 2, 2022 09:09
@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from b17b52b to cc2e682 Compare November 2, 2022 09:43

Collaborator

@Bukhtawar Bukhtawar left a comment

"cancellation_stats": {
    "cancellation_count": 0,
    "cancellation_limit_reached_count": 0,
    "last_cancelled_task": null
}

Can we have individual trackers have this info?

@ketanv3
Contributor Author

ketanv3 commented Nov 2, 2022

Can we have individual trackers have this info?

These cancellation stats are reported at the shard task level (under search_shard_task). Having them only at the tracker level would not be very meaningful: a task may be cancelled for multiple reasons, so adding up the per-tracker counts would double-count cancellations.

cancellation_limit_reached_count is not influenced by trackers, so it's better to keep it separate.
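
The double-counting point can be checked against the sample response in the PR description. The short Python sketch below uses only the numbers from that sample and assumes, as the explanation above implies, that each tracker counts a cancellation whenever it is one of the reasons a task was cancelled.

```python
# cancellation_count values copied from the sample response in the PR description.
tracker_counts = {
    "heap_usage_tracker": 34,
    "cpu_usage_tracker": 318,
    "elapsed_time_tracker": 310,
}
task_level_count = 318  # cancellation_stats.cancellation_count

summed_tracker_counts = sum(tracker_counts.values())
extra_attributions = summed_tracker_counts - task_level_count

print(summed_tracker_counts)  # 662
print(extra_attributions)     # 344
```

The per-tracker counts add up to 662, while only 318 tasks were cancelled, so a sum over trackers would overstate cancellations by 344 in this sample.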

Added search backpressure stats to the existing node/stats API to describe:
1. the number of cancellations (currently for SearchShardTask only)
2. the current state of TaskResourceUsageTracker

Signed-off-by: Ketan Verma <[email protected]>
@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from 66791e1 to 7602cac Compare November 3, 2022 07:15
@ketanv3 ketanv3 force-pushed the feature/inflight-cancellation-stats branch from 7602cac to 2f73821 Compare November 3, 2022 08:04

@ketanv3
Contributor Author

ketanv3 commented Nov 3, 2022

Note on Gradle checks:

The ./gradlew ':qa:mixed-cluster:v2.5.0#mixedClusterTest' task is expected to fail because this PR lowers the backward-compatibility checks from 3.0.0 to 2.4.0. The failure will be resolved once the backport PR (#5039) is also merged.

I have separately verified that everything passes with 3.0.0:

./gradlew check
...
BUILD SUCCESSFUL in 20m 29s
2557 actionable tasks: 584 executed, 1973 up-to-date

Member

@psychbot psychbot left a comment

Looks good to me!

@Bukhtawar
Collaborator

Overriding the merge check due to a Jenkins failure: opensearch-project/opensearch-ci#222
