[Feature] _nodes/stats/os support for cgroup v2 #12945

Open
jondryk opened this issue Mar 27, 2024 · 2 comments
Labels: bug, Cluster Manager, Other, Telemetry:Metrics, v2.19.0

Comments

jondryk commented Mar 27, 2024

Describe the bug

After upgrading to GKE version 1.26 (view changes), cgroup v2 became the default, and _nodes/stats/os no longer returns cgroup metrics.
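To confirm which cgroup version the OpenSearch container actually sees, one option is to inspect /proc/self/cgroup from inside the pod. The sketch below is only an illustration: the function name `cgroup_mode` and the sample paths are hypothetical, and it relies on the kernel convention that a cgroup v2 entry has the form `0::<path>`, while v1 entries list a hierarchy ID and one or more controllers.

```python
def cgroup_mode(proc_self_cgroup: str) -> str:
    """Classify the contents of /proc/self/cgroup as "v1", "v2", "hybrid", or "unknown"."""
    has_v1 = False
    has_v2 = False
    for line in proc_self_cgroup.splitlines():
        # Each line has the form hierarchy-ID:controller-list:cgroup-path
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, _path = parts
        if hierarchy_id == "0" and controllers == "":
            has_v2 = True  # unified (v2) hierarchy entry
        else:
            has_v1 = True  # legacy (v1) hierarchy entry
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "unknown"


# Hypothetical sample contents, not taken from the affected cluster.
sample_v2 = "0::/kubepods/pod-example\n"
sample_v1 = "12:cpu,cpuacct:/kubepods/pod-example\n11:memory:/kubepods/pod-example\n"

print(cgroup_mode(sample_v2))
print(cgroup_mode(sample_v1))
```

Inside a running container, the same function could be called as `cgroup_mode(open("/proc/self/cgroup").read())`.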

Related component

Other

To Reproduce

  1. Configure a Kubernetes cluster with cgroup v2 enabled.
  2. Deploy opensearch-k8s-operator and an OpenSearch cluster by following the guide (link).
  3. Query the _nodes/stats API from OpenSearch Dashboards (Dev Tools): GET _nodes/stats/os
  4. Observe that the response has no cgroup section:
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "quickstart",
  "nodes": {
    "cZANEenxTXms8yIFkb9_fQ": {
      "timestamp": 1711552940692,
      "name": "quickstart",
      "transport_address": "10.201.8.2:9300",
      "host": "quickstart",
      "ip": "10.201.8.2:9300",
      "roles": [
        "cluster_manager"
      ],
      "attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "os": {
        "timestamp": 1711552940692,
        "cpu": {
          "percent": 36,
          "load_average": {
            "1m": 1.28,
            "5m": 1.17,
            "15m": 1.08
          }
        },
        "mem": {
          "total_in_bytes": 2147483648,
          "free_in_bytes": 1716224,
          "used_in_bytes": 2145767424,
          "free_percent": 0,
          "used_percent": 100
        },
        "swap": {
          "total_in_bytes": 0,
          "free_in_bytes": 0,
          "used_in_bytes": 0
        }
      }
    },
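A quick way to check a whole cluster for this symptom is to scan the parsed response for nodes whose "os" object lacks a "cgroup" key. The following is a minimal sketch, assuming the response has already been loaded into a Python dict (for example with `json.loads`); the helper name `nodes_missing_cgroup` is hypothetical and the sample dicts are trimmed-down stand-ins for the responses in this issue.

```python
from typing import Any


def nodes_missing_cgroup(stats: dict[str, Any]) -> list[str]:
    """Return the IDs of nodes whose "os" stats have no "cgroup" section."""
    missing = []
    for node_id, node in stats.get("nodes", {}).items():
        if "cgroup" not in node.get("os", {}):
            missing.append(node_id)
    return missing


# Trimmed-down stand-ins: one node without cgroup stats, one with.
without_cgroup = {"nodes": {"cZANEenxTXms8yIFkb9_fQ": {"os": {"cpu": {"percent": 36}}}}}
with_cgroup = {"nodes": {"NBKECfQPSaSUXrVzUx_ykw": {"os": {"cgroup": {"cpu": {}}}}}}

print(nodes_missing_cgroup(without_cgroup))
print(nodes_missing_cgroup(with_cgroup))
```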

Expected behavior

GET _nodes/stats/os should return a response that includes the cgroup section (documentation). Example response:

{
  "_nodes": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "cluster_name": "quickstart",
  "nodes": {
    "NBKECfQPSaSUXrVzUx_ykw": {
      "timestamp": 1711558029057,
      "name": "quickstart",
      "transport_address": "10.241.8.11:9300",
      "host": "10.241.8.11",
      "ip": "10.241.8.11:9300",
      "roles": [
        "master"
      ],
      "os": {
        "timestamp": 1711558029057,
        "cpu": {
          "percent": 5,
          "load_average": {
            "1m": 0.83,
            "5m": 0.76,
            "15m": 0.7
          }
        },
        "mem": {
          "total_in_bytes": 4294967296,
          "adjusted_total_in_bytes": 4294967296,
          "free_in_bytes": 1336111104,
          "used_in_bytes": 2958856192,
          "free_percent": 31,
          "used_percent": 69
        },
        "swap": {
          "total_in_bytes": 0,
          "free_in_bytes": 0,
          "used_in_bytes": 0
        },
        "cgroup": {
          "cpuacct": {
            "control_group": "/",
            "usage_nanos": 255341381158
          },
          "cpu": {
            "control_group": "/",
            "cfs_period_micros": 100000,
            "cfs_quota_micros": 200000,
            "stat": {
              "number_of_elapsed_periods": 28171495,
              "number_of_times_throttled": 433,
              "time_throttled_nanos": 12260969
            }
          },
          "memory": {
            "control_group": "/",
            "limit_in_bytes": "4294967296",
            "usage_in_bytes": "2958856192"
          }
        }
      }
    }
  }
}
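For anyone considering a pull request, the sketch below shows one possible way the cgroup v2 interface files could be mapped onto the fields in the example response. It is an assumption about a plausible mapping, not OpenSearch's implementation: the function names are hypothetical, the "control_group" field is omitted, and reporting an unlimited CPU quota as -1 simply mirrors the cgroup v1 convention. The file formats themselves (`cpu.max` as "<quota> <period>", `cpu.stat` as key/value lines in microseconds, `memory.max`, `memory.current`) come from the kernel's cgroup v2 interface.

```python
def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a file with one "key value" pair per line, such as cpu.stat."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            result[parts[0]] = int(parts[1])
    return result


def cgroup_v2_stats(cpu_max: str, cpu_stat: str, memory_max: str, memory_current: str) -> dict:
    """Build a cgroup section from the text of four cgroup v2 files (assumed mapping)."""
    quota, period = cpu_max.split()
    stat = parse_flat_keyed(cpu_stat)
    return {
        # cgroup v2 reports microseconds; the v1-shaped fields use nanoseconds.
        "cpuacct": {"usage_nanos": stat["usage_usec"] * 1000},
        "cpu": {
            "cfs_period_micros": int(period),
            # "max" means no quota; -1 is the assumed stand-in, as in cgroup v1.
            "cfs_quota_micros": -1 if quota == "max" else int(quota),
            "stat": {
                # Throttling counters exist only when the cpu controller is enabled.
                "number_of_elapsed_periods": stat.get("nr_periods", 0),
                "number_of_times_throttled": stat.get("nr_throttled", 0),
                "time_throttled_nanos": stat.get("throttled_usec", 0) * 1000,
            },
        },
        "memory": {
            # Kept as strings, as in the example response; memory.max may be "max".
            "limit_in_bytes": memory_max.strip(),
            "usage_in_bytes": memory_current.strip(),
        },
    }


# Hypothetical file contents for illustration only.
example = cgroup_v2_stats(
    cpu_max="200000 100000\n",
    cpu_stat="usage_usec 255341381\nnr_periods 28171495\nnr_throttled 433\nthrottled_usec 12260\n",
    memory_max="4294967296\n",
    memory_current="2958856192\n",
)
print(example["cpu"]["cfs_quota_micros"], example["memory"]["limit_in_bytes"])
```

Taking file contents as strings rather than reading /sys/fs/cgroup directly keeps the parsing logic testable without a cgroup v2 host.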

Additional Details

Plugins
prometheus-exporter-plugin-for-opensearch

Host/Environment:
OS:

  • GKE version: 1.27
  • node version: 1.27.11-gke.1062000
  • image type: UBUNTU_CONTAINERD

OpenSearch:

  • opensearch-k8s-operator version: 2.4.0
  • opensearch-cluster version: 2.11.1

@jondryk jondryk added bug Something isn't working untriaged labels Mar 27, 2024
@github-actions github-actions bot added the Other label Mar 27, 2024
@jondryk jondryk changed the title [BUG] OpenSearch k8s endpoint "_nodes/stats/os" don't return cgroup metrics when cgroup v2 is used [BUG] OpenSearch endpoint "_nodes/stats/os" don't return cgroup metrics when cgroup v2 is used in k8s Mar 28, 2024
peternied commented

[Triage - attendees 1 2 3 4 5 6]
@jondryk We don't believe that cgroup v2 is currently supported. We would love to see a pull request that adds this support.

@peternied peternied changed the title [BUG] OpenSearch endpoint "_nodes/stats/os" don't return cgroup metrics when cgroup v2 is used in k8s [Feature] _nodes/stats/os support for CGroup v2 Apr 10, 2024
@peternied peternied changed the title [Feature] _nodes/stats/os support for CGroup v2 [Feature] _nodes/stats/os support for cgroup v2 Apr 10, 2024
@peternied peternied added Telemetry:Metrics PRs or issues specific to telemetry metrics framework Cluster Manager labels Apr 10, 2024
peternied commented Apr 10, 2024

[Triage - attendees 1 2 3 4 5 6]

@jondryk Note: when making a pull request, please do not use or reference code that is not under the Apache 2.0 license (or a similar license).

@rajiv-kv rajiv-kv moved this from 🆕 New to Next (Next Quarter) in Cluster Manager Project Board Oct 24, 2024
@reta reta added the v2.19.0 Issues and PRs related to version 2.19.0 label Oct 24, 2024