Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the cgroup 2 process attaching problem #677

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kbfu
Copy link

@kbfu kbfu commented Dec 16, 2023

What this PR does / why we need it:
Fix the problem when attaching the process to another cgroup when using cgroup 2.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Fixed this issue.
litmuschaos/litmus#3902

Special notes for your reviewer:

Checklist:

  • Fixes #litmus/issues/3902
  • PR messages has document related information
  • Labelled this PR & related issue with breaking-changes tag
  • PR messages has breaking changes related information
  • Labelled this PR & related issue with requires-upgrade tag
  • PR messages has upgrade related information
  • Commit has unit tests
  • Commit has integration tests
  • E2E run Required for the changes

@kbfu kbfu changed the title fix the cgroup 2 problem Fix the cgroup 2 process attaching problem Dec 16, 2023
@uditgaurav
Copy link
Member

Hello @kbfu, thank you for your contribution through the pull request.

I would like to inquire about the specific cluster environment and container runtime where you have conducted your tests. We encountered an issue when running it on a GKE cluster with containerd and cgroupv2. Here's the error we observed:

could not get cgroup manager --- at /litmus-go/chaoslib/litmus/stress-chaos/helper/stress-helper.go:134 (prepareStressChaos) --- Caused by: Error in getting groupPath,nsenter: unrecognized option: C BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary. Usage: nsenter [OPTIONS] [PROG ARGS] -t PID Target process to get namespaces from -m[FILE] Enter mount namespace -u[FILE] Enter UTS namespace (hostname etc) -i[FILE] Enter System V IPC namespace -n[FILE] Enter network namespace -p[FILE] Enter pid namespace -U[FILE] Enter user namespace -S UID Set uid in entered namespace -G GID Set gid in entered namespace --preserve-credentials Don't touch uids or gids -r[DIR] Set root directory -w[DIR] Set working directory -F Don't fork before exec'ing PROG

@kbfu
Copy link
Author

kbfu commented Jan 4, 2024

Hi @uditgaurav , I rebuilt the image and replaced the base image from alpine to debian. I believe nsenter command from busybox was outdated.
This is the version I am using now.
nsenter from util-linux 2.38.1

@uditgaurav
Copy link
Member

uditgaurav commented Jan 5, 2024

@kbfu, Thanks for your response. I'm wondering if we can integrate this capability within the Alpine-based image itself, as this would help in maintaining a smaller image size.

The corresponding version of util-linux package in Alpine is 2.38-r1.

For your reference, the experimental Dockerfile is located here - litmus-go Dockerfile. It uses the base image litmuschaos/experiment-alpine, sourced from this Dockerfile. Perhaps we can consider adding the required functionality in this Dockerfile.

@uditgaurav
Copy link
Member

uditgaurav commented Jan 5, 2024

Hi @kbfu, I've created a test experiment image using the same Alpine-based image, which includes package util-linux 2.38-r1. The image can be found at docker.io/uditgaurav/go-runner:stress.

Current Output:

/ # nsenter --version
nsenter from util-linux 2.38

Previous Output:

~ $ nsenter --version
nsenter: unrecognized option: version
BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary.

It works well with containerd + cgroupv2 🙌. Moving forward, I plan to conduct further tests under these scenarios:

  • containerd + cgroupv1
  • docker + cgroupv2
  • docker + cgroupv1
  • Recursive experiment execution in both parallel and serial formats

Tagging @ispeakc0de, for any suggestions for additional tests or use cases for nsenter.

Copy link
Member

@uditgaurav uditgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🙌
Note: We need to update the base image here with updated util-linux (using apk add --no-cache util-linux=2.38-r1).

Comment on lines +502 to +511
//groupPath, err := cgroupsv2.PidGroupPath(t.Pid)
//if err != nil {
// return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to get pid group path: %s", err.Error())}
//}
//
//cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", groupPath)
//if err != nil {
// return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}
//}
//return cgroup2, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the commented code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants