When the volcano-admission pod is not running, creating other pods fails #3734

Open
lengrongfu opened this issue Sep 18, 2024 · 11 comments · May be fixed by #3748
Labels: good first issue, help wanted, kind/bug

Comments

@lengrongfu
Contributor

Description

When the volcano-admission pod crashes, creating any other pod in the cluster fails.

Steps to reproduce the issue

  1. Install Volcano with helm install.
  2. Scale the volcano-admission deployment to 0 replicas to simulate a volcano-admission pod crash:
$ kubectl -n volcano scale deployment volcano-admission --replicas 0
  3. Run a pod:
$ kubectl run nginx --image=nginx

Describe the results you received and expected

Received results:

root@ubuntu:~# kubectl run nginx --image=nginx
Error from server (InternalError): Internal error occurred: failed calling webhook "mutatepod.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.volcano.svc:443/pods/mutate?timeout=10s": no endpoints available for service "volcano-admission-service"

Expected results: the pod is created successfully.

What version of Volcano are you using?

1.9.0

Any other relevant information

No response

@lengrongfu added the kind/bug label on Sep 18, 2024
@lengrongfu
Contributor Author

/assign

@lengrongfu
Contributor Author

I have two ideas:

  1. Modify the failurePolicy field in the webhook configuration.
  2. Add a unique label to the pods Volcano should handle, and have the webhook select pods by that label.
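
For illustration, the second idea could look roughly like the fragment below. This is only a sketch: the label key and value `volcano.sh/managed: "true"` are hypothetical and not something Volcano defines in this thread; the webhook name is taken from the error message above. `objectSelector` is a standard field of a webhook entry in `admissionregistration.k8s.io/v1`.

```yaml
# Fragment of a MutatingWebhookConfiguration (sketch, not the shipped config).
webhooks:
  - name: mutatepod.volcano.sh      # webhook name from the error message
    # Only objects carrying this label are sent to the webhook.
    # The label key and value are hypothetical, chosen for illustration.
    objectSelector:
      matchLabels:
        volcano.sh/managed: "true"
```

With a selector like this, pods without the label never reach the webhook, so they are not affected when volcano-admission is down. The cost is that every pod meant for Volcano has to carry the label.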

@googs1025
Member

I think pod creation failing after a webhook crash is a common problem with admission webhooks. If you want other pods in the cluster to be unaffected, you can modify the failurePolicy field of the webhook.
See: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
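
As a minimal sketch of that change, the relevant field on the webhook entry would be set as below. In `admissionregistration.k8s.io/v1` the allowed values are `Fail` and `Ignore`, and `Fail` is the default. The webhook name is the one from the error message; everything else in the real configuration is left out here.

```yaml
# Fragment of a MutatingWebhookConfiguration (sketch).
webhooks:
  - name: mutatepod.volcano.sh      # webhook name from the error message
    # Ignore: if the API server cannot call the webhook (for example, no
    # endpoints behind the service), the request is admitted anyway.
    failurePolicy: Ignore
```

The trade-off is that pods created while the webhook is unreachable are admitted without the mutation being applied.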

@Monokaix
Member

The first solution, changing failurePolicy to Ignore, is OK.

@lowang-bh
Member

You can also disable the webhooks you don't need.
Just modify the list in enabled_admissions: "/jobs/mutate,/jobs/validate,/podgroups/mutate,/pods/validate,/pods/mutate,/queues/mutate,/queues/validate"
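
A sketch of what that might look like as a Helm values override, assuming the chart exposes the setting under a `custom.enabled_admissions` key (that key path is an assumption, so check the chart's values.yaml for your version). The list is the one quoted above with the two pod paths removed.

```yaml
# Helm values override (sketch). The key path custom.enabled_admissions
# is an assumption about the chart layout.
custom:
  # Same list as quoted above, minus /pods/validate and /pods/mutate.
  enabled_admissions: "/jobs/mutate,/jobs/validate,/podgroups/mutate,/queues/mutate,/queues/validate"
```

Dropping the pod paths disables pod mutation and validation for every pod, including those scheduled by Volcano, so this only fits setups that do not rely on those webhooks.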

@lengrongfu
Contributor Author

I don't think configuring failurePolicy=Ignore is a very good solution. I suggest configuring matchConditions instead.
I have verified that it works well:

  matchConditions:
  - expression: object.spec.schedulerName == 'volcano'
    name: scheduler  

@googs1025
Member

I think the key question is whether we need to set this feature as the default configuration for helm installation. Is this what you mean?

@lengrongfu
Contributor Author

> I think the key question is whether we need to set this feature as the default configuration for helm installation. Is this what you mean?

yes.

@Monokaix
Member

> I don't think configuring failurePolicy=Ignore is a very good solution. I suggest configuring matchConditions instead. I have verified that it works well:
>
>   matchConditions:
>   - expression: object.spec.schedulerName == 'volcano'
>     name: scheduler

That's OK with me, but changing failurePolicy to Ignore is also needed :)
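
Putting both suggestions together, a webhook configuration might look roughly like the sketch below. The service name, namespace, path, port, timeout, and webhook name come from the error message in the issue description; the `metadata.name`, the `rules` block, `sideEffects`, and `admissionReviewVersions` are assumptions added only to make the manifest complete, and the `caBundle` is omitted. Note that `matchConditions` requires a Kubernetes version that supports it (it became stable in v1.30).

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  # Hypothetical object name; check the name used in your cluster.
  name: volcano-admission-pods-mutate-example
webhooks:
  - name: mutatepod.volcano.sh        # webhook name from the error message
    clientConfig:
      service:
        namespace: volcano
        name: volcano-admission-service
        path: /pods/mutate
        port: 443
      # caBundle omitted here; a real configuration needs it.
    rules:                            # assumed rule set: pod creation only
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    # Only call the webhook for pods that request the volcano scheduler.
    matchConditions:
      - name: scheduler
        expression: object.spec.schedulerName == 'volcano'
    # If the webhook is unreachable for a matching pod, admit the pod anyway.
    failurePolicy: Ignore
    timeoutSeconds: 10                # matches timeout=10s in the error URL
    sideEffects: NoneOnDryRun         # assumption
    admissionReviewVersions: ["v1"]   # assumption
```

The two settings cover different cases: `matchConditions` keeps pods for other schedulers away from the webhook entirely, while `failurePolicy: Ignore` decides what happens to pods that do match when volcano-admission is down.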

@Monokaix
Member

/good-first-issue

@volcano-sh-bot
Contributor

@Monokaix:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot added the good first issue and help wanted labels on Sep 24, 2024
5 participants