reclaim action's evict can not be canceled #3673

Open
lowang-bh opened this issue Aug 13, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lowang-bh
Member

lowang-bh commented Aug 13, 2024

Description

The reclaim action uses ssn.Evict, which evicts the pod directly, so the eviction cannot be canceled when it turns out not to help the reclaiming job.


Steps to reproduce the issue

Use a cluster with cpu = 3 in total and the following scheduler ConfigMap:

apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, reclaim"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
kind: ConfigMap
  1. kubectl apply -f job-a.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-a
spec:
  backoffLimit: 3
  completions: 3
  parallelism: 3
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: job-a-pg
        volcano.sh/preemptable: "true"
    spec:
      containers:
      - image: nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
          - containerPort: 80
        resources:
          requests:
            cpu: 1000m
            memory: 200Mi
          limits:
            cpu: 1000m
            memory: 200Mi
      restartPolicy: Never
      terminationGracePeriodSeconds: 1
      schedulerName: volcano
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  annotations:
    scheduling.k8s.io/reclaimable: "true"
  name: job-a-pg
  namespace: default
spec:
  minMember: 1
  queue: queue-a
  2. kubectl apply -f job-b.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-b
spec:
  backoffLimit: 2
  completions: 2
  parallelism: 2
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: job-b-pg
        volcano.sh/preemptable: "true"
    spec:
      containers:
      - image: nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
          - containerPort: 80
        resources:
          requests:
            cpu: 2000m
            memory: 200Mi
          limits:
            cpu: 2000m
            memory: 200Mi
      restartPolicy: Never
      terminationGracePeriodSeconds: 1
      schedulerName: volcano
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  annotations:
    scheduling.k8s.io/reclaimable: "true"
  name: job-b-pg
  namespace: default
spec:
  minMember: 2
  queue: queue-b
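
A quick capacity check may help show why these two jobs trigger the problem. The sketch below only does the CPU arithmetic from the manifests above. It assumes the 3 CPUs are the cluster's total allocatable CPU and that nothing else is running; `gangFits` is a hypothetical helper, not a Volcano function.

```go
package main

import "fmt"

// gangFits reports whether minMember pods of cpuPerPodMilli each fit into
// totalMilli CPU. Hypothetical helper: it ignores memory and every other
// resource dimension.
func gangFits(totalMilli, cpuPerPodMilli, minMember int64) bool {
	return cpuPerPodMilli*minMember <= totalMilli
}

func main() {
	const clusterMilli = 3000 // "cluster with cpu = 3"

	// job-a: pods request 1000m each, minMember 1.
	fmt.Println("job-a gang fits:", gangFits(clusterMilli, 1000, 1))
	// job-b: pods request 2000m each, minMember 2, so it needs 4000m.
	fmt.Println("job-b gang fits:", gangFits(clusterMilli, 2000, 2))
}
```

Under these assumptions job-b's gang of two 2000m pods can never fit in 3000m, so any job-a pod evicted on its behalf is evicted for nothing.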

Describe the results you received and expected

The reclaim action should use Statement.Evict, so that the evictions can be canceled with Statement.Discard.

What version of Volcano are you using?

master

Any other relevant information

No response

@lowang-bh lowang-bh added the kind/bug Categorizes issue or PR as related to a bug. label Aug 13, 2024
@Monokaix
Member

You mean the gang can't be met? I think the gang plugin should return no victims if it's not met.

@lowang-bh
Member Author

We need to use statement.Evict instead of ssn.Evict. The ssn (session) does not support transactions.

@lowang-bh
Member Author

> I think the gang plugin should return no victims if it's not met.

That check is about the victim job: it returns nil (no victims) if the victim's gang cannot be met. Here it is the reclaimer job's gang that is not met, so its evictions should be reverted.
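
The two checks can be told apart with a small sketch. Everything here is hypothetical and simplified for illustration (`Job`, `victimsAllowed`, `shouldRevert`); it is not the gang plugin's code, and it assumes that freeing two 1000m pods lets exactly one 2000m job-b pod fit.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of a gang-scheduled job.
type Job struct {
	Name      string
	MinMember int // PodGroup spec.minMember
	Running   int // pods currently holding resources
}

// victimsAllowed is the victim-side check: how many pods can be taken from
// the job without pushing it below its own minMember.
func victimsAllowed(victim Job) int {
	if victim.Running <= victim.MinMember {
		return 0
	}
	return victim.Running - victim.MinMember
}

// shouldRevert is the reclaimer-side check: if the reclaiming job still
// cannot reach minMember after the evictions, they were useless.
func shouldRevert(reclaimer Job, placeable int) bool {
	return reclaimer.Running+placeable < reclaimer.MinMember
}

func main() {
	jobA := Job{Name: "job-a", MinMember: 1, Running: 3}
	jobB := Job{Name: "job-b", MinMember: 2, Running: 0}

	// The victim side allows evictions: job-a stays at or above minMember.
	fmt.Println("pods job-a may give up:", victimsAllowed(jobA))
	// The reclaimer side fails: one placeable pod is below minMember 2.
	fmt.Println("revert job-b's evictions:", shouldRevert(jobB, 1))
}
```

In this model the victim-side check passes for job-a, so it cannot prevent the useless eviction. Only a reclaimer-side check combined with a revertible eviction can.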
