Add operator to manage any resource created and deleted #1126

Open
wzshiming opened this issue May 29, 2024 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

@wzshiming
Member

wzshiming commented May 29, 2024

What would you like to be added?

apiVersion: resource.operator.kwok.x-k8s.io/v1alpha1
kind: Resource
metadata:
  name: node-2c4g
spec:
  templateName: node
  replicas: 10
  parameters:
    allocatable:
      cpu: 2
      memory: 4Gi
---
apiVersion: resource.operator.kwok.x-k8s.io/v1alpha1
kind: Resource
metadata:
  name: node-4c8g
spec:
  templateName: node
  replicas: 10
  parameters:
    allocatable:
      cpu: 4
      memory: 8Gi
---
apiVersion: resource.operator.kwok.x-k8s.io/v1alpha1
kind: ResourceTemplate
metadata:
  name: node
spec:
  parameters:
    podCIDR: "10.0.0.1/24"
    allocatable:
      cpu: 32
      memory: 256Gi
      pods: 110
    capacity: {}
    nodeInfo:
      architecture: amd64
      operatingSystem: linux
  template: |-
    kind: Node
    apiVersion: v1
    metadata:
      name: {{ Name }}
      annotations:
        kwok.x-k8s.io/node: fake
        node.alpha.kubernetes.io/ttl: "0"
        metrics.k8s.io/resource-metrics-path: "/metrics/nodes/{{ Name }}/metrics/resource"
      labels:
        beta.kubernetes.io/arch: {{ .nodeInfo.architecture }}
        beta.kubernetes.io/os: {{ .nodeInfo.operatingSystem }}
        kubernetes.io/arch: {{ .nodeInfo.architecture }}
        kubernetes.io/hostname: {{ Name }}
        kubernetes.io/os: {{ .nodeInfo.operatingSystem }}
        kubernetes.io/role: agent
        node-role.kubernetes.io/agent: ""
        type: kwok
    spec:
      podCIDR: {{ AddCIDR .podCIDR Index }}
    status:
      allocatable:
      {{ range $key, $value := .allocatable }}
        {{ $key }}: {{ $value }}
      {{ end }}
      {{ $capacity := .capacity }}
      capacity:
      {{ range $key, $value := .allocatable }}
        {{ $key }}: {{ or ( index $capacity $key ) $value }}
      {{ end }}
      nodeInfo:
      {{ range $key, $value := .nodeInfo }}
        {{ $key }}: {{ $value }}
      {{ end }}

Why is this needed?

https://kubernetes.slack.com/archives/C04RG2YSK16/p1716989513415299?thread_ts=1716796734.402529&cid=C04RG2YSK16

wzshiming added the kind/feature label May 29, 2024
@dormullor

dormullor commented May 29, 2024

@wzshiming What do you think about the API below?

apiVersion: kwok.sigs.k8s.io/v1beta1
kind: NodePool
metadata:
  name: nodepool-sample
spec:
  nodeCount: 3
  nodeTemplate:
    apiVersion: v1
    metadata:
      annotations:
        node.alpha.kubernetes.io/ttl: "0"
      labels:
        kubernetes.io/role: agent
        nvidia.com/gpu.deploy.device-plugin: "true"
        nvidia.com/gpu.deploy.dcgm-exporter: "true"
        type: kwok
    spec: {}
    status:
      allocatable:
        cpu: 32
        memory: 256Gi
        pods: 110
      capacity:
        cpu: 32
        memory: 256Gi
        pods: 110
      nodeInfo:
        architecture: amd64
        bootID: ""
        containerRuntimeVersion: ""
        kernelVersion: ""
        kubeProxyVersion: fake
        kubeletVersion: fake
        machineID: ""
        operatingSystem: linux
        osImage: ""
        systemUUID: ""
      phase: Running

The kwok.x-k8s.io/node: fake annotation and the node taint are automatically added to all nodes.

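A simple struct using the Kubernetes corev1 Node object:

// NodePoolSpec defines the desired state of NodePool.
type NodePoolSpec struct {
	// NodeCount is the number of fake nodes to create from NodeTemplate.
	NodeCount int32 `json:"nodeCount"`
	// NodeTemplate is the Node object used as the template for each node in the pool.
	NodeTemplate corev1.Node `json:"nodeTemplate"`
}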
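
How such a spec might be expanded into individual nodes can be sketched as follows. This is a rough sketch under stated assumptions, not the proposed controller: `Node` and `Taint` are minimal stand-ins for the `corev1` types so the example compiles on its own, `expandPool` and the `<pool>-<index>` naming are hypothetical, and the taint shown (`kwok.x-k8s.io/node=fake:NoSchedule`) is an assumption, since the comment above only says that "the node taint" is added.

```go
package main

import "fmt"

// Taint and Node are minimal stand-ins for the corev1 types so this sketch
// compiles without k8s.io/api; a real operator would use the upstream types.
type Taint struct {
	Key, Value, Effect string
}

type Node struct {
	Name        string
	Annotations map[string]string
	Labels      map[string]string
	Taints      []Taint
}

// NodePoolSpec mirrors the struct proposed above, using the stand-in Node.
type NodePoolSpec struct {
	NodeCount    int32
	NodeTemplate Node
}

// copyMap returns a fresh, non-nil copy so the template is never mutated.
func copyMap(in map[string]string) map[string]string {
	out := make(map[string]string, len(in)+1)
	for k, v := range in {
		out[k] = v
	}
	return out
}

// expandPool is a hypothetical helper that stamps out NodeCount nodes from
// the template and adds the annotation and taint to each of them.
func expandPool(poolName string, spec NodePoolSpec) []Node {
	var nodes []Node
	for i := int32(0); i < spec.NodeCount; i++ {
		n := Node{
			Name:        fmt.Sprintf("%s-%d", poolName, i), // assumed naming scheme
			Annotations: copyMap(spec.NodeTemplate.Annotations),
			Labels:      copyMap(spec.NodeTemplate.Labels),
			Taints:      append([]Taint(nil), spec.NodeTemplate.Taints...),
		}
		n.Annotations["kwok.x-k8s.io/node"] = "fake"
		// Assumed taint; the thread does not spell out its key or effect.
		n.Taints = append(n.Taints, Taint{Key: "kwok.x-k8s.io/node", Value: "fake", Effect: "NoSchedule"})
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	spec := NodePoolSpec{
		NodeCount:    3,
		NodeTemplate: Node{Labels: map[string]string{"type": "kwok"}},
	}
	for _, n := range expandPool("nodepool-sample", spec) {
		fmt.Println(n.Name, n.Annotations, n.Taints)
	}
}
```

Copying the maps and the taint slice for every node keeps the pool's template untouched, so a later change to `NodeCount` can reuse it as-is.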

@wzshiming
Member Author

This is fine as a first version of the API, but if we later need a DeploymentPool, or a pool for any custom resource, would we have to implement each of those as well?

@dormullor

Can you explain what a DeploymentPool is? I thought KWOK's purpose is "fake" nodes.

@wzshiming
Member Author

wzshiming commented May 29, 2024

https://kwok.sigs.k8s.io/#what-is-kwok

kwok is the cornerstone of this project, responsible for simulating the lifecycle of fake nodes, pods, and other Kubernetes API resources.

Other Kubernetes API resources are also part of the goal.

KWOK now supports simulating other resources, such as KubeVirt's VMI, so there may be a need for VMI pools in the future.

@dormullor

I see. I think creating a dedicated API for each resource would be beneficial if different resources need different logic (e.g. NodePool, KubevirtPool, etc.).
Also, IMHO, working with templates that start out as strings and are then converted into structs is very error-prone.
WDYT?

@wzshiming
Member Author

wzshiming commented May 29, 2024

Also, IMHO, working with templates that start out as strings and are then converted into structs is very error-prone.

I'm with you on that one.

In fact, the current node and pod simulations already use templates, and we'll use them for chaos simulation as well.

As a result, a validation tool was recently added to this batch of simulation stage templates in CI to ensure that the templates are properly

@dormullor

dormullor commented May 29, 2024

How can you prevent a user from applying a malformed node template?

@wzshiming
Member Author

There is no way to prevent it. We try not to let the user touch the template; if the user modifies the template, they are responsible for it.

As you can see in my definition above, the user only needs to modify the Resource. The ResourceTemplate is provided as the default for the user, and that is enough!

@dormullor

Got it, thanks for the info.
The way I see it, we can create an API for each type of resource KWOK supports, and also create a generic one that can operate on a CR like you did:

  template: |-
    kind: ANY TYPE OF RESOURCE

How does that sound?

@wzshiming
Member Author

wzshiming commented May 29, 2024

we can create an API for each type of resource KWOK supports

All resources are now supported on kwok, but we should only need to support node and pod for this.

create a generic one that can operate on a CR like you did:

I don't quite understand this. Do you mean creating a generic Template CR?

@dormullor

If it's needed, we can create an API that lets the operator take any kind of resource as a string and apply it to Kubernetes, the same way you did with the template.
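
The first step of such a generic path, accepting a resource as a string without knowing its type in advance, might look like the sketch below. It is only an illustration under stated assumptions: `identify` and `objectRef` are hypothetical names, the manifest is given as JSON because the Go standard library has no YAML parser, and actually applying the object to the cluster is out of scope here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// objectRef is a hypothetical summary of what the operator needs to know
// about a manifest before it can create or delete the object.
type objectRef struct {
	APIVersion, Kind, Name string
}

// identify decodes only the fields every Kubernetes object shares, so it
// works for any kind of resource passed in as a string.
func identify(manifest string) (objectRef, error) {
	var obj struct {
		APIVersion string `json:"apiVersion"`
		Kind       string `json:"kind"`
		Metadata   struct {
			Name string `json:"name"`
		} `json:"metadata"`
	}
	if err := json.Unmarshal([]byte(manifest), &obj); err != nil {
		return objectRef{}, fmt.Errorf("decode manifest: %w", err)
	}
	if obj.APIVersion == "" || obj.Kind == "" || obj.Metadata.Name == "" {
		return objectRef{}, errors.New("manifest must set apiVersion, kind, and metadata.name")
	}
	return objectRef{APIVersion: obj.APIVersion, Kind: obj.Kind, Name: obj.Metadata.Name}, nil
}

func main() {
	ref, err := identify(`{"apiVersion":"v1","kind":"Node","metadata":{"name":"node-0"}}`)
	fmt.Println(ref, err)

	// A manifest without a kind is rejected before anything is applied.
	_, err = identify(`{"apiVersion":"v1","metadata":{"name":"node-0"}}`)
	fmt.Println(err)
}
```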

@wzshiming
Member Author

Got it

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Aug 28, 2024
@wzshiming
Member Author

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Aug 28, 2024

4 participants