
Question: What is the best approach to monitor children objects within their parent? #317

Open
kopf-archiver bot opened this issue Aug 18, 2020 · 6 comments
Labels
archive question Further information is requested

Comments

@kopf-archiver

kopf-archiver bot commented Aug 18, 2020

An issue by xocoatzin at 2020-02-26 11:17:30+00:00
Original URL: zalando-incubator/kopf#317
 

Hi all,

I'm currently working on porting some code from metacontroller into Kopf.

Metacontroller gives you the option to receive callbacks when the monitored object changes or when any of its children is updated.

For example, if I'm watching an object of kind multijob, which creates an arbitrary number of standard Kubernetes jobs, I would receive a callback if the children fail, restart, succeed, etc., which I can use to update the status field in the parent.

I haven't been able to find a clean way to do the same thing in Kopf, other than adding separate listeners for both the parent/children, and within the children listeners update the parent CRD. Of course, the example here is simplified: the actual application would have many dependencies and larger hierarchies of objects, and having these inter-dependencies between listeners makes them harder to maintain.

Is there any better way to do this? Or is there any feature in Kopf that would make the management of children easier/cleaner?

Thanks!


Commented by nolar at 2020-03-06 11:10:43+00:00
 

Related: #58 #264
See also: #264 (comment)

You are right: the only way is, as you said, "adding separate listeners for both the parent/children, and within the children listeners update the parent CRD".

Keep in mind that Kopf keeps one and only one watch-query (an API request) per resource kind, no matter how many handlers there are for that resource kind. So the extra handlers should cause no problems with the API.

There is no simpler (i.e. few-liner) solution at the moment.

A better solution is planned, though rather later than sooner (because: priorities; and my regular employment takes time).

Under the hood, it will work exactly the same way, just with a better DSL for handlers. Some ideation happened in this gist.


For all those looking for a solution/pattern and coming to this issue, here is an example that we currently use ourselves:

  • Label the children resources with name & namespace of the parent object (assuming they can be in different namespaces; if it is the same namespace by design, only the name is needed).

  • Watch for children resources that have this label (any value). In the watcher, get the name of the parent resource and patch its status field (e.g. status.subpods) with the status of the watched child resource (selected fields or an aggregate).

  • Back in the parent resource, react to changes in that status field and compute the overall state from all the children's statuses.
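The first two steps boil down to writing a label on each child and reading it back in the children watcher. A minimal sketch of that convention in plain Python, assuming labels are passed around as dicts: the `kex-parent-name` key matches the skeleton code in this comment, while `kex-parent-namespace` and both helper functions are hypothetical names used only for illustration.

```python
# Hypothetical helpers for the parent-labelling convention.
# 'kex-parent-name' is the key used in the skeleton code; 'kex-parent-namespace' is assumed.
PARENT_NAME_LABEL = 'kex-parent-name'
PARENT_NAMESPACE_LABEL = 'kex-parent-namespace'


def parent_labels(name, namespace=None):
    """Labels to put on a child so that it points back to its parent."""
    labels = {PARENT_NAME_LABEL: name}
    if namespace is not None:
        labels[PARENT_NAMESPACE_LABEL] = namespace
    return labels


def parent_of(child_labels, child_namespace):
    """Return (namespace, name) of the parent, or None if the child is not labelled.

    Falls back to the child's own namespace when no namespace label is present.
    """
    name = (child_labels or {}).get(PARENT_NAME_LABEL)
    if name is None:
        return None
    namespace = child_labels.get(PARENT_NAMESPACE_LABEL, child_namespace)
    return namespace, name
```

A children handler could call `parent_of(labels, namespace)` instead of indexing `meta['labels']` directly, which also covers the cross-namespace case from the first bullet.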

A sample code skeleton:

```python
# pip install kopf pykube-ng PyYAML
import kopf
import pykube
import yaml


class KopfExample(pykube.objects.NamespacedAPIObject):
    version = "zalando.org/v1"
    endpoint = "kopfexamples"
    kind = "KopfExample"


@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def spawn_children(name, **_):
    data = yaml.safe_load("""
        apiVersion: v1
        kind: Pod
        spec:
          containers:
          - name: the-only-one
            image: busybox
            command: ["sh", "-x", "-c", "sleep 1"]
    """)

    # Make the pod owned by the parent being handled, and label it with the parent's name.
    kopf.adopt(data)
    kopf.label(data, labels={'kex-parent-name': name})  # << HERE!

    api = pykube.HTTPClient(pykube.KubeConfig.from_env())
    for _ in range(5):
        pykube.Pod(api, data).create()


# None means "the label is present with any value" (newer Kopf versions use kopf.PRESENT).
@kopf.on.event('', 'v1', 'pods', labels={'kex-parent-name': None})
def kexed_pod_monitoring(meta, name, namespace, status, **_):
    parent_name = meta['labels']['kex-parent-name']
    phase = status.get('phase')
    if phase is None:
        return  # nothing to report yet

    try:
        api = pykube.HTTPClient(pykube.KubeConfig.from_env())
        parent_kex = KopfExample.objects(api, namespace=namespace).get_by_name(parent_name)
        parent_kex.patch({'status': {'subpods': {name: phase}}})  # << HERE
    except pykube.exceptions.ObjectDoesNotExist:
        pass  # the parent is already gone; nothing to update


@kopf.on.field('zalando.org', 'v1', 'kopfexamples', field='status.subpods')
def kex_subpods_reaction(old, new, diff, **_):
    # << HERE, decide something on ALL of them at once.
    msg = " // ".join(f"{pod_name} is {pod_phase}"
                      for pod_name, pod_phase in (new or {}).items())
    print(f"==> {msg}")
```
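The `kex_subpods_reaction` handler leaves the actual decision open. One possible reduction of the `{pod_name: phase}` mapping to a single parent state is sketched below; the function name, the summary states, and their precedence (any failure wins, all successes mean success) are assumptions for illustration, not something Kopf or the example above defines.

```python
from collections import Counter


def overall_phase(subpods):
    """Reduce {pod_name: pod_phase} to one hypothetical summary state for the parent."""
    phases = Counter((subpods or {}).values())
    if not phases:
        return 'Pending'    # no children reported yet
    if phases['Failed']:
        return 'Failed'     # a single failed child fails the parent
    if phases['Succeeded'] == sum(phases.values()):
        return 'Succeeded'  # every child finished successfully
    if phases['Running'] or phases['Succeeded']:
        return 'Running'    # some progress, but not finished
    return 'Pending'        # e.g. all children Pending/Unknown
```

Inside `kex_subpods_reaction`, `overall_phase(new)` could then be stored in another status field or used to decide whether to spawn more children.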

Commented by xocoatzin at 2020-03-06 13:10:09+00:00
 

Thank you nolar for the detailed update. My current approach looks very similar to the example you provided. Will follow up closely on future releases.

@kopf-archiver kopf-archiver bot closed this as completed Aug 18, 2020
@kopf-archiver kopf-archiver bot changed the title [archival placeholder] Question: What is the best approach to monitor children objects within their parent? Aug 19, 2020
@kopf-archiver kopf-archiver bot added the question Further information is requested label Aug 19, 2020
@kopf-archiver kopf-archiver bot reopened this Aug 19, 2020
@ableuler

Thanks @nolar for the detailed sample, which I have successfully adapted and used. I'm now wondering whether this approach can be extended to situations where I don't know ahead of time what kind the child resource will be. This would require a generic on.event watcher for all kinds of resources, but as far as I can tell, this doesn't seem to be supported by Kopf. Am I missing an obvious solution here, or is there some kind of workaround?

@nolar
Owner

nolar commented Jun 14, 2021

@ableuler That answer was written in March 2020. A lot of new features have appeared since then.

I guess if I were to implement parent-children relations again, I would use in-memory indexing for that instead of the .status.subpods field stored in the resource.
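A rough sketch of the in-memory-index variant, assuming a Kopf version that provides `@kopf.index`. The index handler is kept as a plain function so it can be exercised without a cluster, and its registration is only shown in a comment. The function names and the `(namespace, parent)` key are hypothetical choices; Kopf injects an index into other handlers as a keyword argument named after the index function, and each key maps to a collection of the values returned for it by all matching objects.

```python
# Hypothetical index handler. With Kopf installed, it would be registered roughly as:
#   @kopf.index('pods', labels={'kex-parent-name': kopf.PRESENT})
# and received by other handlers as `pods_by_parent: kopf.Index`.
def pods_by_parent(namespace, name, labels, status, **_):
    """Map a labelled child pod to its parent: {(namespace, parent): (pod_name, phase)}."""
    parent = labels.get('kex-parent-name')
    if parent is None:
        return {}  # no keys: this pod contributes nothing to the index
    return {(namespace, parent): (name, status.get('phase'))}


def children_phases(index, namespace, name):
    """Collect {pod_name: phase} for one parent from a mapping of key -> iterable of values."""
    return dict(index.get((namespace, name), []))
```

A parent handler (for example, a timer on `kopfexamples`) could then call `children_phases(pods_by_parent, namespace, name)` to get the current `{pod_name: phase}` view without storing it in `.status` at all.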

As for "all kinds of resources" — that feature was also added:

```python
@kopf.on.event(kopf.EVERYTHING)
def fn(**_): ...

@kopf.on.event('example.com', kopf.EVERYTHING)  # all resources in a group
def fn(**_): ...

@kopf.on.event(category='all')  # the same as "kubectl get all" (which skips secrets and some other kinds)
def fn(**_): ...
```

That also works for creation/update/deletion/indexing handlers, timers, and daemons. It might not be the best idea to do this on a live cluster without filters (e.g. by labels, annotations, or a when-callback), but it will work.
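When children are watched via `kopf.EVERYTHING`, their kind is not known in advance, so one kind-agnostic way to find the parent is to read the `ownerReferences` that `kopf.adopt()` writes, instead of (or in addition to) a label. A sketch assuming the parent is the `KopfExample` resource from the earlier example; `owner_of` is a hypothetical helper. Since Kubernetes disallows cross-namespace owner references, the child's namespace is taken as the parent's.

```python
def owner_of(body, kind='KopfExample', api_version='zalando.org/v1'):
    """Return (namespace, name) of the first owner of the given kind, or None."""
    meta = body.get('metadata') or {}
    for ref in meta.get('ownerReferences') or []:
        if ref.get('kind') == kind and ref.get('apiVersion') == api_version:
            # A namespaced owner must live in the same namespace as its dependent.
            return meta.get('namespace'), ref.get('name')
    return None
```

In a label-filtered `@kopf.on.event(kopf.EVERYTHING, ...)` handler, `owner_of(body)` would then identify the parent to patch, whatever the child's kind.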

More info: https://kopf.readthedocs.io/en/stable/resources/

@ableuler

Thank you very much for the pointer to kopf.EVERYTHING, and sorry for missing that in the docs. For the moment, this (while filtering by label) solves my problem at hand. The in-memory indexing looks like a very nice feature, though, and I'll happily take a look at it when I get a chance to refactor the relation between the custom object and its children in my code.

@ableuler

ableuler commented Oct 12, 2021

I have another question related to a parent-child relation as described above:

I use a @kopf.on.event('', 'v1', 'pods', labels={'parent-name': 'my-parent'}) type of decorator to watch events of child resources and update the parent (the actual custom resource object) accordingly. This works like a charm until I stop the Kopf operator for a moment and restart it. In this scenario, events that happened while the operator was down (such as the pod starting) are missed, and the parent never gets updated. Based on the following passage from the docs, I would expect that on restart an initial listing of pods would still happen and thus trigger the corresponding handler.

Please note that the event handlers are invoked for every event received from the watching stream. This also includes the first-time listing when the operator starts or restarts.
It is the developer’s responsibility to make the handlers idempotent (re-executable with no duplicating side-effects).

What am I missing?

PS: in case you're interested in what people are building based on your work, this is the project I am using Kopf for: https://github.com/SwissDataScienceCenter/amalthea

@ableuler

What am I missing?

I managed to answer my own question. In my example, I was only reacting to creation or modification events. However, the events that I get during the initial listing on operator restart come without an event type. Handling events without a type properly solved my problem.

@nolar
Owner

nolar commented Oct 13, 2021

Yes, exactly. The event type is None for the initial listing (as "listing" is not a "watch-stream" in regular Kubernetes terms, but a Kopf-specific simulation or pseudo-streaming).
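Because the initial listing arrives with `event['type']` set to `None`, a children handler that filters on the event type has to treat `None` like `ADDED`/`MODIFIED`. A small sketch of that decision; the `parent_action` name and its return values are hypothetical, and treating `DELETED` as "remove the child from the parent's summary" is a design choice, not something Kopf prescribes.

```python
def parent_action(event_type):
    """Decide how a child's event should affect its parent's summary."""
    if event_type in (None, 'ADDED', 'MODIFIED'):
        # None = object seen during the initial listing (operator start/restart).
        return 'upsert'  # record or refresh the child's state in the parent
    if event_type == 'DELETED':
        return 'remove'  # drop the child from the parent's summary
    return None          # unexpected types are ignored
```

With a JSON merge patch, the `remove` case can be expressed by setting the child's key in `status.subpods` to `None` (JSON `null`).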
