Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: batch window suppression #829

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

paxbit
Copy link

@paxbit paxbit commented Sep 10, 2021

Hi @nolar,

it's been a while since #729. Back then I monkey patched the topics discussed in the issue and went with that up till now.

I'd like to split this in two parts of which this PR is the first. It disables the batch_window by default and allows to return to the previous behaviour on-demand. I believe the idea behind batch_window is viable in some cases but incorrect to assume for all operator by default. The doc string tries to explain the situation.

It would be great if you could have a look at it.

Thanks!

@paxbit paxbit requested a review from nolar as a code owner September 10, 2021 11:38
@paxbit paxbit force-pushed the feature/batch-window-suppression branch from a54f7cc to 66dc4af Compare September 15, 2021 20:57
@nolar
Copy link
Owner

nolar commented Sep 19, 2021

Thanks for this PR and for your time spent on it. However, I think, I will not merge it — but for a good cause:

While playing with two separate changes in separate branches, and after digging into the history of this "batching" solution, I have found that it was implemented for the only purpose — ensuring the consistency with the resource recently patched by Kopf itself. And this consistency was based on the assumption that the patched versions arrives into the queue fast enough. The assumption is wrong. It can take longer than 0.1 seconds to get the patched version.

Instead, resourceVersion should be used to ensure consistency (as got from the PATCH operation). I have this change already implemented, and now only struggle with auto-testing this case (asyncio tests are hard).

Once the resourceVersion-based consistency is ensured, the "batching" is not needed at all: it is not a feature of Kopf, it was a hack in the first place. And so, I am going to remove it completely.

However, with no "batching" in place, the question remains how to process the quickly arriving changes of a resource (as mentioned in your PR's comments: CREATED > MODIFIED > DELETED as an example). I think the best way would be to handle them all, i.e. all the changes — even if it causes some extra CPU load. Batching was not a way of reducing the load — as implied in this quote — the load reduction was a side effect:

A lossy but in some cases viable optimization mechanism to take some load off of event processing.


This is the 1st change of the mentioned two. The 2nd change is processing ALL incoming low-level events, including those considered "inconsistent" during those short time windows when the patch is performed but the patched version is not yet received. They must be excluded both from batching and from consistency logic. This requires some more sophisticated split of event- & change-processing — which I also have drafted, but not properly stabilised. It can take some time for me to finish those 2 PRs.

Meanwhile, regarding this PR and to ensure that you are not blocked: can you achieve this same effect by setting the batch window to 0? As far as I can see into the internals of asyncio.wait_for(), it treats non-None timeouts which are <=0 by timing out instantly. And so, the batching would not happen. It does not solve all the problems, but at least it should be equivalent to this PR's effects: the if condition is not needed in that case. Or do I miss something?

@paxbit
Copy link
Author

paxbit commented Sep 25, 2021

Hi @nolar,

Thanks for having a look! I think it is a good idea to get rid of batching altogether and I now understand a bit better what it was meant to solve in the first place.

Regarding setting the batching window to 0, I think I'll leave the monkey patches in place which did solve both issues fine for the last months. The patches are twofold.

  1. Patch the if batching_window is not None into worker
  2. Use the getters/setters of progress and diffbase to manage a simple cache for patches which have not made the full roundtrip so far. You can find a WIP of my attempt to translate the monkey patch into the fork here: https://github.com/paxbit/kopf/commits/feature/caching-state-storage/kopf/_cogs/configs

Both patches did solve our issues for the last months. You wrote above you have sth. in mind for the 2nd part. Can you make use of the caching idea or is it something you would rather disregard?

Looking forward to the release integrating both topics :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants