Reconciliation takes too long to execute #1097

Open
skhalash opened this issue May 21, 2024 · 5 comments
Labels
area/logs LogPipeline area/manager Manager or module changes area/metrics MetricPipeline area/traces TracePipeline kind/bug Categorizes issue or PR as related to a bug.

Comments

@skhalash
Collaborator

Description

Fixing the managed Kyma dashboards exposed an issue with the CR reconciliation duration across all three pipeline types and the Telemetry CR. The median reconciliation duration for the pipelines is approximately 1 second, and the 99th percentile reaches around 4 seconds for long-running pipelines that were deployed months ago. Ideally, after the initial deployment, each reconciliation should be a no-op because nothing has changed. The Telemetry CR fares slightly better, but its reconciliation duration is still within the same order of magnitude.

What can cause the problem?

  1. The client cache configuration contains a list of concrete GVKs to be cached. However, this list has not been maintained for a while, so it does not contain all GVKs deployed by the different operator controllers (e.g. Fluent Bit, OTel Collector and Self-Monitor resources). Reads of uncached GVKs presumably go to the API server on every reconciliation. We could instead use the DefaultNamespaces cache option and automatically cache everything in the kyma-system namespace.
  2. There is a hypothesis that the CreateOrUpdate utils have never actually worked: instead of checking for a diff and returning early, they always perform an API call.
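
To make the second hypothesis concrete, here is a minimal sketch of the "check the diff and return early" behaviour a no-op reconciliation relies on. It uses only the Go standard library. `Resource`, `fakeAPI`, `newDesired` and `createOrUpdate` are hypothetical stand-ins invented for illustration; they are not the project's utilities and not the controller-runtime API. The write counter shows what to look for: a second call with an unchanged desired state should not produce another write.

```go
package main

import (
	"fmt"
	"reflect"
)

// Resource is a hypothetical stand-in for a Kubernetes object.
// Only fields owned by the operator are modelled.
type Resource struct {
	Name   string
	Labels map[string]string
	Spec   map[string]string
}

// fakeAPI is a hypothetical in-memory API server that counts write calls.
type fakeAPI struct {
	store  map[string]Resource
	writes int
}

func (a *fakeAPI) get(name string) (Resource, bool) {
	r, ok := a.store[name]
	return r, ok
}

func (a *fakeAPI) write(r Resource) {
	a.store[r.Name] = r
	a.writes++
}

// newDesired builds the desired state with fresh maps on every call, so the
// stored copy and the desired copy never share map storage.
func newDesired(replicas string) Resource {
	return Resource{
		Name:   "example-config", // hypothetical resource name
		Labels: map[string]string{"app": "example"},
		Spec:   map[string]string{"replicas": replicas},
	}
}

// createOrUpdate writes only if the object is missing or differs from the
// desired state. It returns "created", "updated" or "unchanged".
func createOrUpdate(api *fakeAPI, desired Resource) string {
	existing, found := api.get(desired.Name)
	if !found {
		api.write(desired)
		return "created"
	}
	// Early return: nothing changed, so no write call is issued.
	if reflect.DeepEqual(existing, desired) {
		return "unchanged"
	}
	api.write(desired)
	return "updated"
}

func main() {
	api := &fakeAPI{store: map[string]Resource{}}

	fmt.Println(createOrUpdate(api, newDesired("1")), api.writes) // created 1
	fmt.Println(createOrUpdate(api, newDesired("1")), api.writes) // unchanged 1
	fmt.Println(createOrUpdate(api, newDesired("2")), api.writes) // updated 2
}
```

Against a real API server, a plain deep comparison like this can report a difference on every pass, because the stored object carries server-populated fields that the desired object lacks. That would be one way for an early return to never trigger, but it is an assumption to verify, not a confirmed cause.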

Expected result

A no-op reconciliation should not take that long

Actual result

A no-op reconciliation takes seconds

Steps to reproduce

Troubleshooting

Release Notes


@skhalash skhalash added area/logs LogPipeline area/metrics MetricPipeline area/traces TracePipeline area/manager Manager or module changes kind/bug Categorizes issue or PR as related to a bug. labels May 21, 2024
@skhalash
Collaborator Author

Others have run into the same problem when comparing resources: kubernetes-sigs/kubebuilder#592


This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2024

This issue has been automatically closed due to the lack of recent activity.
/lifecycle rotten

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 28, 2024
@kyma-bot kyma-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 28, 2024
@skhalash skhalash reopened this Jul 28, 2024
@skhalash skhalash removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 28, 2024

github-actions bot commented Oct 6, 2024

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2024

This issue has been automatically closed due to the lack of recent activity.
/lifecycle rotten

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 13, 2024
@kyma-bot kyma-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 13, 2024
@a-thaler a-thaler reopened this Oct 14, 2024
@a-thaler a-thaler removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 14, 2024
4 participants