[SURE-8794] Deploying ClusterGroup from GitRepo results in loop #2859

Open

p-se opened this issue Sep 17, 2024 · 3 comments

@p-se
Contributor

p-se commented Sep 17, 2024

Deploying a ClusterGroup from a GitRepo that also contains other GitRepo resources using the newly created ClusterGroup results in a loop.

This loop repeatedly triggers the ClusterGroup and appends to its status message, which grows endlessly until etcd's size limit is hit, at which point Fleet is supposedly blocked.

The issue can be reproduced by adding this GitRepo resource to the cluster. It was reproducible on the latest Fleet development version at the time and did not require a Rancher installation. The cluster was prepared using dev/setup-multi-cluster.
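For context, a minimal sketch of the kind of GitRepo resource described above, assuming a setup like the one created by dev/setup-multi-cluster; the name, namespace, repository URL and path are placeholders, not the actual resource linked in the report:

```yaml
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: clustergroup-loop          # hypothetical name
  namespace: fleet-local           # assumption: deploys to the local cluster, where ClusterGroup/GitRepo resources live
spec:
  repo: https://example.com/repro-repo   # placeholder, not the repository from the report
  paths:
  - clustergroups                        # directory containing a ClusterGroup plus another GitRepo that uses it (see sketch further below)
```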

@rancherbot rancherbot added this to Fleet Sep 17, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Sep 17, 2024
@p-se p-se moved this from 🆕 New to To Triage in Fleet Sep 17, 2024
@p-se p-se added JIRA Must shout kind/bug labels Sep 17, 2024
@kkaempf kkaempf added this to the v2.9.3 milestone Sep 17, 2024
@kkaempf kkaempf modified the milestones: v2.9.3, 2.9.4 Oct 2, 2024
@manno manno moved this from To Triage to 📋 Backlog in Fleet Oct 23, 2024
@manno manno unassigned p-se Oct 23, 2024
@manno manno modified the milestones: v2.9.4, v2.11.0, v2.9.5 Oct 23, 2024
@p-se p-se self-assigned this Oct 25, 2024
@p-se p-se moved this from 📋 Backlog to 🏗 In progress in Fleet Oct 25, 2024
p-se added a commit to p-se/fleet that referenced this issue Oct 31, 2024
Prevents fleet from crashing due to resources exceeding etcd's configured size limit.

Deduplicating messages should only be necessary for edge cases which are not officially supported by fleet but result in ever-increasing message sizes.

The growth is caused by messages being copied from one resource to another and back again, with every resource adding its own status to the message. This only happens if a cluster group is deployed by a GitRepo, which results in a bundle containing a cluster group. That bundle can only become ready once the cluster group is ready, but if the cluster group points to the cluster of the bundle that deployed it, this can never happen. The user is expected to fix this situation, but deduplicating the messages prevents them from growing to the point where etcd's limit is reached and fleet crashes.

Deduplicating the messages also avoids frequently changing the status of resources, which results in fewer controllers being triggered.
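
To illustrate the cycle described in the commit message, here is a hypothetical sketch of what the repository deployed by such a GitRepo might contain; all names, labels and URLs are placeholders:

```yaml
# A ClusterGroup that selects the same cluster the bundle is deployed to,
# plus a second GitRepo targeting that ClusterGroup.
kind: ClusterGroup
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: loop-group                 # hypothetical name
  namespace: fleet-local
spec:
  selector:
    matchLabels:
      env: dev                     # assumed label on the cluster the parent bundle is deployed to
---
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: loop-child                 # hypothetical name
  namespace: fleet-local
spec:
  repo: https://example.com/other-repo   # placeholder
  targets:
  - clusterGroup: loop-group             # targets the ClusterGroup created by the parent GitRepo
```

The bundle that contains loop-group can only become ready once loop-group itself is ready, but loop-group selects the very cluster that bundle is deployed to, so readiness never settles and the status messages keep being copied back and forth.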
p-se added a commit to p-se/fleet that referenced this issue Oct 31, 2024 (same commit message as above)
p-se added a commit to p-se/fleet that referenced this issue Oct 31, 2024 (same commit message as above)
p-se added a commit to p-se/fleet that referenced this issue Nov 5, 2024 (same commit message as above)
@p-se p-se moved this from 🏗 In progress to 👀 In review in Fleet Nov 5, 2024
@p-se p-se moved this from 👀 In review to Needs QA review in Fleet Nov 6, 2024
@mmartin24 mmartin24 self-assigned this Nov 12, 2024
@manno manno modified the milestones: v2.9.5, v2.10.1 Dec 3, 2024
@rancher rancher deleted a comment from rancherbot Dec 3, 2024
@p-se
Contributor Author

p-se commented Dec 3, 2024

/backport v2.10.1

@mmartin24
Collaborator

mmartin24 commented Dec 4, 2024

I tested this in v2.10-d8667221a2eec48d4350d00a9d39aee54f00f810-head with the fleet chart fleet:105.0.1+up0.11.1.

I saw a significant improvement over 2.8.5, where the logs grow every few seconds and fill the page within a matter of minutes, as can be seen in this screenshot:

[screenshot: log growth on 2.8.5]

When I checked in v2.10-d8667221a2eec48d4350d00a9d39aee54f00f810-head with the fleet chart fleet:105.0.1+up0.11.1, the log growth was significantly lower; however, it was still present after a few hours, as can be seen in this screenshot:

[screenshot: log growth on 2.10 after a few hours]

@p-se is this expected?

@p-se
Contributor Author

p-se commented Dec 4, 2024

@p-se is this expected?

No, it is not expected for the status to keep growing! That said, I'm not sure whether the fix is included in the versions you used for testing.

Projects
Status: Needs QA review