Cluster upgrade tuning in CAPI #2307
base: main
Conversation
src/content/tutorials/fleet-management/cluster-management/tunning-cluster-upgrades/index.md
---
linkTitle: Fine-tuning upgrade disruption
title: Fine-tuning upgrade disruption on CAPI
description: The level of disruption caused by cluster upgrades can be influenced per cluster. This article explains how to adjust the number of nodes that is updated simlutaneously, and the wait time between batches of nodes.
Should we rather make this a generic article about where disruption can be improved? It's not only about nodes, but also PDBs. Mind also that Giant Swarm CAPA WCs, as of 2024-10, use machine pools, which don't handle disruption perfectly.
Also: typo in "simultaneously".
Not reviewing further until we're clear about the article's goal. It also seems copied from vintage, which will not work, given that we're now fully based on Helm charts, and editing AWSCluster and other objects won't work the same way.
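(Aside, for illustration only and not part of the reviewed diff: the kind of PDB-based disruption handling meant above could look roughly like the sketch below; the names, labels and numbers are placeholders.)

```yaml
# Hypothetical PodDisruptionBudget: keeps at least 2 replicas of a workload
# available while nodes are drained and replaced during an upgrade.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # placeholder name
  namespace: my-namespace # placeholder namespace
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app         # must match the workload's pod labels
```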
I brought it here since we had an internal discussion about missing content: how upgrades currently work in CAPI, and how they can be configured to decrease the impact on workloads. I agree we can write a general doc entry about configuring your workloads correctly in our clusters to avoid impact, but we still need a page that describes how it works from a general point of view.
…ing-cluster-upgrades/index.md Co-authored-by: Andreas Sommer <[email protected]>
## Introduction

Cluster upgraded, described in detail in our [cluster upgrades reference]({{< relref "/vintage/platform-overview/cluster-management/cluster-upgrades" >}})), can cause disruption on workloads, if the upgrade requires upgrading worker nodes.
Suggested change:
Cluster upgrades, described in detail in our [cluster upgrades reference]({{< relref "/vintage/platform-overview/cluster-management/cluster-upgrades" >}})), can cause disruption on workloads, if the upgrade requires replacement of worker nodes.
(and the link goes to /vintage – better avoid if we can)
We provide two ways of limiting the amount of disruption:
From here, this is all outdated and not implemented for CAPI. AWSMachinePool takes care of the instance refresh, and its settings dictate what happens. We'd first need to work on making this more stable (e.g. https://github.com/giantswarm/giantswarm/issues/31843). Or, if you want, we can describe the current behavior already and update the article again later. Mind however that it will be provider-specific, since for CAPA we use machine pools, while for other providers we don't.
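(For reference, a rough sketch of the AWSMachinePool instance refresh settings being referred to, assuming CAPA's refreshPreferences field; the manifest is abridged and the values are placeholders that may differ between cluster-api-provider-aws versions.)

```yaml
# Abridged, hypothetical AWSMachinePool showing instance refresh settings.
# Field availability depends on the cluster-api-provider-aws (CAPA) version.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  name: example-machine-pool   # placeholder name
spec:
  minSize: 3
  maxSize: 6
  refreshPreferences:
    strategy: Rolling          # replace instances in rolling batches
    minHealthyPercentage: 90   # keep at least 90% of instances in service during refresh
    instanceWarmup: 300        # seconds before a new instance counts towards healthy capacity
```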
It is nice to have, but we can wait if this is going to change soon. Once the new process is implemented, we can update this page to keep it current.
What this PR does / why we need it
Things to check/remember before submitting
If you made content changes
- Ran `make lint dev` to render and proofread content changes locally.
- Updated `last_review_date` in the front matter header if you reviewed the entire page.