
Cluster upgrade tuning in CAPI #2307

Open · wants to merge 3 commits into base: main

Conversation

pipo02mix (Contributor)

What this PR does / why we need it

Things to check/remember before submitting

  • If you made content changes

    • Run `make lint dev` to render and proofread content changes locally.
    • Bump last_review_date in the front matter header if you reviewed the entire page.

@pipo02mix pipo02mix requested a review from a team as a code owner October 7, 2024 07:44
@pipo02mix pipo02mix self-assigned this Oct 7, 2024
@pipo02mix pipo02mix requested a review from paurosello October 7, 2024 07:44
---
linkTitle: Fine-tuning upgrade disruption
title: Fine-tuning upgrade disruption on CAPI
description: The level of disruption caused by cluster upgrades can be influenced per cluster. This article explains how to adjust the number of nodes updated simultaneously, and the wait time between batches of nodes.
Contributor
Should we rather make this a generic article about where disruption can be improved? It's not only about nodes, but also PDBs. Mind also that Giant Swarm CAPA WCs, as of 2024-10, use machine pools, which doesn't take care of disruption handling perfectly.

Also: typo in simultaneously

Not reviewing further until we're clear about the article's goal. It also seems copied from vintage, which will not work, given that we're now fully based on Helm charts, and editing AWSCluster and other objects won't work the same way.
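The PDBs mentioned above are the standard workload-level mechanism for limiting disruption during node drains. A minimal sketch, assuming an illustrative workload named `my-app` (the name and labels are placeholders, not from this PR):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # illustrative name
spec:
  minAvailable: 2           # keep at least 2 replicas running while nodes are drained
  selector:
    matchLabels:
      app: my-app           # must match the workload's pod labels
```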

Contributor Author

I brought this here since we had an internal discussion about missing content: how upgrades currently work in CAPI, and how they can be configured to decrease the impact on workloads. I agree we can write a general doc entry on configuring your workloads correctly in our clusters to avoid impact, but we still need a page that describes how it works from a general view.

@AndiDog AndiDog changed the title Cluster upgrade tunning in CAPI Cluster upgrade tuning in CAPI Nov 19, 2024

## Introduction

Cluster upgrades, described in detail in our [cluster upgrades reference]({{< relref "/vintage/platform-overview/cluster-management/cluster-upgrades" >}}), can cause disruption on workloads if the upgrade requires replacement of worker nodes.
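Since worker-node replacement is what drives the disruption, one knob upstream CAPI exposes for node pools backed by MachineDeployments is the rolling-update strategy. A sketch, assuming the upstream `cluster.x-k8s.io/v1beta1` API (values are illustrative, and Giant Swarm cluster charts may surface these settings differently):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: worker-pool          # illustrative name
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # create at most 1 extra machine during the rollout
      maxUnavailable: 0      # don't remove a machine before its replacement is ready
```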
Contributor

Suggested change
Cluster upgraded, described in detail in our [cluster upgrades reference]({{< relref "/vintage/platform-overview/cluster-management/cluster-upgrades" >}})), can cause disruption on workloads, if the upgrade requires upgrading worker nodes.
Cluster upgrades, described in detail in our [cluster upgrades reference]({{< relref "/vintage/platform-overview/cluster-management/cluster-upgrades" >}})), can cause disruption on workloads, if the upgrade requires replacement of worker nodes.

(and the link goes to /vintage – better avoid if we can)


We provide two ways of limiting the amount of disruption:
Contributor

From here, this is all outdated and not implemented for CAPI. AWSMachinePool takes care of the instance refresh, and its settings dictate what happens. We'd first need to work on making this more stable (e.g. https://github.com/giantswarm/giantswarm/issues/31843). Or if you want, we can describe the current behavior already and update the article again later. Mind however that it will be provider-specific, since for CAPA we use machine pools, for others we don't.
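For reference, the AWSMachinePool instance-refresh behavior mentioned here is configured through `refreshPreferences` in the CAPA API. A hedged sketch, assuming the `infrastructure.cluster.x-k8s.io/v1beta2` API version (field values are examples, not Giant Swarm defaults):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  name: worker-pool          # illustrative name
spec:
  refreshPreferences:
    strategy: Rolling          # ASG instance refresh strategy
    minHealthyPercentage: 90   # keep 90% of instances in service during the refresh
    instanceWarmup: 300        # seconds before a new instance counts toward the healthy set
```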

Contributor Author

It is nice to have, but we can wait if it changes soon. Once the new process is implemented, we can update this page to match.
