Ignore node_count and initial_node_count when late initializing NodePools #353

toastwaffle · 2023-08-09T17:30:26Z

Having node_count late-initialized with autoscaling can cause the autoscaler and Crossplane to fight over how many nodes there should be (with Crossplane effectively reverting autoscaler changes). If autoscaling is used, node_count should remain unset.
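To make the conflict concrete, here is a minimal Go sketch of late-initialization with an ignore list. It is an illustrative model, not the provider's or upjet's actual code. NodePoolParams, lateInitialize and intPtr are hypothetical names, and only the two fields discussed here are modeled. When node_count is late-initialized, whatever count the autoscaler happened to choose is written into the spec and becomes the desired state. Ignoring the field leaves it unset.

```go
package main

import "fmt"

// NodePoolParams is a hypothetical, trimmed-down stand-in for a NodePool's
// spec.forProvider. A nil pointer means the user left the field unset.
type NodePoolParams struct {
	NodeCount        *int
	InitialNodeCount *int
}

// lateInitialize fills unset spec fields from the observed state, skipping
// any field listed in ignored. It returns true if the spec was modified.
// Illustrative model only; not the provider's actual implementation.
func lateInitialize(spec *NodePoolParams, observed NodePoolParams, ignored map[string]bool) bool {
	changed := false
	if spec.NodeCount == nil && observed.NodeCount != nil && !ignored["node_count"] {
		v := *observed.NodeCount
		spec.NodeCount = &v
		changed = true
	}
	if spec.InitialNodeCount == nil && observed.InitialNodeCount != nil && !ignored["initial_node_count"] {
		v := *observed.InitialNodeCount
		spec.InitialNodeCount = &v
		changed = true
	}
	return changed
}

func intPtr(i int) *int { return &i }

func main() {
	// The autoscaler has currently scaled the pool to 5 nodes.
	observed := NodePoolParams{NodeCount: intPtr(5), InitialNodeCount: intPtr(3)}

	// Without an ignore list, node_count is pinned at 5; a later reconcile
	// would push 5 back even after the autoscaler scales down.
	before := NodePoolParams{}
	lateInitialize(&before, observed, nil)
	fmt.Println(*before.NodeCount) // 5

	// With both fields ignored, the spec stays unset and the autoscaler
	// remains the only owner of the node count.
	after := NodePoolParams{}
	ignored := map[string]bool{"node_count": true, "initial_node_count": true}
	fmt.Println(lateInitialize(&after, observed, ignored), after.NodeCount == nil) // false true
}
```

Keeping the ignore list as data rather than hard-coding the skip makes it easy to extend to other fields that a controller outside Crossplane is expected to own.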

As documented in the Terraform google container_node_pool reference, initial_node_count may (or may not?) change if the node count is manually changed (via gcloud or the Cloud Console). This can cause resources to fail reconciliation, because Terraform wants to destroy and recreate the node pool. If the node count is changed manually, either Crossplane should correct it back to the value of an explicitly set node_count field, or the change should be ignored because the autoscaler is in use (in which case node_count should be unset).
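The replacement problem can be sketched the same way. The hypothetical planAction function below is not Terraform's real planner. It compares desired and observed values and treats initial_node_count as a force-new field, matching the Terraform docs' note that changing it recreates the node pool. It also assumes, as the docs warn may happen, that a manual resize changes the reported initial_node_count.

```go
package main

import "fmt"

// forceNewFields lists fields whose change forces the resource to be
// destroyed and recreated. Assumption: only initial_node_count is modeled,
// per the Terraform Google provider docs for container_node_pool.
var forceNewFields = map[string]bool{"initial_node_count": true}

// planAction compares desired spec values against observed values and
// returns what a Terraform-style plan would do: "no-op", "update", or
// "replace". Fields absent from desired are unmanaged and never diff.
func planAction(desired, observed map[string]int) string {
	action := "no-op"
	for field, want := range desired {
		if got, ok := observed[field]; ok && got == want {
			continue
		}
		if forceNewFields[field] {
			return "replace"
		}
		action = "update"
	}
	return action
}

func main() {
	// The pool was manually resized from 3 to 6 nodes, and the API now also
	// reports initial_node_count as 6 (the case the docs warn about).
	observed := map[string]int{"node_count": 6, "initial_node_count": 6}

	// initial_node_count was late-initialized to 3: the plan wants a replacement.
	fmt.Println(planAction(map[string]int{"initial_node_count": 3}, observed)) // replace

	// Only an explicit node_count is set: the drift is corrected in place.
	fmt.Println(planAction(map[string]int{"node_count": 3}, observed)) // update

	// Neither field is set (autoscaling owns the count): nothing to do.
	fmt.Println(planAction(map[string]int{}, observed)) // no-op
}
```

The three cases line up with the options above: a late-initialized initial_node_count triggers a destroy/recreate, an explicit node_count is corrected in place, and leaving both unset lets the autoscaler's changes stand.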

Fixes #340

I have:

Run make reviewable test to ensure this PR is ready for review (partially: the linter kept running out of memory, but the tests pass when run separately).

How has this code been tested

Manually exercised with make run against minikube: I created a Cluster and a Node Pool with autoscaling enabled, then manually (in the Cloud Console) set the node count high enough that the autoscaler scaled it back down. This is how I found the initial_node_count issue, in addition to the main issue in #340. I repeated this process a few times and saw no errors or attempts to undo the autoscaler's actions.

…ools

Having node_count late-initialized with autoscaling can cause the autoscaler and Crossplane to fight over how many nodes there should be (with Crossplane effectively reverting autoscaler changes). If autoscaling is used, node_count should remain unset.

As documented in [0], initial_node_count may (or may not?) change if the node count is manually changed (via gcloud or the Cloud Console). This can cause resources to fail reconciliation because terraform wants to destroy and recreate the node pool. If the node count is changed manually, either Crossplane should correct it back to the value of an explicitly set node_count field, or it should be ignored because autoscaler is in use.

[0]: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool#initial_node_count

Signed-off-by: Samuel Littley <[email protected]>

Upbound-CLA · 2023-08-09T17:30:30Z

All committers have signed the CLA.

turkenf · 2023-08-09T17:40:50Z

/test-examples="examples/container/nodepool.yaml"

toastwaffle · 2023-08-09T17:40:53Z

@turkenf this goes beyond what was suggested in #340, and I'm not 100% sure that ignoring initial_node_count is actually correct. I'd appreciate your thoughts.

I will hopefully get the CLA signed tomorrow; I need to get it reviewed by my company's legal team.

turkenf · 2023-08-14T09:22:23Z

/test-examples="examples/container/nodepool.yaml"

turkenf · 2023-08-15T20:26:16Z

Hi @toastwaffle, thank you for your contribution.
In v0.35.0, the granular management policies feature was added. This might solve your problem here. Could you check this example or the docs here?
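For reference, granular management policies let a managed resource opt out of late-initialization by leaving LateInitialize out of spec.managementPolicies (for example listing Observe, Create, Update and Delete instead of the default "*"). The Go sketch below only models that membership check. The allows helper and the ManagementAction type are hypothetical and are not crossplane-runtime's API.

```go
package main

import "fmt"

// ManagementAction mirrors the action names accepted in a managed resource's
// spec.managementPolicies list ("*" means all actions).
type ManagementAction string

const (
	ActionAll            ManagementAction = "*"
	ActionObserve        ManagementAction = "Observe"
	ActionCreate         ManagementAction = "Create"
	ActionUpdate         ManagementAction = "Update"
	ActionDelete         ManagementAction = "Delete"
	ActionLateInitialize ManagementAction = "LateInitialize"
)

// allows reports whether the policy list permits the given action.
// Hypothetical helper for illustration; not crossplane-runtime's resolver.
func allows(policies []ManagementAction, action ManagementAction) bool {
	for _, p := range policies {
		if p == ActionAll || p == action {
			return true
		}
	}
	return false
}

func main() {
	defaults := []ManagementAction{ActionAll}
	noLateInit := []ManagementAction{ActionObserve, ActionCreate, ActionUpdate, ActionDelete}

	fmt.Println(allows(defaults, ActionLateInitialize))   // true
	fmt.Println(allows(noLateInit, ActionLateInitialize)) // false
	fmt.Println(allows(noLateInit, ActionUpdate))         // true
}
```

Because the opt-out is per resource, a NodePool using autoscaling can skip late-initialization entirely while other resources keep the default behaviour.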

toastwaffle · 2023-08-15T20:37:34Z

I did spot that, but it looked like the feature was marked as Alpha. Do you have a rough idea of how stable the feature is?

Either way, I'll give it a try tomorrow.

turkenf · 2023-08-15T20:45:15Z

> I did spot that, but it looked like the feature was marked as Alpha. Do you have a rough idea of how stable the feature is?

@lsviben can better comment here, but AFAIK it is planned to be promoted to beta soon.

jeanduplessis · 2023-08-15T20:51:01Z

@toastwaffle, the feature API is considered stable at this point. We plan to mature it to beta in the XP 1.14 release.

turkenf · 2023-09-25T16:21:35Z

@toastwaffle, have you had a chance to try?

toastwaffle · 2023-09-25T16:50:50Z

We did try it, and it did solve the node count problem, but we ended up rolling it back pretty quickly due to some errant behaviour (crossplane/upjet#263).

We may have been a little trigger-happy with the rollback due to a lack of understanding on our side, so we are going to try again. However, we've been waiting for #373 to be released (James is my coworker) so we don't have to wrangle two upgrades in quick succession.

If you think this change isn't worth merging given the granular management policies, I'm happy for this PR to be closed.

turkenf · 2023-09-27T16:34:27Z

We can wait for you to try one more time before closing it. Please share the results with us.

toastwaffle · 2023-10-10T13:41:24Z

We successfully upgraded 🎉

toastwaffle requested review from ulucinar, sergenyalcin and turkenf as code owners August 9, 2023 17:30

toastwaffle closed this Oct 10, 2023

vladfr mentioned this pull request Nov 27, 2023 in: NodePool container.gcp.upbound.io should not LateInitialize node_count #340 (closed)

moolen mentioned this pull request Jan 26, 2024 in: fix: NodePool should not late-init node count #452 (closed)
