-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore node_count and initial_node_count when late initializing NodePools #353
Ignore node_count and initial_node_count when late initializing NodePools #353
Conversation
…ools Having node_count late-initialized with autoscaling can cause the autoscaler and Crossplane to fight over how many nodes there should be (with Crossplane effectively reverting autoscaler changes). If autoscaling is used, node_count should remain unset. As documented in [0], initial_node_count may (or may not?) change if the node count is manually changed (via gcloud or the Cloud Console). This can cause resources to fail reconciliation because terraform wants to destroy and recreate the node pool. If the node count is changed manually, either Crossplane should correct it back to the value of an explicitly set node_count field, or it should be ignored because autoscaler is in use. [0]: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool#initial_node_count Signed-off-by: Samuel Littley <[email protected]>
/test-examples="examples/container/nodepool.yaml" |
/test-examples="examples/container/nodepool.yaml" |
Hi @toastwaffle, thank you for your contribution. |
I did spot that, but it looked like the feature was marked as Alpha. Do you have a rough idea of how stable the feature is? Either way, I'll give it a try tomorrow. |
@lsviben can better comment here, but AFAIK it is planned to be promoted to beta soon. |
@toastwaffle, the feature API is considered stable at this point. We plan to mature it to beta in the XP 1.14 release. |
@toastwaffle, have you had a chance to try? |
We did try it, and it did solve the node count problem, but we ended up rolling it back pretty quickly due to some errant behaviour (crossplane/upjet#263). We may have been a little bit trigger happy with the rollback due to lack of understanding on our side, so we are going to try again, but we've been waiting for #373 to be released (James is my coworker) so we don't have to wrangle 2 upgrades in quick succession. If you think this change isn't worth merging given the granular management policies, I'm happy for this PR to be closed. |
We can wait for you to try one more time before closing it. Please share the results with us. |
We successfully upgraded 🎉 |
Having
node_count
late-initialized with autoscaling can cause the autoscaler and Crossplane to fight over how many nodes there should be (with Crossplane effectively reverting autoscaler changes). If autoscaling is used,node_count
should remain unset.As documented,
initial_node_count
may (or may not?) change if the node count is manually changed (via gcloud or the Cloud Console). This can cause resources to fail reconciliation because terraform wants to destroy and recreate the node pool. If the node count is changed manually, either Crossplane should correct it back to the value of an explicitly setnode_count
field, or it should be ignored because autoscaler is in use (in which casenode_count
should be unset).Fixes #340
I have:
make reviewable test
to ensure this PR is ready for review (sort of - the linter was OOMing, but the tests pass when run separately).How has this code been tested
Manually exercised with
make run
against minikube by creating a Cluster and Node Pool with autoscaling enabled, and manually (in Cloud Console) setting the node count to a high number such that the autoscaler scales it back down (which is how I found the issue withinitial_node_count
beyond the main issue in #340). I repeated this process a few times, and did not see any errors or attempts to undo the autoscaler's actions.