Enable high availability (HA) configuration for ASO #4445

bingikarthik · 2024-11-13T16:50:01Z

What this PR does

Closes #4215
As we adopt Azure Service Operator (ASO), it's essential to run multiple replicas for production workloads, especially during cluster upgrade operations. This PR enables high availability (HA) for ASO by increasing the replica count and implementing a Pod Disruption Budget (PDB) to maintain consistent uptime and resilience during potential disruptions.
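
For illustration, a PDB along these lines would keep at least one ASO pod available during voluntary disruptions such as node drains. This is a minimal sketch, not the exact manifest from this PR: the API version and name follow the chart template file touched here, while the namespace, the `minAvailable` value, and the selector label are assumptions.

```yaml
# Sketch only: values marked "assumption" are illustrative, not taken from this PR.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: azureserviceoperator-pdb
  namespace: azureserviceoperator-system   # assumption: namespace ASO is installed into
spec:
  minAvailable: 1                          # assumption: keep at least one pod running
  selector:
    matchLabels:
      control-plane: controller-manager    # assumption: label carried by the ASO pods
```

With, say, two replicas and `minAvailable: 1`, a node drain can evict only one ASO pod at a time. A PDB limits voluntary evictions only; it does not protect against node failures.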

How does this PR make you feel?

Checklist

this PR contains documentation
this PR contains tests
this PR contains YAML Samples

config/manager/manager.yaml

config/default/manager_pod_disruption_budget.yaml

v2/config/default/manager_pod_disruption_budget.yaml

theunrepentantgeek

How will things behave when upgrading ASO to the next version?

I'm worried about the scenario where version N+1 of ASO deploys new CRD versions that the still-running version N doesn't understand.

In this scenario, since we use the newest resource version as the storage (hub) version of the resource, any resource that has been touched by version N+1 will be unintelligible to version N, resulting in a panic/crash.

...zure-service-operator/templates/policy_v1_poddisruptionbudget_azureserviceoperator-pdb.yaml

bingikarthik · 2024-11-18T08:43:31Z

@matthchr I've made the requested changes. When you have a moment, could you kindly review the PR?

matthchr · 2024-11-18T19:05:12Z

@bingikarthik - thanks! I think we need to make a deployment rollout change to make this safe (as @theunrepentantgeek called out). I'll send a separate PR for that and link it here, and then once that merges we can make sure this change is safe w/ it, and assuming it is, merge this too.

Thanks for your patience

nishant221 · 2024-11-20T06:43:19Z

Does ASO support a "leader" kind of approach? If multiple replicas are running (and reconciling independently), can the same request be processed by multiple instances of ASO, resulting in duplicate calls to Azure?

bingikarthik · 2024-11-20T08:10:45Z

> Does ASO support a "leader" kind of approach? If multiple replicas are running (and reconciling independently), can the same request be processed by multiple instances of ASO, resulting in duplicate calls to Azure?

Yes, it does. Please check: https://github.com/Azure/azure-service-operator/blob/main/v2/charts/azure-service-operator/templates/apps_v1_deployment_azureserviceoperator-controller-manager.yaml#L59
https://github.com/Azure/azure-service-operator/blob/main/main.go#L63

Enable high availability (HA) configuration for ASO

b32422d

bingikarthik requested review from davefellows, theunrepentantgeek, matthchr, babbageclunk and super-harsh as code owners November 13, 2024 16:50

matthchr reviewed Nov 13, 2024

config/manager/manager.yaml (outdated, resolved)

config/default/manager_pod_disruption_budget.yaml (outdated, resolved)

config/default/manager_pod_disruption_budget.yaml (outdated, resolved)

Update pdb selector and move refs under v2/

b31bf9e

matthchr reviewed Nov 13, 2024

v2/config/default/manager_pod_disruption_budget.yaml (outdated, resolved)

v2/config/default/manager_pod_disruption_budget.yaml (outdated, resolved)

theunrepentantgeek reviewed Nov 13, 2024

...zure-service-operator/templates/policy_v1_poddisruptionbudget_azureserviceoperator-pdb.yaml (outdated, resolved)

Bingi Narasimha Karthik added 2 commits November 15, 2024 16:19

Move manager_pod_disruption_budget.yaml and update selector for PDB

c7be817

Remove version label from pdb

fae3df7

Add enable condition for PDB

ff13230

matthchr mentioned this pull request Nov 23, 2024

Support multiple replicas of ASO pod #4466

Open

3 tasks
