Add tron topology_spread_constraints support to PaaSTA #3983
Conversation
This adds support for configuring Tron-launched pods with a default Topology Spread Constraint (and node affinities) that will spread pods out across multiple AZs - otherwise, Karpenter will overwhelmingly favor a single AZ due to our config
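For reference, a minimal sketch of the kind of pod-level topology spread constraint being described, expressed as the dict shape Kubernetes expects. The maxSkew, whenUnsatisfiable, and label selector values here are illustrative assumptions, not necessarily PaaSTA's actual defaults:

```python
# Illustrative only: a topologySpreadConstraint that spreads pods across
# availability zones. The specific values are assumptions, not PaaSTA's defaults.
default_topology_spread_constraint = {
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "ScheduleAnyway",  # soft constraint: prefer spreading, don't block scheduling
    "labelSelector": {
        "matchLabels": {"app.kubernetes.io/managed-by": "tron"},  # hypothetical label
    },
}
```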
Force-pushed from 0934f37 to 31673fc
Requires Yelp/Tron#1003 and Yelp/task_processing#220
paasta_tools/tron_tools.py
Outdated
```python
# PAASTA-18198: To improve AZ balance with Karpenter, we temporarily allow specifying zone affinities per pool
pool_node_affinities = load_system_paasta_config().get_pool_node_affinities()
if pool_node_affinities and self.get_pool() in pool_node_affinities:
    current_pool_node_affinities = pool_node_affinities[self.get_pool()]
```
what does this dictionary/map return for `pool_node_affinities[self.get_pool()]`?
is it the config to set the pool node affinities?
this is a puppet-defined dict (see `paasta_tools::public_config::pool_node_affinities` internally) that defines a default set of affinities for certain pools (speaking of which, we should probably add an entry for the tron pool :p):

```yaml
default:
  topology.kubernetes.io/zone:
    - us-west-2a
    - us-west-2b
```
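For illustration only, here is a minimal sketch (not the actual PaaSTA implementation) of how a per-pool zone map of this shape could be turned into node-affinity match expressions; the helper name and the plain-dict output format are assumptions:

```python
from typing import Dict, List


def pool_affinities_to_match_expressions(
    pool_affinities: Dict[str, List[str]],
) -> List[Dict]:
    """Hypothetical helper: turn a mapping like
    {"topology.kubernetes.io/zone": ["us-west-2a", "us-west-2b"]}
    into nodeAffinity-style match expressions."""
    return [
        {"key": label, "operator": "In", "values": values}
        for label, values in pool_affinities.items()
    ]


# Example with the config above:
# pool_affinities_to_match_expressions(
#     {"topology.kubernetes.io/zone": ["us-west-2a", "us-west-2b"]}
# )
# -> [{"key": "topology.kubernetes.io/zone", "operator": "In",
#      "values": ["us-west-2a", "us-west-2b"]}]
```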
ahh i see, it returns the AZ affinities:

```yaml
paasta_tools::public_config::pool_node_affinities:
  # pnw-prod default should only run on us-west-2a and us-west-2b (unless there's a deploy_whitelist override)
  default:
    topology.kubernetes.io/zone:
      - us-west-2a
      - us-west-2b
```
ahh sorry just saw your answer, yea makes sense
This'll let us deploy this without big-bouncing Tron everywhere
I realized that we weren't consistently using the same mock SystemPaastaConfig - so I went ahead and fixed that while I was here
I was tricked by the 'big bounce' test failing, but had missed that our test config enables Tron TSCs as well 👍 So this shouldn't cause a big bounce until we enable that (and it only affects runs on pools with constraints specified), sgtm!
This adds support for configuring Tron-launched pods with a default Topology Spread Constraint (and node affinities) that will spread pods out across multiple AZs - otherwise, Karpenter will overwhelmingly favor a single AZ due to our config
I've double-checked that the feature flag works by running this code on infrastage (as well as with the tests).

NOTE: there are a lot of test "changes", but that's mostly because I noticed that we were inconsistently mocking load_system_paasta_config() and wanted to centralize things to make sure the tests were doing the right thing.
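As a rough illustration of that kind of centralization (the fixture name, config contents, and patch target are assumptions, not the actual test code), an autouse pytest fixture can give every test in a module the same mocked SystemPaastaConfig:

```python
from unittest import mock

import pytest

from paasta_tools.utils import SystemPaastaConfig


@pytest.fixture(autouse=True)
def mock_load_system_paasta_config():
    # Hypothetical fixture: all tests in this module see the same
    # SystemPaastaConfig instead of ad-hoc per-test mocks.
    fake_config = SystemPaastaConfig(
        {"cluster": "fake-cluster"},  # illustrative contents only
        "/fake/config/dir",
    )
    # Patch the name where tron_tools looks it up (assumed import path).
    with mock.patch(
        "paasta_tools.tron_tools.load_system_paasta_config",
        autospec=True,
        return_value=fake_config,
    ) as patched:
        yield patched
```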