Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tron topology_spread_constraints support to PaaSTA #3983

Merged
merged 6 commits into from
Nov 11, 2024

Conversation

nemacysts
Copy link
Member

@nemacysts nemacysts commented Oct 24, 2024

This adds support for configuring Tron-launched pods with a default Topology Spread Constraint (and node affinities) that will spread pods out across multiple AZs - otherwise, Karpenter will overwhelmingly favor a single AZ due to our config

I've double-checked that the feature flag works by running this code on
infrastage as well (as well as with the tests)

NOTE: there's a lot of test "changes", but that's mostly since I noticed
that we were inconsistently mocking load_system_paasta_config() and I wanted
to centralize things to make sure that the tests were doing the right thing.

@nemacysts nemacysts requested a review from a team as a code owner October 24, 2024 15:54
@nemacysts nemacysts changed the title Add tron topology_stpread_constraints support to PaaSTA Add tron topology_spread_constraints support to PaaSTA Oct 24, 2024
This adds support for configuring Tron-launched pods with a default Topology
Spread Constraint (and node affinities) that will spread pods out across
multiple AZs - otherwise, Karpenter will overwhelmingly favor a single AZ due
to our config
@nemacysts
Copy link
Member Author

requires Yelp/Tron#1003 and Yelp/task_processing#220

# PAASTA-18198: To improve AZ balance with Karpenter, we temporarily allow specifying zone affinities per pool
pool_node_affinities = load_system_paasta_config().get_pool_node_affinities()
if pool_node_affinities and self.get_pool() in pool_node_affinities:
current_pool_node_affinities = pool_node_affinities[self.get_pool()]
Copy link
Contributor

@EmanElsaban EmanElsaban Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this dictionary/map returns pool_node_affinities[self.get_pool()]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the config to set the pool node affinities?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a puppet-defined dict (see paasta_tools::public_config::pool_node_affinities internally) that defines a default set of affinities for certain pools (speaking of which, we should probably add an entry for the tron pool :p):

  default:
    topology.kubernetes.io/zone:
      - us-west-2a
      - us-west-2b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh i see it returns the azs affinities:

paasta_tools::public_config::pool_node_affinities:
  # pnw-prod default should only run on us-west-2a and us-west-2b (unless there's a deploy_whitelist override)
  default:
    topology.kubernetes.io/zone:
      - us-west-2a
      - us-west-2b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sorry just saw your answer, yea makes sense

EmanElsaban
EmanElsaban previously approved these changes Oct 28, 2024
This'll let us deploy this without big-bouncing Tron everywhere
I realized that we weren't consistently using the same mock
SystemPaastaConfig - so I went ahead and fixed that while I was here
Copy link
Contributor

@jfongatyelp jfongatyelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tricked by the 'big bounce' test failing but had missed that our test config enables tron TSCs as well 👍 So this shouldn't cause a big bounce til we enable that (and only affects runs on pools w/ constraints specified), sgtm!

paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
@nemacysts nemacysts merged commit b749c0a into master Nov 11, 2024
10 checks passed
@nemacysts nemacysts deleted the luisp/TRON-2213-tsc branch November 15, 2024 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants