Improve telepresence list performance #3712

Merged 6 commits on Oct 30, 2024

@thallgren (Member) commented on Oct 28, 2024

This PR modifies the list command to optimize performance and reduce dependency on Kubernetes RBAC privileges. Instead of directly watching Kubernetes workload resources, the client now subscribes to the Traffic Manager's WorkloadEventsWatcher to track interceptable workloads. This change offers two key benefits:

  1. Reduced RBAC Requirements: The client no longer needs RBAC permissions to read and watch Kubernetes resources.
  2. Improved Performance: By significantly reducing the number of API requests sent to the Traffic Manager, overall performance is enhanced.
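
The subscription model described above can be sketched as a local cache that is kept current by applying streamed events, rather than by querying the cluster. This is a minimal illustration only: `WorkloadEvent`, `EventType`, and `Collection` are hypothetical stand-ins, not the Traffic Manager's actual message types, and a channel stands in for the gRPC stream.

```go
package main

import (
	"fmt"
	"sort"
)

// EventType and WorkloadEvent are hypothetical stand-ins for the messages
// streamed by the traffic-manager; the real types are not shown in this PR.
type EventType int

const (
	Added EventType = iota
	Modified
	Deleted
)

type WorkloadEvent struct {
	Type EventType
	Kind string // for example "Deployment"
	Name string
}

// Collection is a local cache that is kept current by applying events.
type Collection struct {
	workloads map[string]WorkloadEvent
}

func NewCollection() *Collection {
	return &Collection{workloads: make(map[string]WorkloadEvent)}
}

// Apply updates the cache from a single streamed event.
func (c *Collection) Apply(ev WorkloadEvent) {
	key := ev.Kind + "/" + ev.Name
	if ev.Type == Deleted {
		delete(c.workloads, key)
		return
	}
	c.workloads[key] = ev
}

// Names returns the cached workload keys in sorted order.
func (c *Collection) Names() []string {
	names := make([]string, 0, len(c.workloads))
	for k := range c.workloads {
		names = append(names, k)
	}
	sort.Strings(names)
	return names
}

func main() {
	// A buffered channel stands in for the event stream.
	events := make(chan WorkloadEvent, 3)
	events <- WorkloadEvent{Added, "Deployment", "echo"}
	events <- WorkloadEvent{Added, "StatefulSet", "db"}
	events <- WorkloadEvent{Deleted, "StatefulSet", "db"}
	close(events)

	c := NewCollection()
	for ev := range events {
		c.Apply(ev)
	}
	fmt.Println(c.Names())
}
```

Because the client only consumes events, it needs no read or watch permissions on the workload resources themselves in this sketch.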

Closes #3714

This setting controls the enablement of features more recent than the
given version. In particular, it's intended to make newer gRPC functions
return `Unimplemented` when set to a version where the function is
not implemented.

This setting is intended for test and debugging only.

Signed-off-by: Thomas Hallgren <[email protected]>
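
A version gate of the kind this commit message describes might look roughly like the sketch below. It is an assumption-laden illustration: the function names, the plain `major.minor.patch` parsing, the sentinel error (standing in for a gRPC `Unimplemented` status), and the version numbers are all hypothetical and not taken from the Telepresence code.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ErrUnimplemented stands in for a gRPC Unimplemented status (hypothetical).
var ErrUnimplemented = errors.New("unimplemented")

// parseVersion parses "major.minor.patch", with an optional "v" prefix.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// newerThan reports whether a is strictly newer than b.
func newerThan(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// checkFeature returns ErrUnimplemented when the feature was introduced in a
// version more recent than limit. An empty limit disables the gate.
func checkFeature(introduced, limit string) error {
	if limit == "" {
		return nil
	}
	iv, err := parseVersion(introduced)
	if err != nil {
		return err
	}
	lv, err := parseVersion(limit)
	if err != nil {
		return err
	}
	if newerThan(iv, lv) {
		return ErrUnimplemented
	}
	return nil
}

func main() {
	// Hypothetical versions: a call introduced in 2.21.0, gated at 2.20.0.
	fmt.Println(checkFeature("2.21.0", "2.20.0"))
	fmt.Println(checkFeature("2.20.0", "2.20.0"))
}
```

Setting the limit to an older version lets a test exercise the client's fallback path against a current server.
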
@thallgren added the "ok to test" label on Oct 28, 2024
@github-actions bot removed the "ok to test" label on Oct 28, 2024
@thallgren force-pushed the thallgren/list-performance branch 2 times, most recently from daa5d63 to 10c09f1, on October 29, 2024 16:16
Let the `telepresence list` command use a workload collection that is
backed by the traffic-manager's `WatchWorkloads` function.

The client will use local shared informers to watch the workloads when
connecting to an older traffic-manager that doesn't support the
`WatchWorkloads` call.

Sort conditions by LastTransitionTime so that the most recent one takes
priority. When checking the state of an argo-rollout, it's essential to
look at the transition timestamp, because the rollout can be both
"progressing" and "available" when pausing between steps.

We now look at the `RolloutAvailable` condition rather than
`RolloutHealthy`. The latter is not set until all steps of a canary
rollout have completed, which might be never.

Signed-off-by: Thomas Hallgren <[email protected]>
The condition "Rollout of n.n is not necessary. At least one pod has
the desired agent state" must be reversed when deleting an agent so that
we also have "Rollout of n.n is necessary. At least one pod still has an
agent".

Signed-off-by: Thomas Hallgren <[email protected]>
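
The reversal can be made concrete with a small decision function. This is a hypothetical sketch of the rule as stated in the commit message, not the code changed by this PR; `rolloutNeeded` and its parameters are illustrative names.

```go
package main

import "fmt"

// rolloutNeeded reports whether a rollout is necessary to reach the desired
// agent state. podHasAgent holds one entry per pod; wantAgent is true when
// an agent is being installed and false when it is being deleted.
func rolloutNeeded(podHasAgent []bool, wantAgent bool) bool {
	anyAgent := false
	for _, has := range podHasAgent {
		if has {
			anyAgent = true
			break
		}
	}
	if wantAgent {
		// Installing: not necessary when at least one pod already has
		// the desired agent state.
		return !anyAgent
	}
	// Deleting: necessary when at least one pod still has an agent.
	return anyAgent
}

func main() {
	pods := []bool{true, false}
	fmt.Println("install:", rolloutNeeded(pods, true))
	fmt.Println("delete: ", rolloutNeeded(pods, false))
}
```

The same observation, "at least one pod has an agent", therefore leads to opposite conclusions depending on whether the agent is being added or removed.
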
@thallgren merged commit d7d3ab2 into release/v2 on Oct 30, 2024
11 checks passed
@thallgren deleted the thallgren/list-performance branch on October 30, 2024 13:13
Successfully merging this pull request may close these issues.

telepresence list takes a very long time to complete (~4 mins)