
Support Argo Rollout workloads #3651

Merged: 4 commits from jrozek/argo-rollouts into release/v2 on Aug 6, 2024

Conversation

@P0lip (Contributor) commented on Jul 18, 2024:

Closes #2406 #3640

Description

I'm a bit on the fence about having a separate path for Rollouts when the relevant CRDs are not available. We'll always have that scenario in some form, but we could approach it differently.
I went with a simple resource check that we perform once at startup, but the alternative path I was considering (and still am, to be frank) is to simply handle the errors gracefully, which should be sufficient in most cases.
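
For illustration only, here is a minimal sketch of what such a one-time resource check could look like using client-go's discovery API; the package, function name, and error handling are assumptions, not the code in this PR.

package workloadcheck

import (
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/client-go/discovery"
    "k8s.io/client-go/rest"
)

// rolloutsAvailable reports whether the cluster serves the Argo Rollouts
// API by asking the discovery endpoint for the argoproj.io/v1alpha1 group.
func rolloutsAvailable(cfg *rest.Config) (bool, error) {
    dc, err := discovery.NewDiscoveryClientForConfig(cfg)
    if err != nil {
        return false, err
    }
    resources, err := dc.ServerResourcesForGroupVersion("argoproj.io/v1alpha1")
    if err != nil {
        if apierrors.IsNotFound(err) {
            // The group/version isn't served, i.e. the CRDs are not installed.
            return false, nil
        }
        return false, err
    }
    for _, r := range resources.APIResources {
        if r.Kind == "Rollout" {
            return true, nil
        }
    }
    return false, nil
}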

I'm also torn when it comes to those CRDs overall. Since behavior differs depending on the availability of the aforementioned CRDs, and since they are cluster-wide, proper test isolation is trickier when we reuse the same cluster.

The Argo Rollouts controller is installed "globally" under the argo-rollouts namespace. It could potentially be installed per namespace, but that shouldn't generally matter in our case.

I added Test_SuccessfullyInterceptsArgoRollout to WorkloadsSuite in integration_test/workloads_test.go which should test things pretty thoroughly for this case, but if you think I should add specific tests for workload watcher or agent injection, LMK.

This PR needs datawire/k8sapi#21. I tried to combine k8sFactory and the Argo addition in some way, to have a single function one calls, but there's more to untangle there.

integration_test/testdata/k8s/argorollout.goyaml is unused, I know. I'll either drop it or use it in some tests.

Checklist

  • I made sure to update ./CHANGELOG.yml.
  • I made sure to add any docs changes required for my change (including release notes).
  • My change is adequately tested.
  • I updated CONTRIBUTING.md with any special dev tricks I had to use to work on this code efficiently.
  • I updated TELEMETRY.md if I added, changed, or removed a metric name.
  • Once my PR is ready to have integration tests run, I posted the PR in #telepresence-dev in the datawire-oss Slack so that the "ok to test" label can be applied.

@P0lip P0lip self-assigned this Jul 18, 2024
@P0lip P0lip force-pushed the jrozek/argo-rollouts branch 3 times, most recently from b82c73e to 1294b2c on July 18, 2024 16:42
@thallgren (Member) left a comment:

This looks good to me. I like the fact that the informers are started conditionally, but perhaps we should also give the user the ability to control this using a Helm chart value? That way, no additional RBAC permissions are needed unless you want to use this feature (many users are sensitive to such additions). It would also make it possible for users of argo-rollouts to explicitly disable this new feature of Telepresence.

Using a Helm chart value would also remove the different behavior depending on CRDs that you mention.

Resolved review threads (outdated):
  • CHANGELOG.yml
  • pkg/informers/context.go
  • cmd/traffic/cmd/manager/mutator/watcher.go (two threads)
@P0lip P0lip marked this pull request as ready for review July 22, 2024 15:54
@P0lip P0lip requested a review from thallgren July 22, 2024 15:54
@@ -101,6 +101,7 @@ The following tables lists the configurable parameters of the Telepresence chart
| client.routing.allowConflictingSubnets | Allow the specified subnets to be routed even if they conflict with other routes on the local machine. | `[]` |
| client.dns.excludeSuffixes | Suffixes for which the client DNS resolver will always fail (or fallback in case of the overriding resolver) | `[".com", ".io", ".net", ".org", ".ru"]` |
| client.dns.includeSuffixes | Suffixes for which the client DNS resolver will always attempt to do a lookup. Includes have higher priority than excludes. | `[]` |
| argoRollouts.enabled | Enable/Disable the argo-rollouts integration. | `false` |
A Contributor commented:

Just a thought: I can imagine a future where multiple Deployment alternatives are supported. Might be worth nesting this as alternateWorkloads.argoRollouts.enabled.

A Member replied:

I like this idea, but I don't think it should be "alternative". It should be just "workloads". Let me explain why.

Each workload kind used today adds overhead, both to the client side and the cluster side of Telepresence. If we add an element that defaults to:

workloads:
  deployments:
    enabled: true
  replicaSets:
    enabled: true
  statefulSets:
    enabled: true
  argoRollouts:
    enabled: false

then a user who just wants Deployments to be considered workloads could control this and get rid of the current overhead for ReplicaSets and StatefulSets. This will require some more work on our shared-indexer logic and client code, and the complete implementation of what I'm suggesting here is beyond the scope of this PR, but adding the workloads top-level element could be done now as preparation.

Thoughts?
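
To make the idea concrete, here is a purely illustrative Go sketch of the cluster-side gating this could enable; the Kind type, the enabled map, and the factory wiring are assumptions for illustration, not Telepresence's actual code.

package workloads

import (
    "k8s.io/client-go/informers"
)

type Kind string

const (
    DeploymentKind  Kind = "Deployment"
    ReplicaSetKind  Kind = "ReplicaSet"
    StatefulSetKind Kind = "StatefulSet"
)

// startInformers registers an informer only for each enabled workload
// kind, so that disabled kinds cost nothing on the cluster side.
func startInformers(factory informers.SharedInformerFactory, enabled map[Kind]bool) {
    if enabled[DeploymentKind] {
        _ = factory.Apps().V1().Deployments().Informer()
    }
    if enabled[ReplicaSetKind] {
        _ = factory.Apps().V1().ReplicaSets().Informer()
    }
    if enabled[StatefulSetKind] {
        _ = factory.Apps().V1().StatefulSets().Informer()
    }
    // Argo Rollouts live in a separate API group and would be registered
    // through the argo-rollouts client's informer factory instead (omitted).
}

Each Informer() call registers the type with the shared factory, so a subsequent factory.Start(stopCh) would only watch the kinds the user enabled.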

The Contributor replied:

I'm not familiar enough with the codebase to comment on the implementation side, but from a user perspective that sounds great. (especially the "beyond the scope of this PR" bit so I can get Rollout support sooner 😛)

@thallgren (Member) left a comment:

Just some minor nits, and perhaps add the top-level workloads element for the Helm value.

Resolved review threads (outdated):
  • pkg/client/userd/trafficmgr/workloads.go
  • cmd/traffic/cmd/manager/managerutil/envconfig.go
  • cmd/traffic/cmd/manager/managerutil/envconfig_test.go
@thallgren (Member) left a comment:

Approving this, but I'd still like to see a workloads top-level element for the Helm value.

@thallgren thallgren added the "ok to test" label (applied by maintainers when a PR is ready to have tests run on it) on Jul 27, 2024
@github-actions github-actions bot removed the "ok to test" label on Jul 27, 2024
@thallgren (Member) commented:

@P0lip FYI a rebase from the latest release/v2 will get rid of the errors involving the "thhal" docker repository.

@P0lip (Contributor, Author) commented on Jul 29, 2024:

Approving this, but I'd still like to see a workloads top-level element for the Helm value.

Sounds good, I like that change and will add it.

@P0lip P0lip added the "ok to test" label on Jul 30, 2024
@github-actions github-actions bot removed the "ok to test" label on Jul 30, 2024
@P0lip (Contributor, Author) commented on Jul 31, 2024:

It looks like this GH Actions step is not executed, causing the integration test to fail.

@thallgren (Member) commented on Jul 31, 2024:

Oh, that's because the CI is triggered using pull_request_target, which in essence means that it runs using the .github folder from the target release/v2 branch (a security feature). In essence, this PR cannot be green until it's been merged (catch-22).

What you can do is create another branch from this PR's branch and push an intentionally empty commit. Then create a new PR that targets this PR. Its CI will then use the .github folder from this PR. If that's green, then it's OK to merge this PR.

@P0lip P0lip changed the base branch from release/v2 to jrozek/dev/argo-ci-changes August 2, 2024 09:19
@P0lip P0lip added the "ok to test" label on Aug 2, 2024
@github-actions github-actions bot removed the "ok to test" label on Aug 2, 2024
@P0lip P0lip force-pushed the jrozek/dev/argo-ci-changes branch from af242c0 to 1bec08c on August 2, 2024 13:04
@P0lip P0lip added the "ok to test" label on Aug 2, 2024
@github-actions github-actions bot removed the "ok to test" label on Aug 2, 2024
@P0lip (Contributor, Author) commented on Aug 2, 2024:

@thallgren I created a small branch containing the GH Actions changes and made this one point at it.
I need to tweak PATH on Windows, because we're getting exec: "kubectl-argo-rollouts": executable file not found in %PATH% there (see the sketch after this comment).
However, the tests pass on Linux (both platforms) & macOS, so from that standpoint we should be good.
I'll ping you once the Windows test is passing.

We can then just change the target branch on this PR back to v2 and merge it.
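
For illustration, the kind of PATH tweak mentioned above could look like this in Go test setup; the binary directory is hypothetical, not the path this PR actually uses.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    // Hypothetical directory where kubectl-argo-rollouts was installed.
    binDir := filepath.Join(os.Getenv("USERPROFILE"), "bin")

    // Prepend it to PATH so exec.LookPath("kubectl-argo-rollouts") succeeds.
    // os.PathListSeparator is ';' on Windows and ':' elsewhere.
    os.Setenv("PATH", binDir+string(os.PathListSeparator)+os.Getenv("PATH"))

    fmt.Println("PATH =", os.Getenv("PATH"))
}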

Signed-off-by: Jakub Rożek <[email protected]>
@P0lip P0lip force-pushed the jrozek/dev/argo-ci-changes branch from 1bec08c to 76d8be1 on August 5, 2024 14:06
@P0lip P0lip added the "ok to test" label on Aug 5, 2024
@github-actions github-actions bot removed the "ok to test" label on Aug 5, 2024
@P0lip (Contributor, Author) commented on Aug 5, 2024:

@thallgren I fixed the issue with PATH on Windows and as such this PR can be merged.
Feel free to change the target branch to v2 and merge it at your earliest convenience.

@thallgren thallgren changed the base branch from jrozek/dev/argo-ci-changes to release/v2 August 6, 2024 13:29
@thallgren thallgren merged commit ed78fd2 into release/v2 Aug 6, 2024
14 checks passed
@thallgren thallgren deleted the jrozek/argo-rollouts branch August 6, 2024 13:30