wait for cache sync and DAG build before starting xDS server #5672
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #5672 +/- ##
==========================================
- Coverage 78.59% 78.52% -0.08%
==========================================
Files 138 138
Lines 19244 19299 +55
==========================================
+ Hits 15125 15154 +29
- Misses 3830 3853 +23
- Partials 289 292 +3
There's actually a better way to implement this. The server should only wait for the first DAG build; the handler itself would then ensure that the first DAG build happens after the cache sync. I'll push a commit to implement this.
Nice work @therealak12!
So, my understanding from this is that `WaitForCacheSync()` might have worked if we did NOT process the events asynchronously. Since client-go is unaware of our background processing, it returns while we are still processing our own queue. For this kind of situation, client-go offers the utility `SingleFileTracker`, where the user signals their async processing with a `Start()` and `Finished()` call pair per resource in the initial list, and `HasSynced()`, which additionally covers the initial queue inside client-go. Is that correct?
I added some small questions inline as well.
The change makes sense to me and seems to work on my machine as well 🙂 👍
Would you add a changelog entry as well?
Thanks for your review and comments. I'll try to explain exactly what happens:
This PR waits until all of the Those This is why I've used the |
I have one more observation, but otherwise the change looks good to me 👍 Assume we have a lot of resources that require a status update, let's say 2000 There seems to be a chance now that Contour will NOT start the xDS server before the statuses have been pushed to Kubernetes. It happens as follows: sometimes Contour manages to acquire the lease before the client-go sync has finished.
In this case the processing of status updates happens within In my test it took 7 minutes to update 2000
Since the XDS server does not depend on statuses, I think it would make sense to set contour/internal/contour/handler.go Lines 243 to 250 in 68bafab
Cc @sunjayBhatia, @skriss
If I've understood correctly, you mean we should allow status updates regardless of whether the cache is synced or not. (Please correct me if I'm wrong.) We have about 3000 HTTPProxy objects, all of which use a delegated certificate. After a restart, if Contour puts the TLSCertificateDelegation resource in its cache after the HTTPProxy objects, it marks all valid HTTPProxies as invalid! This is what prompted us to open this PR: to wait for the full cache sync before starting the xDS server, rebuilding the DAG, etc.
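The ordering problem described in this comment can be reproduced with a toy model. The `proxy` and `cache` types below are hypothetical simplifications made up for this sketch and do not reflect Contour's real data structures or validation logic.

```go
package main

import "fmt"

// proxy and cache are hypothetical, simplified types for illustration only.
type proxy struct {
	name       string
	secretFrom string // namespace that owns the delegated certificate
}

type cache struct {
	delegations map[string]bool // namespaces whose certificate delegation is known
	proxies     []proxy
}

// invalidProxies returns the proxies whose delegated certificate is not
// permitted according to what the cache contains right now.
func (c *cache) invalidProxies() []string {
	var invalid []string
	for _, p := range c.proxies {
		if !c.delegations[p.secretFrom] {
			invalid = append(invalid, p.name)
		}
	}
	return invalid
}

func main() {
	c := &cache{delegations: map[string]bool{}}

	// The proxy arrives in the cache before the delegation it depends on.
	c.proxies = append(c.proxies, proxy{name: "app", secretFrom: "certs"})
	fmt.Println("validated too early:", c.invalidProxies())

	// Once the delegation has arrived, the same proxy is valid.
	c.delegations["certs"] = true
	fmt.Println("validated after full sync:", c.invalidProxies())
}
```

The first validation reports `[app]` and the second reports `[]`, even though the object itself never changed; only the completeness of the cache did.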
No, we should wait for cache sync before status updates are started. However, the problem is that If Envoys restart during a Contour restart, Contour would be up and serving configuration in a few hundred milliseconds (the time of the sync) instead of a few minutes (the time of the status update). Even though the status updates are sent via a channel to be processed asynchronously, I think the channel itself blocks when we have enough updates and that is why
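As general background on the blocking behaviour suspected above: a send on a Go channel blocks whenever no buffer slot or receiver is available. The sketch below shows this with a non-blocking probe; the capacity of 2 and the `trySend` helper are assumptions made for illustration and say nothing about how Contour's status update channel is actually configured.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// A plain `ch <- v` would instead block the caller until a receiver frees a slot.
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	updates := make(chan string, 2) // hypothetical capacity, nobody is receiving
	for i := 1; i <= 3; i++ {
		ok := trySend(updates, fmt.Sprintf("status-%d", i))
		fmt.Printf("update %d queued: %v\n", i, ok)
	}
}
```

The third update is not queued because the buffer is full. With an ordinary send, the producer would stall at that point until the consumer caught up, which is how an asynchronous channel can still hold up the goroutine that writes to it.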
Thank you @therealak12, this looks good to me! 🙏
Waiting for @skriss and @sunjayBhatia to take a look as well.
Signed-off-by: Ahmad Karimi <[email protected]>
LGTM, thanks again @therealak12! Will leave open for a bit in case anyone else wants to look, but we'll definitely plan to get this into the upcoming 1.27 release.
Looks good overall, just one requested change on the mechanism for signaling between goroutines, which the race detector picks up.
It might be good to add a run of the e2es where we add the `-race` flag when contour is compiled so we catch these sorts of things; we can do that as a daily build or even just enable it for all runs by default.
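For readers unfamiliar with this class of finding, the sketch below shows one race-free way to share a "done" flag between goroutines. The `buildState` type is hypothetical and is not the mechanism used in this PR; the point is only that a plain `bool` written in one goroutine and read in another is reported by the race detector, while an atomic flag is not.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// buildState is a hypothetical holder for a "first build done" flag.
// atomic.Bool synchronizes the write and the read; replacing it with a
// plain bool would make `go run -race` report a data race in main below.
type buildState struct {
	done atomic.Bool
}

func (s *buildState) markDone()    { s.done.Store(true) }
func (s *buildState) isDone() bool { return s.done.Load() }

func main() {
	var s buildState
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		s.markDone() // writer goroutine
	}()

	_ = s.isDone() // concurrent read, safe because the flag is atomic
	wg.Wait()
	fmt.Println("done:", s.isDone())
}
```

Running this with `go run -race` is a quick way to confirm that the detector stays silent.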
lgtm, modulo a few nits
Closes #5550.
Closes #1280.
I've removed the `x.mgr.GetCache().WaitForCacheSync()` call as it's implicitly handled in the `mgr.Start()` flow.
As a TLDR, the PR prevents starting the xDS server and building the DAG until the cache is synced with the initial list of k8s objects.