
wait for cache sync and DAG build before starting xDS server #5672

Merged
merged 21 commits into projectcontour:main from cache-sync-tracking
Oct 10, 2023

Conversation

therealak12
Contributor

@therealak12 therealak12 commented Aug 13, 2023

Closes #5550.
Closes #1280.

I've removed the x.mgr.GetCache().WaitForCacheSync() call as it's implicitly handled in the mgr.Start() flow.

As a TL;DR, the PR prevents starting the xDS server and building the DAG until the cache is synced with the initial list of k8s objects.
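
For readers skimming the thread, here is a minimal sketch of the gating idea (editorial illustration only; waitForInitialDAG and initialDagBuilt are hypothetical names, not Contour's actual API): the xDS server goroutine blocks until the event handler signals that the first DAG has been built, which in turn only happens after the informer caches have synced.

package serve

import (
	"context"
	"fmt"
)

// waitForInitialDAG blocks until the first DAG has been built (signalled by
// closing initialDagBuilt) or the context is cancelled. Hypothetical sketch,
// not the actual Contour code.
func waitForInitialDAG(ctx context.Context, initialDagBuilt <-chan struct{}) error {
	select {
	case <-initialDagBuilt:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("context cancelled before initial DAG was built: %w", ctx.Err())
	}
}

Usage sketch (still with illustrative names): the event handler closes the channel once, e.g. via sync.Once, after its first successful DAG build, and serve.go calls waitForInitialDAG before starting the xDS gRPC server.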

@therealak12 therealak12 force-pushed the cache-sync-tracking branch 3 times, most recently from 8d9887a to 820b2f4 Compare August 14, 2023 19:06
@therealak12 therealak12 changed the title Track internal cache sync Wait for cache sync Aug 15, 2023
@therealak12 therealak12 marked this pull request as ready for review August 15, 2023 10:24
@therealak12 therealak12 requested a review from a team as a code owner August 15, 2023 10:24
@therealak12 therealak12 requested review from stevesloka and sunjayBhatia and removed request for a team August 15, 2023 10:24
@codecov

codecov bot commented Aug 15, 2023

Codecov Report

Merging #5672 (d6881e7) into main (b865f33) will decrease coverage by 0.08%.
Report is 6 commits behind head on main.
The diff coverage is 30.76%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #5672      +/-   ##
==========================================
- Coverage   78.59%   78.52%   -0.08%     
==========================================
  Files         138      138              
  Lines       19244    19299      +55     
==========================================
+ Hits        15125    15154      +29     
- Misses       3830     3853      +23     
- Partials      289      292       +3     
Files Coverage Δ
internal/featuretests/v3/featuretests.go 86.60% <100.00%> (-0.05%) ⬇️
internal/contour/handler.go 82.16% <61.29%> (-6.07%) ⬇️
cmd/contour/serve.go 19.82% <0.00%> (-0.31%) ⬇️

... and 1 file with indirect coverage changes

@therealak12
Contributor Author

There's actually a better way to implement this. The server should only wait for the first DAG build. The handler itself would then ensure the first DAG build happens only after the cache sync.

I'll push a commit to implement this.

Member

@tsaarni tsaarni left a comment


Nice work @therealak12!

So, my understanding from this is that WaitForCacheSync() might have worked - if we did NOT process the events asynchronously. Since client-go is unaware of our background processing, it returns while we are still processing our own queue. For this kind of situation, client-go offers the SingleFileTracker utility, where the user signals their async processing with a Start()/Finished() call pair per resource in the initial list, and HasSynced(), which additionally covers the initial queue inside client-go. Is that correct?

I added some small questions inline as well.

The change makes sense to me and seems to work on my machine as well 🙂 👍

Would you add a changelog entry as well?

internal/contour/handler.go (inline comment, outdated, resolved)
internal/contour/handler.go (inline comment, resolved)
@tsaarni tsaarni added the release-note/minor A minor change that needs about a paragraph of explanation in the release notes. label Aug 23, 2023
@therealak12
Contributor Author


Thanks for your review and comments. I'll try to explain exactly what happens:

  • The manager's WaitForCacheSync waits until the initial list of Kubernetes objects is delivered to the informers.
  • We register our handlers with these informers and ignore the returned HasSynced methods.
  • Although the full list of initial objects exists in the informers, they have not necessarily been handled by the Contour handler yet, and thus they don't necessarily exist in Contour's internal cache.
  • The DAG rebuild goroutine starts rebuilding the DAG and updating Ingress objects based on its own (possibly incomplete) cache.

This PR waits until all of the HasSynced methods received when registering handlers return true, and only then starts the DAG rebuild process.

Those HasSynced methods return true once the handler's OnAdd method has been called for all of the objects in the initial list. If we relied only on these HasSynced methods, we might start the DAG rebuild process before the last object has been put in the cache! (Only the last object, because we use an unbuffered channel, so OnAdd blocks until the current object is read by the goroutine.)

This is why I've used the SingleFileTracker. It marks an object as in flight when OnAdd is called for an object from the initial list and marks it done when that object's handling finishes. So if syncTracker.HasSynced returns true, it means we are not processing any objects at that moment.
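
To make the Start()/Finished()/HasSynced() bookkeeping concrete, here is a simplified, self-contained Go stand-in (editorial sketch; the real code uses client-go's SingleFileTracker and Contour's own EventHandler, and all names below are illustrative):

package handler

import "sync/atomic"

// syncTracker mimics the semantics described above: Start/Finished bracket the
// asynchronous handling of each object from the initial list, and HasSynced
// reports true only once the informer has synced AND nothing is in flight.
type syncTracker struct {
	inFlight          atomic.Int64
	upstreamHasSynced func() bool // the informer's own HasSynced
}

func (t *syncTracker) Start()    { t.inFlight.Add(1) }
func (t *syncTracker) Finished() { t.inFlight.Add(-1) }
func (t *syncTracker) HasSynced() bool {
	return t.upstreamHasSynced() && t.inFlight.Load() == 0
}

type addEvent struct {
	obj     any
	initial bool
}

// eventHandler is a stripped-down stand-in for the real event handler.
type eventHandler struct {
	tracker *syncTracker
	update  chan addEvent // unbuffered, as described above
	cache   map[any]struct{}
}

// OnAdd marks the object as in flight and hands it to the processing goroutine.
func (e *eventHandler) OnAdd(obj any, isInInitialList bool) {
	if isInInitialList {
		e.tracker.Start()
	}
	e.update <- addEvent{obj: obj, initial: isInInitialList}
}

// run is the processing goroutine: Finished is only called after the object is
// really in the internal cache, so HasSynced cannot flip to true while the
// last object is still being stored.
func (e *eventHandler) run() {
	for ev := range e.update {
		e.cache[ev.obj] = struct{}{}
		if ev.initial {
			e.tracker.Finished()
		}
	}
}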

@tsaarni
Member

tsaarni commented Aug 25, 2023

I have one more observation, but otherwise the change looks good to me 👍

Assume we have a lot of resources that require a status update, say 2000 HTTPProxies with status marked as invalid, and the statuses now need to be updated to valid all at once because the error condition was fixed while Contour was down.

There now seems to be a chance that Contour will NOT start the xDS server until the statuses have been pushed to Kubernetes. It happens like the following:

Sometimes Contour manages to acquire the lease before the client-go sync has finished. StatusUpdateHandler gets started and the processing of status updates is enabled:

time="2023-08-25T14:16:15+03:00" level=info msg="attempting to acquire leader lease projectcontour/leader-elect...\n" caller="leaderelection.go:245" context=kubernetes
time="2023-08-25T14:16:15+03:00" level=info msg="successfully acquired lease projectcontour/leader-elect\n" caller="leaderelection.go:255" context=kubernetes
time="2023-08-25T14:16:15+03:00" level=info msg="started status update handler" context=StatusUpdateHandler
time="2023-08-25T14:16:15+03:00" level=info msg="received a new address for status.loadBalancer" context=loadBalancerStatusWriter loadbalancer-address=
time="2023-08-25T14:16:15+03:00" level=info msg="performing delayed update" context=contourEventHandler last_update=239.3048ms outstanding=3984

In this case the processing of status updates happens within rebuildDAG() before we have set e.initialDagBuilt = true. Due to the default client rate limits (adjustable via --kubernetes-client-qps and --kubernetes-client-burst), the xDS server will be down for quite a while, depending on how many statuses there are to update.
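
(Rough back-of-the-envelope math, an editorial estimate rather than a figure from the thread: with a client rate limit on the order of 5 requests per second, the commonly cited client-go default, 2000 status updates take about 2000 / 5 = 400 seconds, which lines up with the roughly 7 minutes observed below.)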

In my test it took 7 minutes to update 2000 HTTPProxies before I got this:

time="2023-08-25T14:22:33+03:00" level=info msg="the initial dag is built" context=xds
time="2023-08-25T14:22:33+03:00" level=info msg="started xDS server type: \"contour\"" address="0.0.0.0:8001" context=xds

Since the xDS server does not depend on statuses, I think it would make sense to set initialDagBuilt = true and start the xDS server before looping over and sending the status updates:

func (e *EventHandler) rebuildDAG() {
	latestDAG := e.builder.Build()
	e.observer.OnChange(latestDAG)
	for _, upd := range latestDAG.StatusCache.GetStatusUpdates() {
		e.statusUpdater.Send(upd)
	}
}
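
A sketch of that suggested reordering (editorial illustration of the function quoted above; setInitialDagBuilt is a placeholder name, not the actual field or method):

func (e *EventHandler) rebuildDAG() {
	latestDAG := e.builder.Build()
	e.observer.OnChange(latestDAG)

	// Unblock the xDS server before the potentially slow, rate-limited
	// status-update loop below.
	e.setInitialDagBuilt() // placeholder for however the signal is actually set

	for _, upd := range latestDAG.StatusCache.GetStatusUpdates() {
		e.statusUpdater.Send(upd)
	}
}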

Cc @sunjayBhatia, @skriss

@therealak12
Contributor Author

therealak12 commented Aug 25, 2023


If I've understood correctly, you mean we should allow status updates regardless of whether the cache is synced or not. (Please correct me if I'm wrong.)

We have about 3000 HTTPProxy objects. They all use a delegated certificate. After a restart, if Contour puts the TLSCertificateDelegation resource in its cache after the HTTPProxy objects, it marks all the valid HTTPProxies as invalid! This is what prompted us to open this PR to wait for the full cache sync before starting the xDS server, the DAG rebuild, etc.

@tsaarni
Member

tsaarni commented Aug 25, 2023

If I've understood correctly, you mean we should allow status updates regardless of whether the cache is synced or not. (Please correct me if I'm wrong)

No, we should wait for cache sync before status updates are started. However, the problem is that rebuildDAG() can still take minutes just to update the statuses, and during that time (in the scenario above) Envoys are not being served their configuration by Contour, since the start of the xDS server (initialDagBuilt = true) is delayed until the function returns. Updating the status is a slow background process, and I think we could already start the xDS server while the status updates are going on.

If Envoys restart during a Contour restart, Contour would be up and serving configuration in a few hundred milliseconds (the time of the sync) instead of a few minutes (the time of the status update).

Even though the status updates are sent via a channel to be processed asynchronously, I think the channel itself blocks when we have enough updates, and that is why statusUpdater.Send() inside rebuildDAG() blocks.
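
A tiny, self-contained Go demo of that blocking behaviour (editorial illustration, not Contour code): a buffered channel stops absorbing sends as soon as the consumer, here standing in for a rate-limited status updater, falls behind.

package main

import (
	"fmt"
	"time"
)

func main() {
	updates := make(chan int, 2) // small buffer, like a bounded update queue

	// Slow consumer: stands in for a rate-limited status updater.
	go func() {
		for u := range updates {
			time.Sleep(500 * time.Millisecond)
			fmt.Println("applied status update", u)
		}
	}()

	// The sender (think rebuildDAG calling statusUpdater.Send) starts
	// blocking once the buffer is full and the consumer hasn't caught up.
	for i := 0; i < 5; i++ {
		start := time.Now()
		updates <- i
		fmt.Printf("queued %d after %v\n", i, time.Since(start))
	}
	close(updates)
	time.Sleep(3 * time.Second) // crude wait so the consumer can drain (demo only)
}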

Member

@tsaarni tsaarni left a comment


Thank you @therealak12, this looks good to me! 🙏

Waiting for @skriss and @sunjayBhatia to take a look as well.

Member

@skriss skriss left a comment


LGTM, thanks again @therealak12! Will leave open for a bit in case anyone else wants to look, but we'll definitely plan to get this into the upcoming 1.27 release.

@skriss skriss changed the title Wait for cache sync wait for cache sync and DAG build before starting xDS server Oct 5, 2023
Member

@sunjayBhatia sunjayBhatia left a comment


Looks good overall; just one requested change on the mechanism for signaling between goroutines, which the race detector picks up.

It might be good to add a run of the e2e tests where we add the -race flag when Contour is compiled so we catch these sorts of things; we could do this as a daily build or even just enable it for all runs by default.

Makefile (inline comment, outdated, resolved)
test/e2e/framework.go (inline comment, outdated, resolved)
Contributor

@clayton-gonsalves clayton-gonsalves left a comment


lgtm, modulo a few nits

internal/contour/handler.go (inline comment, resolved)
Makefile (inline comment, outdated, resolved)
@sunjayBhatia sunjayBhatia enabled auto-merge (squash) October 10, 2023 12:43
@sunjayBhatia sunjayBhatia merged commit 2765c72 into projectcontour:main Oct 10, 2023
25 of 26 checks passed
yangyy93 added a commit to projectsesame/contour that referenced this pull request Oct 19, 2023
* provisioner: add field overloadMaxHeapSize for envoy (projectcontour#5699)

* add field overloadMaxHeapSize

Signed-off-by: yy <[email protected]>

* add changelog

Signed-off-by: yy <[email protected]>

* update changelog and configuration.md

Signed-off-by: yangyang <[email protected]>

---------

Signed-off-by: yy <[email protected]>
Signed-off-by: yangyang <[email protected]>

* build(deps): bump sigs.k8s.io/gateway-api from 0.8.0 to 0.8.1 (projectcontour#5757)

* build(deps): bump sigs.k8s.io/gateway-api from 0.8.0 to 0.8.1

Bumps [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api) from 0.8.0 to 0.8.1.
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/CHANGELOG.md)
- [Commits](kubernetes-sigs/gateway-api@v0.8.0...v0.8.1)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* make generate

Signed-off-by: Steve Kriss <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Steve Kriss <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Steve Kriss <[email protected]>

* build(deps): bump github.com/onsi/ginkgo/v2 from 2.12.0 to 2.12.1 (projectcontour#5781)

Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.12.0 to 2.12.1.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.12.0...v2.12.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump google.golang.org/grpc from 1.58.1 to 1.58.2 (projectcontour#5780)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.58.1 to 1.58.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.58.1...v1.58.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/vektra/mockery/v2 from 2.33.2 to 2.34.0 (projectcontour#5779)

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.33.2 to 2.34.0.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/docs/changelog.md)
- [Commits](vektra/mockery@v2.33.2...v2.34.0)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Setting `disabled = true` on a route should disable the virtual host global rate limit policy (projectcontour#5657)

Support disabling global rate limiting on individual
routes by setting disabled=true.

Fixes projectcontour#5685.

Signed-off-by: shadi-altarsha <[email protected]>

* update Go to 1.21.1 (projectcontour#5783)


Signed-off-by: Steve Kriss <[email protected]>

* Fixup: Sort path matches based on length rather than lexi (projectcontour#5752)

Since Envoy greedily matches path routes, order is important. Contour
sorts the routes in a way that is not really intuitive and can
lead to surprises.

In particular, even though the comment in the code states that routes are
ordered based on length, the reality is that they are sorted based on string
comparison. This PR fixes this.

* I think the current behaviour doesn't make much sense and it is a bit brittle.
* Updating the behaviour has significant update risk since there might be folks
that rely on this routing behaviour without really knowing it.
* Should we even merge this PR? I am of two minds and I would like some input:

1. Option (1): Merge it as is and make a clear changelog/announcement about the fix
2. Option (2): Create a config flag with a feature flag e.g. `route_sorting_strategy` and switch the implementation
to not do sorting when the flag is present. That way it allows folks to opt out from the sorting as they need to.

Longest-path-based matching kind of makes sense to me now that I know about it, but it is a rough edge that needs users to
be familiar with Contour and it is harder to socialize in larger teams.

Signed-off-by: Sotiris Nanopoulos <[email protected]>

* build(deps): bump github.com/onsi/gomega from 1.27.10 to 1.28.0 (projectcontour#5792)

Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.27.10 to 1.28.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.27.10...v1.28.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/cert-manager/cert-manager (projectcontour#5791)

Bumps [github.com/cert-manager/cert-manager](https://github.com/cert-manager/cert-manager) from 1.13.0 to 1.13.1.
- [Release notes](https://github.com/cert-manager/cert-manager/releases)
- [Commits](cert-manager/cert-manager@v1.13.0...v1.13.1)

---
updated-dependencies:
- dependency-name: github.com/cert-manager/cert-manager
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/vektra/mockery/v2 from 2.34.0 to 2.34.2 (projectcontour#5793)

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.34.0 to 2.34.2.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/docs/changelog.md)
- [Commits](vektra/mockery@v2.34.0...v2.34.2)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/prometheus/client_golang (projectcontour#5790)

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* HTTPProxy: allow dynamic Host header rewrite (projectcontour#5678)

Allows the Host header to be rewritten to the value
of another header while forwarding the request to
the upstream. This is possible at the route level only.

Fixes projectcontour#5673.

Signed-off-by: Clayton Gonsalves <[email protected]>

* fix spelling errors (projectcontour#5798)

Signed-off-by: Steve Kriss <[email protected]>

* hack: bump codespell version to match GH action (projectcontour#5799)

Signed-off-by: Steve Kriss <[email protected]>

* gateway provisioner: add flags to enable running provisioner out of cluster (projectcontour#5686)

Adds --incluster and --kubeconfig flags to
the gateway provisioner to enable running
outside of the cluster.

Signed-off-by: gang.liu <[email protected]>

* site: Bump Hugo to 0.119.0 (projectcontour#5795)

- Also implement more consistent toml file indenting for readability
- Asset optimization is deprecated by netlify, see: https://answers.netlify.com/t/please-read-deprecation-of-post-processing-asset-optimization/96657

Signed-off-by: Sunjay Bhatia <[email protected]>

* internal/dag: default Listener ResolvedRefs to true (projectcontour#5804)

Sets Gateway Listeners' ResolvedRefs condition
to true by default, to pass updated conformance.

Closes projectcontour#5648.

Signed-off-by: Steve Kriss <[email protected]>

* build(deps): bump golang.org/x/oauth2 from 0.12.0 to 0.13.0 (projectcontour#5810)

Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.12.0 to 0.13.0.
- [Commits](golang/oauth2@v0.12.0...v0.13.0)

---
updated-dependencies:
- dependency-name: golang.org/x/oauth2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/vektra/mockery/v2 from 2.34.2 to 2.35.2 (projectcontour#5809)

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.34.2 to 2.35.2.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/docs/changelog.md)
- [Commits](vektra/mockery@v2.34.2...v2.35.2)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/prometheus/client_model (projectcontour#5811)

Bumps [github.com/prometheus/client_model](https://github.com/prometheus/client_model) from 0.4.1-0.20230718164431-9a2bf3000d16 to 0.5.0.
- [Release notes](https://github.com/prometheus/client_model/releases)
- [Commits](https://github.com/prometheus/client_model/commits/v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_model
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* switch to github.com/distribution/parse (projectcontour#5818)

Signed-off-by: Steve Kriss <[email protected]>

* deps: Bump logrusr to v4.0.0 (projectcontour#5806)

Fixes data races found in projectcontour#5805

Also remove testing around V().Info()

logrusr has changed behavior since v3.0.0, it now tries to mimic logrus
log levels with the V() level, see:
bombsimon/logrusr@9f3fd50

In practice client-go checks if a certain verbosity level is enabled and
initializes a different logger based on that and then uses Info(f) logs,
rather than the V().Info() construction.

This commit removes the testing of log lines written with V() guarding
them and rather just tests the expected verbosity is enabled or not.

Signed-off-by: Sunjay Bhatia <[email protected]>

* wait for cache sync and DAG build before starting xDS server (projectcontour#5672)

Prevents starting the XDS server and building the DAG until the cache is synced with the initial list of k8s objects and these events are processed by the event handler

Signed-off-by: Ahmad Karimi <[email protected]>

* internal/xdscache: Generate uuid for snapshot version (projectcontour#5819)

Snapshotter had a data race reading/writing the snapshot version between
threads. This version is not in practice used for the contour xDS server
DiscoveryResponse versions but is in the go-control-plane version.

Fixes: projectcontour#5482

Signed-off-by: Sunjay Bhatia <[email protected]>

* Bump Envoy to 1.27.1 (projectcontour#5821)

See release notes:
https://www.envoyproxy.io/docs/envoy/v1.27.1/version_history/v1.27/v1.27.1

Signed-off-by: Sunjay Bhatia <[email protected]>

* build(deps): bump golang.org/x/net from 0.16.0 to 0.17.0 (projectcontour#5829)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.16.0 to 0.17.0.
- [Commits](golang/net@v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump google.golang.org/grpc from 1.58.2 to 1.58.3 (projectcontour#5833)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.58.2 to 1.58.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.58.2...v1.58.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/onsi/ginkgo/v2 from 2.12.1 to 2.13.0 (projectcontour#5831)

Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.12.1 to 2.13.0.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.12.1...v2.13.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/vektra/mockery/v2 from 2.35.2 to 2.35.4 (projectcontour#5834)

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.35.2 to 2.35.4.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/docs/changelog.md)
- [Commits](vektra/mockery@v2.35.2...v2.35.4)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump github.com/google/go-cmp from 0.5.9 to 0.6.0 (projectcontour#5832)

Bumps [github.com/google/go-cmp](https://github.com/google/go-cmp) from 0.5.9 to 0.6.0.
- [Release notes](https://github.com/google/go-cmp/releases)
- [Commits](google/go-cmp@v0.5.9...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/google/go-cmp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump go to 1.21.3 (projectcontour#5841)


Signed-off-by: Sunjay Bhatia <[email protected]>

* Add configurability for HTTP requests per IO cycle (projectcontour#5827)

An additional mitigation to CVE-2023-44487 available in Envoy 1.27.1.
This change allows configuring the http.max_requests_per_io_cycle Envoy
runtime setting via Contour configuration to allow administrators of
Contour to prevent abusive connections from starving resources from
others. The default is left as the existing behavior, that is no limit,
so as not to impact existing valid traffic.

See the Envoy release notes for more information:
https://www.envoyproxy.io/docs/envoy/v1.27.1/version_history/v1.27/v1.27.1

Signed-off-by: Sunjay Bhatia <[email protected]>

* provisioner: fix envoy-max-heapsize not set (projectcontour#5814)

* fix envoy-max-heapsize not set

Signed-off-by: yangyang <[email protected]>

* add ut

Signed-off-by: yangyang <[email protected]>

* update ut

Signed-off-by: yangyang <[email protected]>

---------

Signed-off-by: yangyang <[email protected]>

* HTTP/2 max concurrent streams can be configured (projectcontour#5850)

Adds a global Listener configuration field for admins to be able to
protect their installations of Contour/Envoy with a limit. Default is no
limit to ensure existing behavior is not impacted for valid traffic.
This field can be used for tuning resource usage or mitigating DoS
attacks like CVE-2023-44487.

Also fixes omitempty tags on MaxRequestsPerIOCycle field.

Fixes: projectcontour#5846

Signed-off-by: Sunjay Bhatia <[email protected]>

* Bump Envoy to v1.27.2 (projectcontour#5863)

See release notes:
https://www.envoyproxy.io/docs/envoy/v1.27.2/version_history/v1.27/v1.27.2

Signed-off-by: Sunjay Bhatia <[email protected]>

* site: 1.26.1, 1.25.3, 1.24.6 patch releases (projectcontour#5859)


Signed-off-by: Sunjay Bhatia <[email protected]>

* test/e2e: Add race detection in e2e tests (projectcontour#5805)

Compile contour binary with -race flag and look for "DATA RACE" in
stderr. Fails test if found.

Signed-off-by: Sunjay Bhatia <[email protected]>

* golangci-lint: Fix revive rules (projectcontour#5857)

When we enabled the use-any rule we disabled all the default rules that
are run by revive (see: https://revive.run/docs#golangci-lint)

This change grabs all the default rules from
https://github.com/mgechev/revive/blob/master/defaults.toml and adds the
use-any rule

Also fixes outstanding lint issues

Signed-off-by: Sunjay Bhatia <[email protected]>

* crd/ContourDeployment: Add field 'podLabels' for contour (#2)

* add pod labels field to contourDeployment

---------

Signed-off-by: yy <[email protected]>
Signed-off-by: yangyang <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Steve Kriss <[email protected]>
Signed-off-by: shadi-altarsha <[email protected]>
Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Clayton Gonsalves <[email protected]>
Signed-off-by: gang.liu <[email protected]>
Signed-off-by: Sunjay Bhatia <[email protected]>
Signed-off-by: Ahmad Karimi <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Steve Kriss <[email protected]>
Co-authored-by: Shadi Altarsha <[email protected]>
Co-authored-by: Sotiris Nanopoulos <[email protected]>
Co-authored-by: Clayton Gonsalves <[email protected]>
Co-authored-by: izturn <[email protected]>
Co-authored-by: Sunjay Bhatia <[email protected]>
Co-authored-by: Ahmad Karimi <[email protected]>
@m-yosefpor m-yosefpor deleted the cache-sync-tracking branch January 8, 2024 12:52
Labels
release-note/minor A minor change that needs about a paragraph of explanation in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache sync latency
Wait to discover all K8s resources before sending xDS responses
6 participants