-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(delta): force eds response after cds change #8
Conversation
Signed-off-by: Lukasz Dziedziak <[email protected]>
Seems like sometimes there is still issue, I need to investigate more :/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of questions but overall looks alright I think
} | ||
|
||
if count := c.GetStatusInfo(key).GetNumDeltaWatches(); count != len(testTypes) { | ||
t.Errorf("watches should be created for the latest version, saw %d watches expected %d", count, len(testTypes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be useful to print the types that were missing?
if count := c.GetStatusInfo(key).GetNumDeltaWatches(); count != len(testTypes)-2 { | ||
t.Errorf("watches should be preserved for all but two, got: %d open watches instead of the expected %d open watches", count, len(testTypes)-1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this, we check equality for -2
but we print -1
why? also, nitpick can we have 1
/2
as a const and name it descriptively?
|
||
// validate response for endpoints | ||
select { | ||
case out := <-watches[testTypes[1]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assign this testTypes[1]
to a variable so I know what type that is? is this rsrc.ClusterType
?
assertResourceMapEqual(t, cache.IndexRawResourcesByName(out.(*cache.RawDeltaResponse).Resources), snapshot2.GetResources(rsrc.ClusterType)) | ||
vMap := out.GetNextVersionMap() | ||
versionMap[testTypes[1]] = vMap | ||
case out := <-watches[testTypes[0]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above ☝️
snapshot2.Resources[types.Cluster] = cache.NewResources(fixture.version2, []types.Resource{cluster}) | ||
assertResourceMapEqual(t, cache.IndexRawResourcesByName(out.(*cache.RawDeltaResponse).Resources), snapshot2.GetResources(rsrc.ClusterType)) | ||
vMap := out.GetNextVersionMap() | ||
versionMap[testTypes[1]] = vMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we writing something when this code block is supposed to validate things? 🤔
@@ -496,6 +505,15 @@ func (cache *snapshotCache) respond(ctx context.Context, request *Request, value | |||
} | |||
} | |||
|
|||
func getForcePushEdsResourcesNames(typ types.ResponseType, response *RawDeltaResponse) ([]string, types.ResponseType, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need type? it's either endpoint
or unknown
and we already have the last boolean
🤔
@@ -43,6 +43,14 @@ type StreamState struct { // nolint:golint,revive | |||
|
|||
// Ordered indicates whether we want an ordered ADS stream or not | |||
ordered bool | |||
|
|||
// ForcePush indicates if should push the response even when the version is the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ForcePush indicates if should push the response even when the version is the same | |
// ForcePush indicates if the control plane should push the response even when the version is the same |
|
||
// ForcePush indicates if should push the response even when the version is the same | ||
// https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#resource-warming | ||
// This is required in the situation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This is required in the situation: | |
// This is required in the following situations: |
// This is required in the situation: | ||
// 1) There is a Cluster change and control plane responds with CDS | ||
// 2) Envoy has a cluster in the warming phase until there is a EDS response, if endpoints haven't changed | ||
// there is no EDS send and changes to the clusters are blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// there is no EDS send and changes to the clusters are blocked | |
// there is no EDS sent and changes to the clusters are blocked |
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
When there is a CDS update, we need to send an EDS response to transition clusters from the warming phase. If there is no update, Envoy can end up waiting indefinitely for an EDS response. In Envoy 1.32, an EDS cache is enabled by default, and after the
initial_fetch_timeout
elapses, Envoy uses the cached data. However, if this timeout is disabled, the issue reoccurs. Additionally, waiting for theinitial_fetch_timeout
to expire slows down the propagation of changes.Upstream issue: envoyproxy#1035
Upstream PR (no review): envoyproxy#981