Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix backend update racing problem with dpvs-agent and healthcheck. #937

Merged
merged 20 commits into from
Apr 10, 2024

Conversation

ywc689
Copy link
Collaborator

@ywc689 ywc689 commented Jan 24, 2024

No description provided.

you-looks-not-tasty and others added 14 commits January 23, 2024 13:53
…ged adminstratively

When healthcheck program updating the rs's weight, it should consider:

* rs's weight may change adminstratively
* rs availabilty may change due to the server is closed or crashed

Healthcheck stores the original weight when it sends down notification in order to restore
the rs's weight when it gets healthy later. The problem is original weight is syncd from
dpvs periodically and not always up-to-date, which may result in incorrect weight update in
healthcheck up notification.

Signed-off-by: ywc689 <[email protected]>
Unfortunately, the CAS solution in the last commit is not effective for healthcheck
process cannot maintain the correct previous weight value. Thus resource version is
used instead, which should be passed to the lb api when the healthcheck try to update
rs, and the api would reject the request if the resouce version is outdated.

use vs version to resync target status when notification fails

Signed-off-by: ywc689 <[email protected]>
- update config version finally in updateConfig and only when the backend config is in healthy state
- fix notification resync fail problem caused by insufficient channel size
- fix array index problem in function NewTargetFromStr

Signed-off-by: ywc689 <[email protected]>
…g. POST|PUT /vs/${ID}/rs, PUT /vs/${ID})

Healthcheck api /vs/${ID}/rs/health would not update local cache.

GET API /vs | /vs/${ID} response dpvs-agent local cache info default
and response dpvs running service detail with query param `healthcheck=true`
@ywc689 ywc689 added the pr/to-review-codes review codes line by line and check if problem exists. label Mar 6, 2024
@ywc689 ywc689 added pr/accepted the pr passed all review stages and await to be merged and removed pr/to-review-codes review codes line by line and check if problem exists. labels Apr 10, 2024
@ywc689 ywc689 merged commit 3ec1d59 into iqiyi:devel Apr 10, 2024
3 checks passed
}

// if params.Snapshot != nil && *params.Snapshot {
// shareSnapshot.DumpTo(settings.LocalConfigFile(), h.logger)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you remove DumpTo? And how to generate cache file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of snapshot is true, and the dpvs-healthcheck module has not been set snapshot to false.

healthcheck module makes heavily invoked to getVs(), which will generate a large number of cache files.

We will fix this issue in the next version.

@ywc689 ywc689 deleted the fix-hc-agent-racing branch July 10, 2024 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/accepted the pr passed all review stages and await to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants