Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

controller: Proactively disconnect node based on heartbeat #911

Merged
merged 3 commits into from
Jun 26, 2024

Conversation

dofmind
Copy link
Contributor

@dofmind dofmind commented Jun 21, 2024

This PR is an update of #870

The main difference is that the code is simplified by using microseconds in last_seen instead of the struct timespec.
It updates last_seen in node_method_register() to prevent the node disconnecting right now when last_seen is 0 or too old.
Additionally, it handles disconnected nodes by calling node_disconnected() instead of controller_remove_node(). Otherwise an error will occur.

bluechi-controller: ../git/src/controller/node.c:123: unit_subscriptions_clear: Assertion `LIST_IS_EMPTY(usubs->subs)' failed.

Fixes: #857

@coveralls
Copy link

Coverage Status

coverage: 84.541% (-0.6%) from 85.15%
when pulling 64025ba on dofmind:controller-heartbeat
into af1dcfe on eclipse-bluechi:main.

Copy link
Member

@engelmi engelmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! @dofmind

Overall the PR looks good. I added a few minor comments, mostly about logging. I tested your changes manually (also via valgrind) and it works well. Nonetheless, please add two integration tests:

  • checking that HeartbeatInteral=0 disables it
    (e.g. by confirming the agent stays connected after x sec when both, agent and controller heartbeat are disabled and controller threshold is y sec)
  • checking that the agent gets disconnected when not the threshold is reached (e.g. by disabling the agent heartbeat)

doc/man/bluechi-controller.conf.5.md Outdated Show resolved Hide resolved
doc/man/bluechi-controller.conf.5.md Outdated Show resolved Hide resolved
src/controller/controller.c Show resolved Hide resolved
src/controller/controller.c Outdated Show resolved Hide resolved
src/controller/controller.c Outdated Show resolved Hide resolved
config/controller/controller.conf Outdated Show resolved Hide resolved
config/controller/controller.conf Outdated Show resolved Hide resolved
config/controller/controller.conf Outdated Show resolved Hide resolved
@dofmind dofmind force-pushed the controller-heartbeat branch 2 times, most recently from 153128b to f13eb8e Compare June 25, 2024 10:12
This adds a couple of new options to controller: HeartbeatInterval and
NodeHeartbeatThreshold, which can be used to actively disconnect node
based on the last sent heartbeat.  The controller periodically checks
the last seen timestamp of nodes, and if it was sent before
NodeHeartbeatThreshold, the controller treats it as disconnected.

Signed-off-by: Daiki Ueno <[email protected]>
Signed-off-by: Joonyoung Shim <[email protected]>
@coveralls
Copy link

Coverage Status

coverage: 84.854% (-0.3%) from 85.15%
when pulling 0157cdc on dofmind:controller-heartbeat
into af1dcfe on eclipse-bluechi:main.

@coveralls
Copy link

Coverage Status

coverage: 85.154% (+0.004%) from 85.15%
when pulling a278805 on dofmind:controller-heartbeat
into af1dcfe on eclipse-bluechi:main.

An integration test is to verify if default configuration of controller
disables periodic heartbeat of the controller, and the other is to
verify if the node gets disconnected when did not receive heartbeat
since threshold from node.

Signed-off-by: Joonyoung Shim <[email protected]>
@coveralls
Copy link

Coverage Status

coverage: 85.154% (+0.004%) from 85.15%
when pulling 37269b6 on dofmind:controller-heartbeat
into af1dcfe on eclipse-bluechi:main.

@engelmi
Copy link
Member

engelmi commented Jun 26, 2024

LGTM and thanks for adding tests!

@dofmind Could you add the two new configuration options to the bluechi-controller.conf.5.md? Sorry, I missed that during the previous review. Then, I think, the PR is ready to merge.

@dofmind
Copy link
Contributor Author

dofmind commented Jun 26, 2024

@dofmind Could you add the two new configuration options to the bluechi-controller.conf.5.md? Sorry, I missed that during the previous review. Then, I think, the PR is ready to merge.

This PR already has them. Did I miss anything?

@engelmi
Copy link
Member

engelmi commented Jun 26, 2024

@dofmind Could you add the two new configuration options to the bluechi-controller.conf.5.md? Sorry, I missed that during the previous review. Then, I think, the PR is ready to merge.

This PR already has them. Did I miss anything?

No, but I did. Sorry about that.

Copy link
Member

@engelmi engelmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Created an additional documentation issue for extending the readthedocs #913

@engelmi engelmi merged commit 4884806 into eclipse-bluechi:main Jun 26, 2024
21 checks passed
@dofmind dofmind deleted the controller-heartbeat branch August 1, 2024 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: periodic heartbeat in bluechi-controller
4 participants