controller: Proactively disconnect node based on heartbeat #911
Signed-off-by: Daiki Ueno <[email protected]>
Thanks for your contribution! @dofmind
Overall the PR looks good. I added a few minor comments, mostly about logging. I tested your changes manually (also via valgrind) and it works well. Nonetheless, please add two integration tests:
- checking that `HeartbeatInterval=0` disables it (e.g. by confirming the agent stays connected after x sec when both agent and controller heartbeats are disabled and the controller threshold is y sec)
- checking that the agent gets disconnected when the threshold is reached (e.g. by disabling the agent heartbeat)
153128b to f13eb8e
This adds two new options to the controller, HeartbeatInterval and NodeHeartbeatThreshold, which can be used to proactively disconnect a node based on the last heartbeat it sent. The controller periodically checks the last-seen timestamp of each node, and if that timestamp is older than NodeHeartbeatThreshold, the controller treats the node as disconnected.

Signed-off-by: Daiki Ueno <[email protected]>
Signed-off-by: Joonyoung Shim <[email protected]>
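The periodic check described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: `SketchNode`, `sketch_heartbeat_expired()` and `sketch_check_nodes()` are hypothetical names, the node record is reduced to three fields, and the threshold is assumed to be already converted to microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node record; the real controller keeps more state. */
typedef struct {
        const char *name;
        uint64_t last_seen_usec; /* timestamp of the last heartbeat, in microseconds */
        bool connected;
} SketchNode;

/* Returns true when the last heartbeat is older than the threshold.
 * A timestamp that lies in the future is treated as "not expired". */
static bool sketch_heartbeat_expired(uint64_t now_usec, uint64_t last_seen_usec, uint64_t threshold_usec) {
        if (now_usec < last_seen_usec) {
                return false;
        }
        return now_usec - last_seen_usec > threshold_usec;
}

/* One pass of the periodic check: marks every connected node whose heartbeat
 * has expired as disconnected and returns how many nodes were marked. */
static size_t sketch_check_nodes(SketchNode *nodes, size_t n, uint64_t now_usec, uint64_t threshold_usec) {
        assert(nodes != NULL || n == 0);

        size_t dropped = 0;
        for (size_t i = 0; i < n; i++) {
                if (!nodes[i].connected) {
                        continue;
                }
                if (sketch_heartbeat_expired(now_usec, nodes[i].last_seen_usec, threshold_usec)) {
                        nodes[i].connected = false;
                        dropped++;
                }
        }
        return dropped;
}
```

In the sketch, a node is only flagged as disconnected; what the real controller does at that point (signals, cleanup) is outside its scope.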
f13eb8e to 0157cdc
0157cdc to a278805
One integration test verifies that the default controller configuration disables the controller's periodic heartbeat; the other verifies that a node gets disconnected when no heartbeat has been received from it within the threshold.

Signed-off-by: Joonyoung Shim <[email protected]>
a278805 to 37269b6
LGTM and thanks for adding tests! @dofmind Could you add the two new configuration options to the bluechi-controller.conf.5.md? Sorry, I missed that during the previous review. Then, I think, the PR is ready to merge.
This PR already has them. Did I miss anything?
No, but I did. Sorry about that.
LGTM
Created an additional documentation issue for extending the readthedocs #913
This PR is an update of #870
The main difference is that the code is simplified by using microseconds in last_seen instead of the struct timespec.
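To show why a single microsecond counter is simpler to compare than a struct timespec, here is a small sketch. The helper names are hypothetical, and the use of `CLOCK_MONOTONIC` is an assumption; the PR text does not say which clock is used.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_USEC_PER_SEC 1000000ULL
#define SKETCH_NSEC_PER_USEC 1000ULL

/* Collapses a struct timespec into one 64-bit microsecond value, so that
 * "how long ago" becomes a plain subtraction instead of a two-field compare. */
static uint64_t sketch_timespec_to_usec(const struct timespec *ts) {
        assert(ts != NULL);
        return (uint64_t) ts->tv_sec * SKETCH_USEC_PER_SEC + (uint64_t) ts->tv_nsec / SKETCH_NSEC_PER_USEC;
}

/* Current time in microseconds; returns 0 if the clock cannot be read.
 * CLOCK_MONOTONIC is an assumption made for this sketch. */
static uint64_t sketch_now_usec(void) {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
                return 0;
        }
        return sketch_timespec_to_usec(&ts);
}
```

With timestamps in this form, an age check reduces to `now - last_seen > threshold` on unsigned integers.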
It updates last_seen in node_method_register() so that a node is not disconnected immediately after registering when last_seen is 0 or too old.
Additionally, it handles disconnected nodes by calling node_disconnected() instead of controller_remove_node(). Otherwise an error will occur.
Fixes: #857