What if a systems fails with panic/dead or partially dead? #64

Open
rjsuresh opened this issue Mar 5, 2019 · 0 comments

rjsuresh commented Mar 5, 2019

Since ByNar runs as a binary (agent) on the system it monitors, what happens in the following scenarios?

  • Kernel panic
  • The system rebooted and did not come back up
  • Someone stopped the agent and did not restart it
  • The system partially died due to a hardware fault (memory, CPU, RAID, ...)

When the system goes down, the agent goes down with it, because the agent runs on the very system that has to be healthy for the monitoring to execute.

Possible solutions:

  • Client/server architecture?
  • Peer-to-peer monitoring (e.g. Ceph OSDs)?
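Either option comes down to something outside the failed host noticing that its agent has gone silent. Here is a minimal sketch of that idea in Python, assuming a hypothetical heartbeat scheme: `HeartbeatTracker`, the host names, and the 30-second timeout are all made up for illustration and are not part of ByNar.

```python
import time


class HeartbeatTracker:
    """Remembers when each host's agent was last heard from (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_seen = {}         # host name -> time of the most recent heartbeat

    def record(self, host):
        """Call this whenever a heartbeat arrives from `host`."""
        self.last_seen[host] = self.clock()

    def silent_hosts(self):
        """Return hosts whose last heartbeat is older than the timeout, sorted by name."""
        now = self.clock()
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    tracker = HeartbeatTracker(timeout_seconds=30)
    tracker.record("node-a")   # assumed host name
    tracker.record("node-b")   # assumed host name
    print(tracker.silent_hosts())
```

The tracker cannot tell a kernel panic from a stopped agent or a host that never came back from a reboot. It only reports silence, which is the one symptom all of the scenarios above share. A partially dead host whose agent still sends heartbeats would not be caught this way.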

Possible issues with these solutions:

  • Client/server architecture adds administrative overhead: failover, firewalls, DR, certificates, load balancing, and redundancy.
  • Peer-to-peer: broadcast messages to everyone, or take a streamlined, narrowed-down approach? For example, should a failed system be monitored only by its neighbors, i.e. the systems immediately before and after it in the sequence?
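The "neighbors only" idea can be sketched as a ring: sort the hosts, and let each one watch only the host before it and the host after it, wrapping around at the ends. This is an illustrative sketch under that assumption, not ByNar's design; `ring_neighbors` and the host names are hypothetical.

```python
def ring_neighbors(hosts, host):
    """Return the hosts immediately before and after `host` in sorted ring order.

    With this assignment every host is watched by at most two peers,
    so no cluster-wide broadcast is needed.
    """
    ordered = sorted(set(hosts))
    if host not in ordered:
        raise ValueError(f"unknown host: {host}")
    if len(ordered) < 2:
        return []                       # a single host has nobody to watch it
    i = ordered.index(host)
    before = ordered[(i - 1) % len(ordered)]
    after = ordered[(i + 1) % len(ordered)]
    # In a two-host ring both neighbors are the same peer.
    return [before] if before == after else [before, after]


if __name__ == "__main__":
    cluster = ["node-c", "node-a", "node-b", "node-d"]   # assumed host names
    print(ring_neighbors(cluster, "node-a"))   # ['node-d', 'node-b']
```

The trade-off is that two adjacent hosts failing together leave fewer watchers for each of them, so a real deployment might want more than one neighbor on each side.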

Just throwing my thoughts out here so they don't get missed. :)

rjsuresh changed the title from "What if system fails with panic/dead or partially dead?" to "What if a systems fails with panic/dead or partially dead?" on Mar 5, 2019