What if a system fails with panic/dead or partially dead? #64

rjsuresh · 2019-03-05T19:03:07Z

Since ByNar runs as a binary (agent) on the system it monitors, what happens in the following scenarios?

Kernel panic
The system rebooted and did not come back up
Someone stopped the agent and did not restart it
The system partially died due to hardware (memory, CPU, RAID, ...)

When the system goes down, the agent goes down with it, because the agent runs on the very system that has to be healthy for the monitoring to work.
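To make the gap concrete: a dead host cannot report its own death, so something outside it has to infer the failure from silence. Below is a minimal sketch of that idea in Python, not ByNar code. The `HeartbeatTracker` class, its method names, and the 30-second timeout are all assumptions made up for illustration.

```python
import time
from typing import Dict, List, Optional


class HeartbeatTracker:
    """Tracks the last heartbeat seen from each host (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, host: str, now: Optional[float] = None) -> None:
        """Note that `host` reported in at `now` (defaults to the monotonic clock)."""
        self.last_seen[host] = time.monotonic() if now is None else now

    def silent_hosts(self, now: Optional[float] = None) -> List[str]:
        """Return the hosts whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if current - seen > self.timeout_seconds
        )


# Explicit timestamps keep the example deterministic.
tracker = HeartbeatTracker(timeout_seconds=30.0)  # timeout value is an assumption
tracker.record("node-a", now=0.0)
tracker.record("node-b", now=0.0)
tracker.record("node-a", now=40.0)
print(tracker.silent_hosts(now=45.0))  # prints ['node-b']
```

Silence only covers the panic, failed-reboot, and stopped-agent cases. A partially dead host may keep sending heartbeats, so that case would need the heartbeat to carry some health information as well.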

Possible solutions:

Client/server architecture?
Peer-to-peer monitoring (e.g. Ceph OSDs)?

Possible issues with these solutions:

Client/server: the architecture adds administrative overhead (failover, firewalls, DR, certs, LB, redundancy, ...).
Peer-to-peer: message broadcasting, or a streamlined, narrowed-down approach? For example, should a failed system be monitored only by its neighbors, i.e. the systems before and after it in the sequence?
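The "neighbors only" variant can be sketched as a ring: sort the hosts, and let each host be watched by the one before it and the one after it, wrapping around at the ends. This is only an illustration of the idea under those assumptions, not how ByNar or Ceph does it. The function name `ring_watchers` and the host names are hypothetical.

```python
from typing import Dict, List


def ring_watchers(hosts: List[str]) -> Dict[str, List[str]]:
    """Map each host to the hosts that should watch it: its ring neighbors."""
    ordered = sorted(set(hosts))  # a stable order that every peer can compute
    count = len(ordered)
    watchers: Dict[str, List[str]] = {}
    for index, host in enumerate(ordered):
        before = ordered[(index - 1) % count]
        after = ordered[(index + 1) % count]
        # With one or two hosts the neighbors collapse, so drop self and duplicates.
        neighbors: List[str] = []
        for candidate in (before, after):
            if candidate != host and candidate not in neighbors:
                neighbors.append(candidate)
        watchers[host] = neighbors
    return watchers


print(ring_watchers(["node-c", "node-a", "node-b", "node-d"])["node-a"])
# prints ['node-d', 'node-b']
```

Each host then exchanges heartbeats with at most two peers instead of broadcasting to everyone, at the cost of missing a failure when both neighbors of a host go down at the same time.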

Just writing my thoughts down so they don't get lost. :)

rjsuresh changed the title ~~What if system fails with panic/dead or partially dead?~~ What if a systems fails with panic/dead or partially dead? Mar 5, 2019

sdandam mentioned this issue Jun 27, 2019

make Bynar peer to peer #83

Closed
