Broken Connection Event/Alert #419

Open
vkumarsharma opened this issue Apr 24, 2024 · 3 comments

Comments

@vkumarsharma

Hi,

I am looking for a way to raise an alert when a target goes down.

If I turn off a target router, I see an error logged in the log file and gnmic trying to re-establish the connection. Is there a way to generate an event/trigger for this? Or should I just write my own script to scrape the log file for this event and raise an alert from the script?

Thank you

@karimra
Collaborator

karimra commented Apr 24, 2024

Can you elaborate a bit on what kind of event/alert you want to raise? From gNMIc to where? With which protocol? ...

@vkumarsharma
Author

Thanks, Karim, for responding. Here is my use case:

  1. I have an observability front end for all my network devices.
  2. I use gnmic to subscribe to these network devices (routers/switches) to monitor their state and report back to my front end using Kafka. For instance, if an interface goes down, I get an event through the subscription and put it in a Kafka topic, which different software components can consume to report the state to the front-end user quickly.
  3. However, if the router itself goes down, I cannot directly detect that using any events or triggers from gnmic. The only way I can see right now is through the log files, where gnmic reports a broken connection.
  4. So what I have to do is run a script that reads the log file continuously to look for the relevant error and then puts a message in a Kafka queue to communicate that a router has gone down. Similarly, when I see a log entry for the connection being re-established, I send a message to communicate that the router is back online.
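
For illustration, the log-watching approach in point 4 could be sketched roughly as below. This is only a sketch under assumptions: the regular expressions are supplied by the caller because the exact gnmic log wording is not shown in this thread, each pattern is assumed to contain a named group `target`, and `publish` is a hypothetical callback standing in for whatever Kafka producer is already in use. Repeated matches for the same state are suppressed, since gnmic keeps retrying the connection and may log the error more than once.

```python
import os
import re
import time
from typing import Callable, Dict, Iterable, Iterator, Optional, Pattern, Tuple


def follow(path: str, poll_s: float = 1.0) -> Iterator[str]:
    """Yield lines appended to a file, similar to `tail -f`.

    Does not handle log rotation; that is left out to keep the sketch short.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        f.seek(0, os.SEEK_END)  # start at the end, ignore old entries
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_s)


def classify_line(
    line: str, down: Pattern[str], up: Pattern[str]
) -> Optional[Tuple[str, str]]:
    """Return (state, target) if the line matches a pattern, else None.

    Both patterns must define a named group called `target`.
    """
    for state, pattern in (("down", down), ("up", up)):
        match = pattern.search(line)
        if match:
            return state, match.group("target")
    return None


def watch(
    lines: Iterable[str],
    down: Pattern[str],
    up: Pattern[str],
    publish: Callable[[str, str], None],
) -> None:
    """Call publish(target, state) only when a target's state changes."""
    current: Dict[str, str] = {}
    for line in lines:
        event = classify_line(line, down, up)
        if event is None:
            continue
        state, target = event
        if current.get(target) != state:
            current[target] = state
            publish(target, state)
```

It would be wired up as `watch(follow(log_path), down_re, up_re, publish)`, where `log_path`, `down_re` and `up_re` have to be taken from the actual deployment and its log output.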

So my question is whether my solution in point 4 is the best I can do with gnmic, or is there a more streamlined approach (through some trigger/event in gnmic itself)?

Thanks again

@mwdomino

To solve a similar issue, we use a single subscription (device version) as a "heartbeat", which is sampled every minute. If our application detects that it has not received a message on that topic for more than 3 minutes (3 intervals), we consider the device to have gone offline and increment a Prometheus counter in our application. We then clear this Prometheus counter if we see the device begin sending messages again.
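
A minimal sketch of that heartbeat idea, not the commenter's actual implementation: the class name `HeartbeatMonitor` and its methods are made up for illustration, and the Prometheus and Kafka wiring is left out. The consumer would call `record()` for every heartbeat message and `sweep()` periodically; the defaults mirror the one-minute interval and three missed intervals described above.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Tracks when each target last sent a heartbeat message."""

    def __init__(
        self,
        interval_s: float = 60.0,
        missed_intervals: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # A target is offline once it has been silent for longer than this.
        self.timeout_s = interval_s * missed_intervals
        self._clock = clock
        self._last_seen: Dict[str, float] = {}
        self._offline: Set[str] = set()

    def record(self, target: str) -> bool:
        """Register a heartbeat. Returns True if the target just came back."""
        self._last_seen[target] = self._clock()
        if target in self._offline:
            self._offline.discard(target)
            return True
        return False

    def sweep(self) -> List[str]:
        """Return targets that newly exceeded the timeout since the last sweep.

        Targets that have never sent a heartbeat are not tracked.
        """
        now = self._clock()
        newly_offline = []
        for target, seen in self._last_seen.items():
            if target not in self._offline and now - seen > self.timeout_s:
                self._offline.add(target)
                newly_offline.append(target)
        return newly_offline

    def offline(self) -> List[str]:
        """Return the targets currently considered offline, sorted by name."""
        return sorted(self._offline)
```

Each name returned by `sweep()` is the point at which a "router down" message or metric update would be emitted, and a `True` result from `record()` marks the matching "back online" event.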
