Healthcheck using existing monitoring systems #9

Open
sarguru opened this issue Feb 5, 2016 · 1 comment

Comments

@sarguru
Collaborator

sarguru commented Feb 5, 2016

This might be one of those wildcard ideas that we drop later (or that the AWS NAT gateway service renders useless), but I'm dropping it here anyway.

Management of the NAT (0.0.0.0/0) route currently depends mostly on what the AWS VPC route tables report about black-holed routes: that is how we figure out which instance has failed and replace it, and it is probably the best option available right now during network partitions. However, the AWS API sometimes takes a while to report a route as black-holed, and that status depends on instance state. So if any other issue causes NAT to fail in a particular region, route management is not clean at the moment.
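
To make that dependency concrete, here is a minimal Python sketch of picking black-holed default routes out of route table data. It assumes the input has the shape of the `RouteTables` list returned by the EC2 DescribeRouteTables API (for example via boto3's `describe_route_tables`). The function name and the sample IDs are hypothetical, and this is not a description of how awsnycast does it internally.

```python
def blackholed_default_routes(route_tables):
    """Return (route_table_id, instance_id) for each black-holed 0.0.0.0/0 route.

    `route_tables` is assumed to look like the "RouteTables" list in an
    EC2 DescribeRouteTables response.
    """
    found = []
    for table in route_tables:
        for route in table.get("Routes", []):
            is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
            if is_default and route.get("State") == "blackhole":
                found.append((table.get("RouteTableId"), route.get("InstanceId")))
    return found


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {
            "RouteTableId": "rtb-example1",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "State": "active"},
                {"DestinationCidrBlock": "0.0.0.0/0",
                 "InstanceId": "i-example1", "State": "blackhole"},
            ],
        }
    ]
    print(blackholed_default_routes(sample))
```

The limitation described above still applies: this only sees a failure once AWS has flipped the route state to `blackhole`.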

One probable solution (??) to this problem is adding the ability to support external health checks. FWIW, I find that monitoring systems such as Sensu provide a better view of instance health, even during network partitions (provided they are set up in a partition-tolerant fashion). We could possibly use this in our case.

The workflow would roughly be:

  • Add an external check to query the health of the NAT instances (of other AZs in the same region).
  • healthcheck/* kicks off the query and returns the value to awsnycast (e.g. using the Sensu client API).
  • The route gets marked as unhealthy by evaluating the checks in the following order: instance state > ext. healthcheck (if present) > ext. ping (Google).
  • If the route is marked as unhealthy, route replacement kicks in.
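
The evaluation order in the workflow above could be sketched roughly like this. It is only an illustration in Python, under the assumption that the first failing check decides and that an optional check which is not configured is skipped. `evaluate_route` and the check names are hypothetical and are not part of awsnycast.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True when healthy. None stands for "not configured".
Check = Optional[Callable[[], bool]]


def evaluate_route(checks: List[Tuple[str, Check]]) -> Tuple[bool, str]:
    """Run checks in the given order and stop at the first failure.

    Returns (healthy, name_of_failed_check). The name is "" when healthy.
    """
    for name, check in checks:
        if check is None:
            # Optional check (e.g. the external healthcheck) is absent: skip it.
            continue
        if not check():
            return False, name
    return True, ""


if __name__ == "__main__":
    checks = [
        ("instance_state", lambda: True),
        ("external_healthcheck", lambda: False),  # e.g. a Sensu query
        ("external_ping", lambda: True),
    ]
    healthy, failed = evaluate_route(checks)
    if not healthy:
        print(f"route unhealthy ({failed} failed): route replacement kicks in")
```

Stopping at the first failure keeps the cheaper, more authoritative signal (instance state) ahead of the slower external ones.
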
@bobtfish
Owner

bobtfish commented Mar 5, 2016

I like this idea :)

I've just finished adding command healthchecks, which can be used for (some of) this - i.e. you could write a script to check health state from Sensu, and plug that in as a command healthcheck.
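
A script along those lines might look like the sketch below. Everything specific in it is an assumption to check against your own setup: the Sensu API address (`http://localhost:4567`), the `/results/<client>` endpoint returning a JSON list of objects with a `check.status` field, and the convention that a command healthcheck treats exit code 0 as healthy and non-zero as unhealthy. The function names are made up for the example.

```python
import json
import sys
import urllib.parse
import urllib.request

SENSU_API = "http://localhost:4567"  # assumption: adjust to your Sensu API


def fetch_results(client_name, base_url=SENSU_API, timeout=5):
    """Fetch the check results for one Sensu client as a decoded JSON list."""
    url = f"{base_url}/results/{urllib.parse.quote(client_name, safe='')}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(results):
    """Healthy only if there is at least one result and every status is 0."""
    return bool(results) and all(
        r.get("check", {}).get("status") == 0 for r in results
    )


def main(argv):
    """Return the exit code: 0 healthy, 1 unhealthy or unreachable, 2 usage."""
    if len(argv) != 2:
        print("usage: sensu_nat_check.py <sensu-client-name>", file=sys.stderr)
        return 2
    try:
        results = fetch_results(argv[1])
    except (OSError, ValueError) as exc:
        # Network errors are OSError subclasses; bad JSON raises ValueError.
        print(f"could not query Sensu: {exc}", file=sys.stderr)
        return 1
    return 0 if is_healthy(results) else 1
```

To run it as a standalone command, end the file with `sys.exit(main(sys.argv))`. Treating an unreachable Sensu API as unhealthy is a design choice; depending on how partition-tolerant the monitoring setup is, you may prefer the opposite.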

I guess that we'd want to be able to have multiple healthchecks for a single route to fully support this though.

Definitely worth chatting about / thinking about further though :)
