This might be one of those wildcard ideas that we can drop later (or that the AWS NAT gateway service might render useless), but I'm dropping it here anyway.
NAT (0.0.0.0) route management currently depends mostly on the AWS VPC route table information about black-holed routes to figure out which instance has failed and replace it, which is probably the best option available during network partitions. However, it sometimes takes a while for the AWS API to report a route as black-holed, and that report depends on instance state. So if any other issue causes NAT to fail in a particular region, route management is not clean at the moment.
One possible solution (??) to this problem is adding the ability to support external health checks. FWIW, I find that monitoring systems such as Sensu provide a better view of instance health even during network partitioning (provided they are set up in a partition-tolerant fashion). We could possibly use this in our case.
The workflow would roughly be:
Add an external check to query the health of the NAT instances (of other AZs in the same region).
healthcheck/* kicks off the query and returns the value to awsnycast (e.g. using the Sensu client API).
The route gets marked as unhealthy by evaluating the checks in order. The order of evaluation could be: instance state > external healthcheck (if present) > external ping (Google).
If the route is marked as unhealthy, route replacement kicks in.
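To make the proposed evaluation order concrete, here is a rough Python sketch (not AWSnycast code, which is written in Go). The names `evaluate_route` and `Check` are hypothetical, and the rule that an unconfigured check is simply skipped is my assumption about what "(if present)" would mean.

```python
from typing import Callable, Optional, Sequence, Tuple

# A check returns True (healthy), False (unhealthy), or None when it is
# not configured. All names here are hypothetical, for illustration only.
Check = Callable[[], Optional[bool]]


def evaluate_route(checks: Sequence[Tuple[str, Check]]) -> Tuple[bool, Optional[str]]:
    """Run checks in priority order; the first failing check wins.

    Returns (healthy, name_of_failing_check).
    """
    for name, check in checks:
        result = check()
        if result is None:
            # Check not present (e.g. no external healthcheck configured).
            continue
        if result is False:
            return False, name
    return True, None


# Example ordering: instance state > external healthcheck > external ping.
# The lambdas stand in for real probes.
status = evaluate_route([
    ("instance_state", lambda: True),
    ("external_healthcheck", lambda: False),
    ("external_ping", lambda: True),
])
```

With the stub values above, the route is reported unhealthy because of the external healthcheck, and the ping is never consulted. Whether a later check should be able to override an earlier one is an open design question this sketch does not settle.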
I've just finished adding command healthchecks, which can be used for (some of) this - i.e. you could write a script to check health state from Sensu, and plug that in as a command healthcheck.
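A script along these lines might look like the sketch below. It is only an illustration under several assumptions: that the command healthcheck treats exit code 0 as healthy and anything else as unhealthy, that the Sensu API returns a JSON list of results shaped like `{"check": {"status": 0}}`, and that `SENSU_RESULTS_URL` (a made-up address) points at the results for the NAT instance in question. Check all three against your own setup.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical endpoint; replace with wherever your Sensu API exposes
# check results for the NAT instance being monitored.
SENSU_RESULTS_URL = "http://sensu.example.internal:4567/results/nat-a"


def fetch_results(url: str, timeout: float = 5.0) -> List[Dict[str, Any]]:
    """Fetch and decode the JSON list of check results."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def exit_code(results: List[Dict[str, Any]]) -> int:
    """Map check results to a process exit code (0 = healthy).

    Assumed payload shape: [{"check": {"status": 0}}, ...], where a
    status of 0 means the check passed.
    """
    if not results:
        return 1  # No data at all: treated as unhealthy in this sketch.
    ok = all(r.get("check", {}).get("status") == 0 for r in results)
    return 0 if ok else 1


def main() -> int:
    try:
        return exit_code(fetch_results(SENSU_RESULTS_URL))
    except (OSError, ValueError):
        # Network failure or undecodable response.
        return 1
```

To run it as a script, end the file with `raise SystemExit(main())`. Note that treating "cannot reach Sensu" as unhealthy could trigger a needless failover during a partition, so you may prefer the opposite default.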
I guess that we'd want to be able to have multiple healthchecks for a single route to fully support this though.
Definitely worth chatting about / thinking about further though :)