Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support health check information? #24

Open
ogrodnek opened this issue Oct 6, 2016 · 3 comments
Open

support health check information? #24

ogrodnek opened this issue Oct 6, 2016 · 3 comments

Comments

@ogrodnek
Copy link

ogrodnek commented Oct 6, 2016

First, thanks for the great mesos and marathon tooling -- it's all been extremely solid and reliable!

I am using marathon-consul along with haproxy-consul for load balancing.

As soon as a task is running, it's added to consul and thus the haproxy config. If an application has a non-trivial start up time, I will start to see 503s. That is, tasks are added to the LB before they are reporting healthy.

I've been trying to figure out the best way to solve this. The app definition in consul does have health check info, so, theoretically I could translate the marathon checks to haproxy checks in my haproxy-consul template. Practically though the marathon checks could be more complex than haproxy can express (e.g. some of the apps are using "COMMAND" marathon health checks). It also seems non-trivial to be able to do the translation in the template, and, there might be a chance haproxy and marathon don't agree.

My current thinking is that it makes sense to leave the health checks evaluated by marathon, and include this information in the KV store (perhaps /<app>/<task>/healthCheck ?)

Then in the template I can check that tasks where their app has a health check is defined is only rendered if the task's health check is passing (i.e. not unknown or failing).

I believe this can be done by adding support for the following event types from the event bus:

  • add_health_check_event
  • remove_health_check_event
  • failed_health_check_event
  • health_status_changed_event

and including this new information in the KV.

I am happy to submit a PR for this, and would love to get any feedback on this approach, if there's a better way, etc.

thanks!

@ogrodnek
Copy link
Author

I've simplified this a bit, by including a HealthCheckResults field directly into the task (already reported by marathon for GET /v2/apps/<app>/tasks), and updating this in the KV when the health_status_changed_event event is fired.

PR is here: #25

would love any feedback,
thanks!

@bwhaley
Copy link

bwhaley commented Nov 15, 2016

We are using this in production now and it's working great. Really helps out with the lag time between new tasks being added and actually being alive and healthy. 👍

@yogeshnath
Copy link

I recently updated to this version to get rid of 503's, It works great if the healthcheck is http or tcp.

Though if healthcheck is based on command ( say curl), Marathon actually shows it as healthy but consul json has "healthCheckResults" set to null. So now Haproxy fails to pick it up based on healthy value ( "healthy":false).

any reason why?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants