Kubernetes-friendly health checking #6023
Comments
Updated the description with another potential solution based on HashiCorp Vault's health check endpoint.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.
Don't close this. The fact that issues like this haven't been addressed is one of the primary reasons my org is moving away from all use of Habitat. The use of the stale bot just adds insult to injury.
It looks like this was at least partially addressed in October 2018 in #5725, specifically this change, which maps health check results into HTTP status codes. This means that the health endpoint response code will reflect the actual status, rather than always returning 200. To address the individual specific requests from this issue, though, I've spun out #7689 and #7690.
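The status-code mapping mentioned in the last comment can be sketched roughly as follows. The four health-check results correspond to the hook's exit codes, but the exact HTTP codes chosen in #5725 are assumptions here, not confirmed from this thread:

```python
# Sketch of a health-result -> HTTP status mapping like the one #5725
# introduced; the specific codes below are assumed for illustration.
from http import HTTPStatus

HEALTH_TO_HTTP = {
    "Ok": HTTPStatus.OK,                          # 200: healthy
    "Warning": HTTPStatus.OK,                     # 200: degraded but serving
    "Critical": HTTPStatus.SERVICE_UNAVAILABLE,   # 503: failing
    "Unknown": HTTPStatus.INTERNAL_SERVER_ERROR,  # 500: indeterminate
}

def status_code(health: str) -> int:
    """Return the HTTP status for a health-check result."""
    return int(HEALTH_TO_HTTP.get(health, HTTPStatus.INTERNAL_SERVER_ERROR))
```

With a mapping like this, a Kubernetes httpGet probe can act on the status line alone, without parsing the response body.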
Background
In the Kubernetes realm, liveness and readiness probes are used to determine when containers should be restarted due to issues, and when a container is ready to serve traffic, respectively. These checks can be performed in three ways: running a command inside the container (exec), making an HTTP GET request against an endpoint (httpGet), or opening a TCP connection to a port (tcpSocket).
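As a sketch, an httpGet probe pair against the supervisor's API might look like the fragment below. The path and port here are illustrative only; the per-service healthz endpoint shown is what this issue proposes, not something the supervisor exposes today:

```yaml
# Illustrative pod-spec fragment; path and port are assumptions.
readinessProbe:
  httpGet:
    path: /services/myapp/default/healthz
    port: 9631
  periodSeconds: 30
livenessProbe:
  httpGet:
    path: /healthz
    port: 9631
  periodSeconds: 30
```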
Current situation
Habitat’s built-in health-checking mechanism currently only reports status through the supervisor REST API at /services/<name>/<group>/health, in a format that the probe requests can’t use: the API always returns a 200 response, with the health status encoded in the JSON body.

Proposed Solutions
Either of these could solve the problem on their own, but I'd very much like both to be implemented.
Compatible API endpoints
Add healthz endpoints to the REST API that mirror the standard health endpoints, but return 200 for healthy services and 500 otherwise. It would be beneficial to provide a health endpoint for the supervisor itself at /healthz, and per-service endpoints at /services/<name>/<group>/healthz. The downside to this technique is that both the supervisor and Kubernetes perform their checks periodically: if the supervisor checks health every 30 seconds, it doesn't make sense for Kubernetes to poll the REST API more often; but if the two periods are offset the wrong way, up to a minute could elapse from a service problem, through the next supervisor health check, to the next Kubernetes readiness probe.

API endpoint parameters
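To make the timing concern concrete, here is a small sketch of the worst-case delay between a service failing and Kubernetes observing it, when both checks run on fixed periods that are offset badly (the 30-second periods are illustrative):

```python
# Worst case: the service fails just after a supervisor health check ran,
# so a stale "healthy" result is served for almost a full supervisor
# period; the Kubernetes probe reads that stale result just before the
# supervisor re-checks, and only sees the failure one probe period later.
SUPERVISOR_PERIOD_S = 30  # illustrative supervisor check interval
PROBE_PERIOD_S = 30       # illustrative Kubernetes probe interval

worst_case_s = SUPERVISOR_PERIOD_S + PROBE_PERIOD_S
print(worst_case_s)  # up to 60 seconds of staleness
```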
Add GET parameters to the health endpoints that enable control over the return codes for the different statuses. Prior art: HashiCorp Vault's health endpoint (https://www.vaultproject.io/api/system/health.html). This may be the simplest solution, with the best effort-to-payoff ratio.
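A sketch of how such parameters could behave, in the style of Vault's per-status code overrides. The parameter names (okcode, warningcode, criticalcode, unknowncode) and the default codes are hypothetical, not part of any actual supervisor API:

```python
from urllib.parse import parse_qs

# Assumed defaults when no override parameter is supplied.
DEFAULTS = {"Ok": 200, "Warning": 200, "Critical": 503, "Unknown": 500}

def response_code(health: str, query: str) -> int:
    """Pick the HTTP status for a health result, honoring hypothetical
    per-status override parameters, e.g. ?criticalcode=500."""
    params = parse_qs(query)
    key = health.lower() + "code"  # e.g. "criticalcode"
    if key in params:
        return int(params[key][0])
    return DEFAULTS.get(health, 500)
```

This would let an operator tell a probe which statuses should count as failures without any change to the supervisor's own semantics.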
Direct check execution
The second path is a little more drastic: add the ability to disable the supervisor’s periodic health checks and provide a CLI interface to directly run a service’s health-check hook, à la hab pkg exec. This method allows Kubernetes to take over all of the scheduling responsibility and avoids the issue above.

It might even be possible for the Habitat Operator to configure either of these probes automatically, provided that services with no health-check hook return successful statuses.
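Under this model, the probe would invoke the hook directly via an exec probe. A sketch, where hab svc check is a hypothetical subcommand standing in for the proposed CLI interface (no such subcommand exists today):

```yaml
# Hypothetical: "hab svc check" stands in for the CLI proposed above.
livenessProbe:
  exec:
    command: ["hab", "svc", "check", "myapp.default"]
  periodSeconds: 10
  timeoutSeconds: 5
```

Here Kubernetes owns the check schedule entirely, so the staleness window described under the first proposal disappears.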