Could we add health check endpoints? #131

Open
sandangel opened this issue Mar 9, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@sandangel

Hi, could we add liveness and readiness check endpoints? These are needed for deployment configs, but I cannot find any such endpoint in the code.

@jrauschenbusch

A health check endpoint would really help when using this tile server implementation in production environments. It should respond with 200 OK when the tile server is ready to process tile requests, or 503 Service Unavailable as long as the server is bootstrapping (connecting to Postgres, initializing internal state, ...).
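
A minimal Go sketch of the 200/503 behaviour described above, not taken from the pg_tileserv code base. The `ready` flag, the handler name, and the `/health/ready` path are all hypothetical; the idea is only that the flag is flipped once bootstrapping (for example the first successful Postgres connection) has finished.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is set to true once startup work is done.
// Hypothetical name; nothing like it is confirmed to exist in pg_tileserv.
var ready atomic.Bool

// readinessHandler answers 200 when ready and 503 while bootstrapping.
func readinessHandler(w http.ResponseWriter, r *http.Request) {
	if !ready.Load() {
		http.Error(w, "starting", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "ok")
}

// probe calls the handler in-process and returns the status code.
func probe() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health/ready", nil)
	readinessHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe()) // status while still bootstrapping
	ready.Store(true)    // pretend the database connection succeeded
	fmt.Println(probe()) // status once ready
}
```

Using an atomic flag keeps the handler safe to call while the startup goroutine is still running.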

@pramsey
Collaborator

pramsey commented Mar 12, 2022

Is there a standard endpoint people use for this?

@dBitech

dBitech commented Mar 12, 2022

I tend to use,
startup: /health/startup returns 200 once the init/startup-sequence completes.
liveness: /health/z returns 200 as long as the app is running
readiness: /health/ready returns 200 as long as the app has the capacity to respond meaningfully to requests

Though it would make sense to have these URLs definable via the configuration file.

@jrauschenbusch

From the terms mentioned, it's pretty clear that it's about deployment in Kubernetes.

Therefore I would suggest having a look at the following links:

I don't think the startup sequence of pg_tileserv takes long, so IMHO a startup probe is not really required.

A liveness probe endpoint only makes sense if it is easy to tell that the application is "dead" (e.g. deadlocked) and restarting it would actually help.

From my point of view, the readiness check is the most relevant. It indicates whether the application can process requests. Most applications use a "/healthz" endpoint for this purpose. It can be used both in Kubernetes and in environments where an external load balancer needs to verify that it is safe to forward requests to a specific application instance (running on a VM, in a container, etc.). Some applications also implement graceful shutdown (e.g. on SIGINT): ongoing requests are still handled, but the healthz endpoint signals that new requests will not be accepted, until the application finally terminates after a certain duration. Other applications also allow the healthz state to be set explicitly for maintenance purposes.
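
The graceful-shutdown pattern described above could look roughly like this in Go. This is a sketch under assumptions, not pg_tileserv code: the `draining` flag, the `serve` function, the grace period, and the `:8080` address in the comment are hypothetical. On SIGINT the health endpoint starts returning 503 while in-flight requests continue, and only after the grace period is the server shut down.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"os/signal"
	"sync/atomic"
	"time"
)

// draining is set once shutdown has been requested (hypothetical name).
var draining atomic.Bool

// healthz reports 503 while draining so load balancers stop routing here.
func healthz(w http.ResponseWriter, r *http.Request) {
	if draining.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// serve runs until SIGINT, fails health checks for the grace period,
// then closes the listener and waits for in-flight requests.
func serve(addr string, grace time.Duration) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthz)
	srv := &http.Server{Addr: addr, Handler: mux}

	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	errc := make(chan error, 1)
	go func() { errc <- srv.ListenAndServe() }()

	select {
	case err := <-errc:
		return err // listener failed before any signal arrived
	case <-ctx.Done():
	}

	draining.Store(true) // health checks now fail; existing requests continue
	time.Sleep(grace)    // give the load balancer time to notice

	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	if err := <-errc; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// status calls healthz in-process and returns the response code.
func status() int {
	rec := httptest.NewRecorder()
	healthz(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	// In-process demonstration of the state change only; a real program
	// would call something like serve(":8080", 10*time.Second) instead.
	fmt.Println(status())
	draining.Store(true)
	fmt.Println(status())
}
```

The same flag could also be toggled by an operator to take an instance out of rotation for maintenance.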

@stephenhillier
Contributor

Dedicated endpoints would be ideal, but in the meantime if anybody is looking for something basic for Kubernetes deployments, try a tcp connection check for the liveness probe (link below) and an HTTP request against the root index for the readiness check.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe

@jdenisgiguere

We experienced fewer timeouts using /index.json instead of /index.html as a readiness probe.

@pramsey
Collaborator

pramsey commented Aug 24, 2022

Another useful link PostgREST/postgrest#2092

@dr-jts
Collaborator

dr-jts commented Aug 24, 2022

@pramsey pramsey added the enhancement New feature or request label Jul 13, 2023
@eldang
Contributor

eldang commented Nov 22, 2023

I just learned that Azure DevOps--at least in a default configuration--tries to use "does it respond to an http[s] request at /" as the readiness probe on container startup. So having implemented #167 I then found that I couldn't in practice disable the preview UI in my client's environment, because ADO kills the container after a few failed retries for the readiness probe. I'm guessing that it's possible to change the config in ADO to either not require this probe or use a valid tile request URL for it (adapting @stephenhillier's suggestion), but I'd prefer to make that unnecessary, so I'd like to pick this issue up. I'm thinking of doing the following:

  1. Implement a dead simple /health/ endpoint which just returns a 200 status and the string ok
  2. Add an option in the config file to specify the endpoint name, which for my use case would let me make / serve the health check, and for others would let whatever path a given CI tool expects be specified.
  3. Allow that option to be a list of endpoints so that if separate startup, health, and liveness endpoints are needed they can be provided.

I think @jrauschenbusch is right that this service starts up quickly enough to make separate startup and readiness endpoints unimportant, which is why I think that if a CI environment ever insists on having them we can get away with using the same handler at multiple paths. I don't think a dedicated startup handler would in practice be responding any sooner than a health / liveness / readiness one.

I should be able to do this next week.


8 participants