Failure to get an initial Kafka connection should terminate or cause non-liveness #29

Open
solsson opened this issue Oct 3, 2019 · 0 comments


solsson commented Oct 3, 2019

If the Kafka client fails to connect, we currently end up in the following state:

# curl localhost:8090/health/live
{
    "status": "UP",
    "checks": [
        {
            "name": "REST liveness",
            "status": "UP"
        }
    ]
}
# curl localhost:8090/health/ready
{
    "status": "DOWN",
    "checks": [
        {
            "name": "consume-loop",
            "status": "DOWN",
            "data": {
                "stage": "WaitingForKafkaConnection"
            }
        }
    ]
}

This service probably needs to take a stance, from a sidecar perspective, on the argument made in https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html.

The cause of the above state is that the consumer thread times out and exits while the HTTP server keeps running:

2019-09-28 09:02:40,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) At stage Initializing before infinite polls with consumer org.apache.kafka.clients.consumer.KafkaConsumer@7f77c83013f0
2019-09-28 09:02:42,063 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,197 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,402 ERROR [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) A Kafka timeout occured at stage WaitingForKafkaConnection: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Exception in thread "kafkaclient" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2019-09-28 09:02:45,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Closing consumer ...
2019-09-28 09:02:45,407 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Consumer closed at stage WaitingForKafkaConnection; Use liveness probes with /health for app termination
2019-09-30 11:26:24,917 ERROR [org.jbo.res.res.i18n] (executor-thread-1) RESTEASY002010: Failed to execute: javax.ws.rs.ServiceUnavailableException: Denied because cache isn't started yet, check /health for status
    at se.yolean.kafka.keyvalue.http.CacheResource.requireUpToDateCache(CacheResource.java:43)
    at se.yolean.kafka.keyvalue.http.CacheResource.keysJson(CacheResource.java:128)
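
The log above shows the `kafkaclient` thread exiting while the liveness endpoint keeps reporting UP. One possible direction, sketched here in plain Java and not taken from the project's code, is to tie the liveness status to whether the consume loop is still running. The class name `ConsumerLivenessSketch` and its methods are hypothetical; a real implementation would presumably expose the flag through the service's existing health check mechanism, or terminate the process instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: liveness follows the lifetime of the consume loop. */
public class ConsumerLivenessSketch {

    // True while the consume loop is running; flipped once it exits for any reason.
    private final AtomicBoolean consumerAlive = new AtomicBoolean(true);

    /** Runs the given consume loop on a thread named like the one in the logs. */
    public Thread start(Runnable consumeLoop) {
        Thread t = new Thread(() -> {
            try {
                consumeLoop.run();
            } finally {
                // The loop ended (normally or via exception), so nothing polls Kafka anymore.
                consumerAlive.set(false);
            }
        }, "kafkaclient");
        // Report the failure briefly instead of printing a full stack trace.
        t.setUncaughtExceptionHandler((thread, e) ->
                System.err.println("Consumer thread terminated: " + e.getMessage()));
        t.start();
        return t;
    }

    /** What a liveness endpoint could report under this approach. */
    public String livenessStatus() {
        return consumerAlive.get() ? "UP" : "DOWN";
    }

    public static void main(String[] args) throws InterruptedException {
        ConsumerLivenessSketch sketch = new ConsumerLivenessSketch();
        // Simulate the failure from the logs: the loop throws before the first poll.
        Thread t = sketch.start(() -> {
            throw new IllegalStateException("Timeout expired while fetching topic metadata");
        });
        t.join();
        System.out.println(sketch.livenessStatus());
    }
}
```

With a liveness probe pointed at such a check, the container would be restarted after the consumer gives up, which is the "non-liveness" option from the title. Whether a restart is desirable for a sidecar is the open question raised above.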

Meanwhile the REST endpoints respond with 503:

# curl --verbose localhost:8090/cache/v1/keys
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8090 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8090 (#0)
> GET /cache/v1/keys HTTP/1.1
> Host: localhost:8090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Connection: keep-alive
< Content-Length: 0
< Date: Mon, 30 Sep 2019 11:26:30 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact