
Liveness and Readiness failures for Kong Pods #11698

Closed
anup-krai opened this issue Oct 5, 2023 Discussed in #11693 · 13 comments
Labels
area/ingress-controller: Issues where Kong is running as a Kubernetes Ingress Controller
area/kubernetes: Issues where Kong is running on top of Kubernetes
pending author feedback: Waiting for the issue author to get back to a maintainer with findings, more details, etc.

Comments

@anup-krai

Discussed in #11693

Originally posted by anup-krai October 4, 2023
Hi All,
We are using Kong (3.0.1) with KIC (2.8.2) and a Postgres DB. For the last few days we have been getting the liveness and readiness failures below, which are resulting in proxy container restarts:

Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  25m (x36 over 5h5m)  kubelet  Liveness probe failed: Get "http://X.X.X.38:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    25m                  kubelet  Container proxy failed liveness probe, will be restarted
  Warning  Unhealthy  25m (x38 over 5h5m)  kubelet  Readiness probe failed: Get "http://X.X.X.38:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Pulling    24m (x2 over 5h25m)  kubelet  Pulling image "kong:3.0.1"
  Warning  Unhealthy  24m                  kubelet  Readiness probe failed: Get "http://X.X.X.38:8100/status": read tcp X.X.X.107:58132->X.X.X.38:8100: read: connection reset by peer
  Warning  Unhealthy  24m                  kubelet  Readiness probe failed: Get "http://X.X.X.38:8100/status": dial tcp X.X.X.38:8100: connect: connection refused

What could be causing this error? Below are a few logs:

2023/10/03 12:21:32 [alert] 1110#0: *1050641 open socket #426 left in connection 2911
2023/10/03 12:21:32 [alert] 1110#0: aborting
2023/10/03 12:21:35 [alert] 1#0: worker process 1109 exited on signal 9
@anup-krai
Copy link
Author

After enabling debug-level logs I see the following:

2023/10/05 10:23:08 [debug] 1108#0: *10990 [lua] targets.lua:240: executing requery for: ****
2023/10/05 10:23:08 [debug] 1108#0: *10990 [lua] targets.lua:439: queryDns(): querying dns for ***
2023/10/05 10:23:08 [debug] 1108#0: *10990 [lua] targets.lua:293: f(): no dns changes detected for ****
2023/10/05 10:23:08 [notice] 1#0: signal 3 (SIGQUIT) received from 1121, shutting down
2023/10/05 10:23:08 [notice] 1108#0: gracefully shutting down
2023/10/05 10:23:08 [debug] 1108#0: *10999 [lua] events.lua:211: do_event_json(): worker-events: handling event; source=resty-worker-events, event=stopping, pid=1108, data=nil
2023/10/05 10:23:08 [crit] 1108#0: *11001 [lua] targets.lua:248: could not reschedule DNS resolver timer: process exiting, context: ngx.timer
2023/10/05 10:23:08 [notice] 1108#0: exiting
2023/10/05 10:23:08 [notice] 1108#0: exit
2023/10/05 10:23:08 [notice] 1#0: signal 17 (SIGCHLD) received from 1108
2023/10/05 10:23:08 [notice] 1#0: worker process 1108 exited with code 0
2023/10/05 10:23:08 [notice] 1#0: signal 29 (SIGIO) received
2023/10/05 10:23:18 [notice] 1#0: signal 15 (SIGTERM) received from 1225, exiting
2023/10/05 10:23:18 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:18 [notice] 1#0: signal 3 (SIGQUIT) received, shutting down
2023/10/05 10:23:18 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:18 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:19 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:19 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:21 [notice] 1#0: signal 14 (SIGALRM) received
2023/10/05 10:23:21 [notice] 1#0: signal 17 (SIGCHLD) received from 1109
2023/10/05 10:23:21 [alert] 1#0: worker process 1109 exited on signal 9
2023/10/05 10:23:21 [notice] 1#0: exit

We have reverted the Kong version from 3.0.1 to 2.8.3, and we see a similar issue there as well.

@nowNick
Contributor

nowNick commented Oct 6, 2023

Hey @anup-krai

Thank you for reporting this issue. Looking at the logs, it seems like something is killing the master process:
2023/10/05 10:23:08 [notice] 1#0: signal 3 (SIGQUIT) received from 1121, shutting down
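
Since the workers also end up exiting on signal 9 (SIGKILL), it may be worth checking whether the container is being force-killed by the kubelet after the failed probes or by the kernel OOM killer. A rough check, with the pod name and namespace as placeholders (the container name "proxy" comes from your events):

# Inspect the proxy container's last termination state and recent pod events
kubectl get pod <kong-pod> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[?(@.name=="proxy")].lastState}'
kubectl get events -n <namespace> --field-selector involvedObject.name=<kong-pod>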

Could you please share your liveness/readiness probe YAMLs?

Do you think it's possible that there's a networking issue inside your k8s cluster? You could try disabling the liveness/readiness probes and verifying whether, and how quickly, Kong responds with something like:

time curl -v http://X.X.X.38:8100/status
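
If reaching the pod IP directly is awkward, the same check can be run from inside the proxy container or through a port-forward. A rough sketch, with the deployment name and namespace as placeholders (it assumes curl is available in the image):

# Let curl itself report the status code and total response time from inside the proxy container
kubectl exec -n <namespace> deploy/<kong-deployment> -c proxy -- \
  curl -s -o /dev/null -w 'status=%{http_code} total=%{time_total}s\n' \
  http://localhost:8100/status

# Or from a workstation via a port-forward
kubectl port-forward -n <namespace> deploy/<kong-deployment> 8100:8100 &
time curl -v http://localhost:8100/status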

@nowNick nowNick added area/kubernetes Issues where Kong is running on top of Kubernetes area/ingress-controller Issues where Kong is running as a Kubernetes Ingress Controller pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... labels Oct 6, 2023
@anup-krai
Author

anup-krai commented Oct 6, 2023

Hi @nowNick ,

Thanks for your response. Below are the liveness/readiness probe YAMLs:

  readinessProbe:
    httpGet:
      path: "/status"
      port: status
      scheme: HTTP
    initialDelaySeconds: 5
    timeoutSeconds: 5
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 3

  # livenessProbe for Kong pods
  livenessProbe:
    httpGet:
      path: "/status"
      port: status
      scheme: HTTP
    initialDelaySeconds: 5
    timeoutSeconds: 5
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 3

I increased timeoutSeconds to 5 hours and consumers are still seeing failures. Status calls were also failing even though the containers were not restarting.
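
For reference: if these pods come from the kong/kong Helm chart, the probe settings above can usually be overridden through the chart's readinessProbe/livenessProbe values rather than by editing manifests. A rough sketch, with the release name, namespace, and numbers as placeholders (check your chart's values.yaml first):

# Example probe-timing override via Helm values, applied on top of the existing release values
helm upgrade <release> kong/kong -n <namespace> --reuse-values \
  --set readinessProbe.timeoutSeconds=10 \
  --set livenessProbe.timeoutSeconds=10 \
  --set livenessProbe.failureThreshold=6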

@nowNick
Contributor

nowNick commented Oct 6, 2023

Thank you! The probe YAMLs look OK.
Now let's make sure they can actually reach Kong: have you tried disabling the probes and manually reaching the Kong pods with curl?
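
For completeness, one rough way to drop both probes temporarily without editing the chart, with the deployment name and namespace as placeholders (the container index must point at the proxy container):

# Remove both probes from the proxy container via a JSON patch
kubectl patch deployment <kong-deployment> -n <namespace> --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'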

@anup-krai
Author

Yes, we have changed the probe endpoints and also tried disabling them; sometimes the curls were timing out or had high response times.

@anup-krai
Author

Team, can you please advise on this? It looks like a similar issue has been raised in #11710.

@nowNick
Contributor

nowNick commented Oct 9, 2023

Hey @anup-krai! Could you tell us how large your configuration is (number of routes/services/consumers), roughly?
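
If it helps, a rough way to get those counts is to page through the Admin API; the sketch below assumes the Admin API is reachable on localhost:8001 and that jq is available:

# Count routes/services/consumers/plugins by following Admin API pagination
for entity in routes services consumers plugins; do
  count=0
  next="/${entity}?size=1000"
  while [ -n "$next" ] && [ "$next" != "null" ]; do
    page=$(curl -s "http://localhost:8001${next}")
    count=$((count + $(echo "$page" | jq '.data | length')))
    next=$(echo "$page" | jq -r '.next')
  done
  echo "${entity}: ${count}"
done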

@anup-krai
Author

Hi @nowNick, below are the details:
Routes - 900+
Services - 900+
Consumers - 100+
Plugins - 2900+

@nowNick nowNick removed the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Oct 10, 2023
@anup-krai
Author

Team, can you please advise on this?

@nowNick
Contributor

nowNick commented Oct 16, 2023

Hey @anup-krai

It seems like a similar issue to yours has been closed with this comment:
#11710 (comment)

Do you think the resolution might also be similar?

@nowNick nowNick added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Oct 16, 2023
@anup-krai
Author

Yes, we were able to identify the issue and it is fixed now. Thanks for your response, @nowNick.

@alexandresavicki

@anup-krai and @nowNick, I think I'm suffering from the same issue here. Can you please share how you identified the offending plugin causing this problem?

@regnaio

regnaio commented Dec 29, 2024

Seeing the same issue.
