
Kong gets stuck on DNS issue #12101

Closed
1 task done
aatishnn opened this issue Nov 27, 2023 · 9 comments
Labels
pending author feedback (Waiting for the issue author to get back to a maintainer with findings, more details, etc...), stale

Comments

@aatishnn

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.4.2

Current Behavior

We are occasionally running into weird DNS resolution issues with Kong where some Kong containers get stuck on the following DNS error and never recover until the container is replaced. Other container instances of the same service seem to work fine; usually it's only one container that goes bad.

[lua] init.lua:371: execute(): DNS resolution failed: dns server error: 3 name error.

cache-miss/in progress (sync)/try 1 error: timeout/scheduled/try 2 error: timeout/dns lookup pool exceeded retries (1): timeout",
cache-miss/in progress (sync)/try 1 error: timeout/scheduled/try 2 error: timeout/dns lookup pool exceeded retries (1): timeout",
cache-miss/in progress (sync)/try 1 error: timeout/scheduled/try 2 error: timeout/dns lookup pool exceeded retries (1): timeout",
cache-miss/querying/dns server error: 3 name error",
cache-hit/dns client error: 101 empty record received",
cache-hit/dns server error: 3 name error"], client: 127.0.0.1, server: kong, request: "GET <url> HTTP/1.1",

What could we do to investigate this further? The fact that other instances of the same container seem fine while only one gets stuck makes us believe something is going on within Kong's internals.

Expected Behavior

Even if there were a DNS error, we would expect Kong to recover from it on its own.

Steps To Reproduce

No response

Anything else?

No response

@locao
Contributor

locao commented Nov 27, 2023

Hi @aatishnn! Thanks for your report. Are there any other errors in the error.log file? How many entities are there in this environment? How long are the records' TTLs?

@chobits
Contributor

chobits commented Nov 28, 2023

Some hints on the error log:

...
cache-miss/in progress (sync)/try 1 error: timeout/scheduled/try 2 error: timeout/dns lookup pool exceeded retries (1): timeout",
...

Here, timeout means that semaphore:wait() failed because the wait expired before the in-flight query completed.
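
For illustration, here is a minimal OpenResty sketch of that pattern (an assumption about the mechanism, not Kong's actual code; Kong's DNS client lives in lua-resty-dns-client): requests that find a query already in flight wait on a semaphore, and ngx.semaphore's wait(timeout) returns nil plus the string "timeout" when it expires before the query finishes.

```lua
-- Hypothetical sketch of waiting on an in-flight DNS query with
-- ngx.semaphore; wait() failing with "timeout" is what surfaces as
-- "try 1 error: timeout" in the log above.
local semaphore = require "ngx.semaphore"

local sema = semaphore.new()        -- starts with zero resources

-- The coroutine that actually performs the DNS query would call
-- sema:post(1) when the answer arrives, waking one waiter.

local ok, err = sema:wait(2)        -- wait at most 2 seconds
if not ok then
  ngx.log(ngx.ERR, "dns lookup wait failed: ", err)  -- err == "timeout"
end
```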

@ADD-SP ADD-SP added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Dec 4, 2023
@victordrnd

victordrnd commented Dec 13, 2023

Hi, we are experiencing the same behavior:

[error] 2321#0: *1170192 [lua] init.lua:388: execute(): DNS resolution failed: dns server error: 3 name error. Tried: ...]

A few seconds after that, the liveness probe times out and the Pod stops and then restarts, until the next DNS lookup fails.

@chobits
Contributor

chobits commented Dec 14, 2023

DNS resolution failed: dns server error: 3 name error.

This message, DNS resolution failed: dns server error: 3 name error., means that the DNS server could be reached but the queried domain name does not exist.

Also, could you show the complete message of this line? I think there was some other retry information from the DNS subsystem of Kong.
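
One way to see that distinction is to query the name directly with lua-resty-dns (the resolver library underneath; the nameserver address below is a placeholder): an unreachable server fails at the transport level, while a nonexistent name comes back as a valid response carrying errcode 3.

```lua
-- Sketch: telling "server unreachable" apart from "3 name error"
-- (NXDOMAIN) with lua-resty-dns. The nameserver IP is a placeholder.
local resolver = require "resty.dns.resolver"

local r, err = resolver:new{
  nameservers = { "8.8.8.8" },
  timeout = 2000,                    -- 2s per attempt
}
if not r then
  ngx.log(ngx.ERR, "failed to create resolver: ", err)
  return
end

local answers, qerr = r:query("host.docker.internal", { qtype = r.TYPE_A })
if not answers then
  -- transport-level failure: the server could not be reached at all
  ngx.log(ngx.ERR, "query failed: ", qerr)
elseif answers.errcode then
  -- the server answered, but with an error code; errcode 3 means
  -- "name error": the queried domain does not exist
  ngx.log(ngx.ERR, "server error ", answers.errcode, ": ", answers.errstr)
end
```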

@walosha

walosha commented Dec 24, 2023

kong_1 | 2023/12/24 17:41:55 [error] 1283#0: *1185 [lua] init.lua:371: execute(): DNS resolution failed: dns server error: 3 name error. Tried: ["(short)host.docker.internal:(na) - cache-miss","host.docker.internal:33 - cache-miss/querying/dns server error: 3 name error","host.docker.internal:1 - cache-miss/querying/dns server error: 3 name error","host.docker.internal:5 - cache-miss/querying/dns server error: 3 name error"], client: 192.168.128.1, server: kong, request: "GET /ts-route/api/products HTTP/1.1", host: "time-api", request_id: "1f9dcf3acaeac72b0c74c53b6c47db0e"
kong_1 | 192.168.128.1 - - [24/Dec/2023:17:41:55 +0000] "GET /ts-route/api/products HTTP/1.1" 503 91 "-" "got (https://github.com/sindresorhus/got)" kong_request_id: "1f9dcf3acaeac72b0c74c53b6c47db0e"
@chobits

@chobits
Contributor

chobits commented Dec 25, 2023

kong_1 | 2023/12/24 17:41:55 [error] 1283#0: *1185 [lua] init.lua:371: execute(): DNS resolution failed: dns server error: 3 name error. Tried: ["(short)host.docker.internal:(na) - cache-miss","host.docker.internal:33 - cache-miss/querying/dns server error: 3 name error","host.docker.internal:1 - cache-miss/querying/dns server error: 3 name error","host.docker.internal:5 - cache-miss/querying/dns server error: 3 name error"], client: 192.168.128.1, server: kong, request: "GET /ts-route/api/products HTTP/1.1", host: "time-api", request_id: "1f9dcf3acaeac72b0c74c53b6c47db0e"
kong_1 | 192.168.128.1 - - [24/Dec/2023:17:41:55 +0000] "GET /ts-route/api/products HTTP/1.1" 503 91 "-" "got (https://github.com/sindresorhus/got)" kong_request_id: "1f9dcf3acaeac72b0c74c53b6c47db0e"

The Kong DNS client tried to resolve the host "host.docker.internal", but it ultimately failed because the DNS server always returned name error for this host.

It tried host.docker.internal:33, which means it tried to resolve the host.docker.internal SRV record. The mapping is shown below:

  • host.docker.internal:33 -> SRV record
  • host.docker.internal:1 -> A record
  • host.docker.internal:5 -> CNAME record
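
Those numeric suffixes are standard DNS qtype codes, and they match the constants lua-resty-dns exposes. A small sketch of the retry-by-record-type pattern the log reflects (illustrative only, not Kong's actual implementation):

```lua
-- The numbers after the host name in the log are DNS qtype codes,
-- matching the lua-resty-dns constants:
local resolver = require "resty.dns.resolver"

print(resolver.TYPE_SRV)    -- 33
print(resolver.TYPE_A)      -- 1
print(resolver.TYPE_CNAME)  -- 5

-- Illustrative retry loop over record types; each attempt that fails
-- with errcode 3 becomes one "host.docker.internal:<qtype> - ...
-- 3 name error" entry in the log above.
local try_order = { resolver.TYPE_SRV, resolver.TYPE_A, resolver.TYPE_CNAME }
for _, qtype in ipairs(try_order) do
  -- r:query("host.docker.internal", { qtype = qtype }) would run here
  print("would query qtype ", qtype)
end
```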

@walosha

walosha commented Dec 25, 2023

@chobits Thanks, this was helpful! I resolved it by using the Docker gateway IP (usually 172.17.0.1) instead of host.docker.internal.

Contributor

github-actions bot commented Jan 9, 2024

This issue is marked as stale because it has been open for 14 days with no activity.

@github-actions github-actions bot added the stale label Jan 9, 2024
Contributor

github-actions bot commented Jan 16, 2024

Dear contributor,

We are automatically closing this issue because it has not seen any activity for three weeks.
We're sorry that your issue could not be resolved. If any new information comes up that could
help resolving it, please feel free to reopen it.

Your contribution is greatly appreciated!

Please have a look at our pledge to the community for more information.

Sincerely,
Your Kong Gateway team

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2024