Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal DNS servers return records with a TTL of 0 #6790

Open
jgallagher opened this issue Oct 7, 2024 · 1 comment
Open

Internal DNS servers return records with a TTL of 0 #6790

jgallagher opened this issue Oct 7, 2024 · 1 comment
Labels
qorb Connection Pooling

Comments

@jgallagher
Copy link
Contributor

Our internal DNS servers currently don't set an explicit TTL, so we get the hickory-proto default of 0. This instructs clients to never cache results, which seems to be in conflict with connection pooling as we've started to integrate qorb. When we construct a qorb DnsResolver, we currently instruct it to ignore those 0-valued TTLs:

DnsResolverConfig {
hardcoded_ttl: Some(tokio::time::Duration::MAX),
..Default::default()
},

From this conversation on the Oximeter qorb integration PR:

smklein Oct 2, 2024

Yup! Right now, I think our internal DNS system is using a TTL of "zero", which means we'd see a backend, start to spin up connections for them, and then discard them immediately. Next time we query DNS, we'd start to create connections, then discard them immediately...

This behavior is tested here: https://github.com/oxidecomputer/qorb/blob/7c886a24369f458bdab4b9530e01f498da79e61f/src/resolvers/dns.rs#L712-L746 , but it sucks.

By ignoring TTLs, we still discard backends that are explicitly removed from DNS, but we only remove backends when DNS servers stop telling us about them successfully.

This might all be fine? We could decide qorb ignoring the TTL and doings its own periodic querying is exactly what we want, but I wanted to write this down in case we want to control TTLs from the DNS side (at which point we'd presumably want to stop overriding it in the qorb config(s)).

@rmustacc
Copy link

rmustacc commented Oct 7, 2024

I think we should be proactively communicating the TTLs that we should be using otherwise as we spin up and spin down services you're not going to know how long you need to wait for it actively drain per se so at least it can't be discovered in internal DNS by subsequent connection pool mechanisms. If we have a fleet of qorb's that are all picking their own policy choices, then we'll have to go back and encode that knowledge somehow (and then add a bit more here). So while not a priority, I think the DNS server should be explicitly advertising this as a way to indicate to other clients what its intents are and to at least start to create some kind of explicit promise here.

Put differently, we should know that the moment say I remove an internal endpoint from DNS if I wait 1-2x the TTL it is safe for me to close my listen socket to minimize errors. There may still be ongoing connections, but there shouldn't be new connections going in because it should have expired from all DNS caches.

@smklein smklein added the qorb Connection Pooling label Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
qorb Connection Pooling
Projects
None yet
Development

No branches or pull requests

3 participants