unbound_response_time_seconds missing cached responses #52

codl · 2023-02-23T07:12:13Z

The help text for the unbound_response_time_seconds histogram says: "Query response time in seconds".

I took this to mean it measures the time Unbound takes to respond to every client query. However, it does not seem to include queries served from the cache.

The Munin plugin plots the total number of cache hits alongside the histogram, counting them in the lowest histogram bucket.

I'm not sure it's possible in Prometheus to compute a histogram quantile over a histogram plus a separate series interpreted as an extra bucket. Perhaps unbound_response_time_seconds should count cache hits in its lowest bucket? At the very least, this behavior should be documented.
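To make the "count cache hits in the lowest bucket" idea concrete, here is a minimal offline sketch, not anything the exporter or Prometheus ships. It assumes cumulative buckets given as (upper bound, cumulative count) pairs ending in +Inf, and it mimics, in simplified form, the linear interpolation Prometheus uses for histogram_quantile. The bucket bounds and counts in MISS_BUCKETS are made up for illustration. Because buckets are cumulative, putting hits in the lowest bucket means adding the hit count to every bucket; in PromQL that would correspond to adding the cache-hit counter to every `le` series before taking the quantile.

```python
import bisect
import math

# Hypothetical cumulative buckets for recursive (cache-miss) replies:
# (upper bound "le" in seconds, cumulative count). Not real exporter output.
MISS_BUCKETS = [(0.001, 0), (0.01, 20), (0.1, 80), (1.0, 100), (math.inf, 100)]


def fold_cache_hits(buckets, cache_hits):
    """Count cache hits in the lowest bucket.

    Buckets are cumulative, so adding hits to the lowest bucket means
    adding them to every bucket's cumulative count.
    """
    return [(le, count + cache_hits) for le, count in buckets]


def histogram_quantile(q, buckets):
    """Simplified Prometheus-style quantile via linear interpolation.

    `buckets` is a list of (upper_bound, cumulative_count) sorted by upper
    bound and ending with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    counts = [count for _, count in buckets]
    # First finite bucket whose cumulative count reaches the rank;
    # returns len(counts) - 1 (the +Inf bucket) if none does.
    b = bisect.bisect_left(counts, rank, 0, len(counts) - 1)
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = 0.0 if b == 0 else buckets[b - 1][0]
    upper = buckets[b][0]
    below = 0 if b == 0 else buckets[b - 1][1]
    in_bucket = buckets[b][1] - below
    if in_bucket == 0:
        # Only possible for q == 0 with an empty lowest bucket.
        return lower
    # Assume observations are spread evenly inside the chosen bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With these made-up numbers the median of misses alone interpolates to about 0.055 s, while folding in 300 hypothetical cache hits pulls it down to roughly 0.00067 s. Note that interpolating inside the lowest bucket assumes the hits are spread evenly across it, which is itself an approximation.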

jsha · 2023-02-23T18:35:01Z

An interesting question! Cache hits and cache misses will have completely different distributions, so it's probably hard to represent both nicely with a single set of histogram buckets. We could add a label, cache="hit" vs. cache="miss", but the buckets would still be suboptimal for one case or the other.
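A small sketch of the bucket-resolution problem, assuming one shared set of hypothetical bounds sized for recursive lookups (BOUNDS below is invented, as are the sample latencies). It buckets samples the way a Prometheus histogram does (each sample goes in the first bucket whose `le` bound is at or above it) and sums two histograms that share bounds, roughly what aggregating away a hypothetical `cache` label would do.

```python
import bisect
import itertools
import math

# Hypothetical bucket upper bounds in seconds, sized for recursive lookups.
BOUNDS = [0.001, 0.01, 0.1, 1.0, math.inf]


def cumulative_counts(samples, bounds=BOUNDS):
    """Bucket latency samples the way a Prometheus histogram does.

    Each sample is counted in the first bucket whose upper bound ("le")
    is >= the sample; the result is cumulative, one count per bound.
    """
    per_bucket = [0] * len(bounds)
    for seconds in samples:
        per_bucket[bisect.bisect_left(bounds, seconds)] += 1
    return list(itertools.accumulate(per_bucket))


def merge(*histograms):
    """Sum histograms that share the same bounds, e.g. the cache="hit" and
    cache="miss" series of a (hypothetical) labelled histogram."""
    return [sum(column) for column in zip(*histograms)]
```

With illustrative hit latencies of 0.0001, 0.0002 and 0.0005 s, every hit lands in the first bucket (`[3, 3, 3, 3, 3]`), so their spread below 1 ms is invisible. Buckets fine enough to resolve hits would instead spend resolution where misses rarely fall. Merging the labelled series is a plain element-wise sum only because both share the same bounds.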

I can also see, though, why you would be interested in the question of "what is the performance my end-users see, covering both hits and misses."

> it does not seem to include queries served from cache

Can I ask what you're basing this on? I don't know one way or the other what the answer is.

codl · 2023-02-24T06:48:03Z

> I can also see, though, why you would be interested in the question of "what is the performance my end-users see, covering both hits and misses."

That's exactly it 🙂

> Can I ask what you're basing this on?

It started as a guess based on some surprising results on my dashboard, was reinforced by looking at the Munin setup, and was then confirmed by experimentation.

I started a fresh Unbound server and sent the same query a few times, checking unbound-control stats_noreset after each one. The first answer was counted in one of the buckets; subsequent answers were not. I also found that background "prefetch" queries don't seem to be counted in the histogram either. I had thought the histogram might measure outgoing recursion time regardless of whether it is user-facing, but the prefetch result suggests it doesn't.
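The check described above can be scripted. This is a rough sketch assuming `unbound-control stats_noreset` prints `key=value` lines, with counters named `total.num.queries`, `total.num.cachehits`, `total.num.cachemiss`, `total.num.prefetch` and per-bucket `histogram.<lo>.to.<hi>` entries; please verify those names against your local unbound-control(8) manual. SAMPLE is made-up text in that format, not captured output, and `summarize` is a hypothetical helper.

```python
import subprocess

# Made-up sample in the key=value format printed by unbound-control
# (illustrative numbers only, not real measurements).
SAMPLE = """\
total.num.queries=4
total.num.cachehits=3
total.num.cachemiss=1
total.num.prefetch=0
histogram.000000.000000.to.000000.000001=0
histogram.000000.065536.to.000000.131072=1
"""


def read_stats():
    """Run `unbound-control stats_noreset` (needs remote-control access)."""
    result = subprocess.run(
        ["unbound-control", "stats_noreset"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_stats(text):
    """Parse key=value lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return stats


def summarize(stats):
    """Compare the histogram's total count with the query counters."""
    return {
        # histogram.* entries are per-bucket counts, so their sum is the total
        "histogram": sum(v for k, v in stats.items() if k.startswith("histogram.")),
        "queries": stats.get("total.num.queries", 0.0),
        "cachehits": stats.get("total.num.cachehits", 0.0),
        "cachemiss": stats.get("total.num.cachemiss", 0.0),
        "prefetch": stats.get("total.num.prefetch", 0.0),
    }
```

Calling `summarize(parse_stats(read_stats()))` after each test query shows whether the histogram total moves with `total.num.queries` or only with `total.num.cachemiss`.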

Caveat emptor: I didn't check local authority zones, forward zones, etc., so I can't say whether those are counted.
