-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pdk) - respect the trace flags from the parent span (if present) #13015
Conversation
Kong respects the `should_sample` decision from a parent span if one was provided. Previously it was not doing this resulting in incomplete traces. When using kong as an internal api gateway we are seeing incomplete traces. The logs show that the following appears to be happening: (this is based on a sampling rate of 0.01) - Service A makes a request to Service B with no traceparent header specified - Kong generates a traceparent header and specifies trace flags of `"01"` (i.e. the trace should be sampled) - Service B handles the request from Service A and ingests the traceparent header - Service B then makes a request to Service C, setting the traceparent header appropriately (i.e. using flags `"01"`) - Kong then resets the trace flags to `"00"` (i.e. the trace should *not* be sampled) - Service C handles the request from Service B and, respecting the flags from the traceparent header, does not sample the trace This results in Kong and Services A and B sampling the trace, but Service C does not, and so we get an incomplete trace. I identified the `get_sampling_decision` method as the most likely place where the issue occurs, and since `parent_should_sample` is a tri-state boolean (i.e. it can be `true`, `false`, or `nil`) it makes most sense (to me) to respect the value of `parent_should_sample` if it exists. I tested this locally and it behaves as I would expect; when a traceparent header is included in the request, then the flags are respected, however if not, probabilistic sampling is applied. However rereading the code, I can't work out *why* that should be the case, since the probablistic sampler is deterministic based on trace id (i.e. the same trace-id will always return the same outcome). Even stepping through the code, I couldn't work out exactly what was going on. It seems likely to me that it's something to do with how the root span is constructed, but I couldn't tell for sure.
19edd70
to
f92ec2e
Compare
Hi @andyroyle, thank you for your contribution! Could you please test the branch from this PR and let me know if it soves your problem? The explanation of the fix (which should answer your questions) is in the PR description. Regarding your changes:
This is not wrong, however, it is also not the way Kong's sampler is currently designed and I'm afraid your logic might turn out to be a breaking change for some. At the moment, as you noticed, the parent's sampling decision is only inherited when This (if everything worked correctly) should allow configuring the sampler to do either of the following:
Since some might be relying on (2.), if we wanted to introduce a change and make Kong support a "parent based sampler", it should be configurable. However, I believe that with the fix I linked, this should not be needed any longer, but please do let me know in case I missed anything or you have a different opinion. I am converting this to draft for the time being. |
Yep, I think you identified the correct way to solve this issue. I think the key piece of context I was missing was that the sampling_rate could be empty (because it's actually the plugin's sampling rate, which can be different from the global sampling rate). Amazing work, thanks for highlighting this, otherwise I would probably have continued to use my "fixed" version forever, believing it to be correct 😬 |
Thanks for getting back to me @andyroyle and for confirming the fix (now merged) works! I'm really glad that helped. I'm going to close this PR, your contribution was very much appreciated! |
Summary
Kong respects the
should_sample
decision from a parent span if one was provided. Previously it was not doing this resulting in incomplete traces.When using kong as an internal api gateway we are seeing incomplete traces. The logs show that the following appears to be happening:
(this is based on a sampling rate of 0.01)
"01"
(i.e. the trace should be sampled)"01"
)"00"
(i.e. the trace should not be sampled)This results in Kong and Services A and B sampling the trace, but Service C does not, and so we get an incomplete trace.
I identified the
get_sampling_decision
method as the most likely place where the issue occurs, and sinceparent_should_sample
is a tri-state boolean (i.e. it can betrue
,false
, ornil
) it makes most sense (to me) to respect the value ofparent_should_sample
if it exists.I tested this locally and it behaves as I would expect; when a traceparent header is included in the request, then the flags are respected, however if not, probabilistic sampling is applied.
However rereading the code, I can't work out why that should be the case, since the probablistic sampler is deterministic based on trace id (i.e. the same trace-id will always return the same outcome). Even stepping through the code, I couldn't work out exactly what was going on. It seems likely to me that it's something to do with how the root span is constructed, but I couldn't tell for sure.
Checklist
changelog/unreleased/kong
orskip-changelog
label added on PR if changelog is unnecessary. README.md