-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: create trace span for kafkajs eachBatch #3400
Feature: create trace span for kafkajs eachBatch #3400
Conversation
… delays on fast disconnect/re-connect imposed by kafkajs)
…tion-for-kafkajs-each-batch
…github.com:allanpaiste/dd-trace-js into feature/add-instrumentation-for-kafkajs-each-batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was it left without implementation as the authors expect dd-trace users that decide to go with eachMessage to implement tracing within the batchProcessing themselves?
Yes, unfortunately this is the case. We cannot support batches with our current data model since it's not possible for a single handler to follow from multiple messages from potentially different traces/services. This means that when looping through each of the batch message, it's up to the user to create individual spans that follow from each corresponding message. This is unfortunately a limitation of the product and not something we can address at the library level. This PR would only mitigate the issue for the first message in the batch, but all other messages would still have the same issue, which IMO is even more confusing than the current behaviour of not instrumenting batches at all by default.
Thank you for the comment @rochdev Would it still be reasonable to keep the instrumentation though where it would then just start a new trace (together with specific operation span) just so that the flamegraph would not start with some lower level operation (like database access) which might leave user wondering what preceeded that database access. It would serve the purpose of still allowing people to list all the Kafka operations - while for obvious reasons not guaranteeing that there's no breakage of the trace. Currently due to there not being a good way to retain traces per Kafka message (due to batch processing) we are also sacrificing visibility on the origin of some operations (some kafka operations not producing spans and not starting a trace). So you would not be able to determine which kafka message produced some following operations. One could also still deal with allowing custom instrumentation by allowing (and documenting) the batch instrumentation (with completely new trace) to be disable if need be. CC: @jbertran |
@allanpaiste So basically you would keep the span, but when there are multiple messages it would simply not be connected to the upstream services? Generally speaking that would probably make sense. Could the topic still be used as the resource name, or do batches allow messages from different topics? |
The topic can be still used as resource name, yes. The batch will always hold just one partition of one topic - so that should not be a problem (consumer could be handling multiple partitions of a topic but they'd all end up being passed to eachBatch as separate objects). Just for reference, throwing some kafkajs typing here as well. eachBatchPayload: ... and the Batch itself: |
Blessed by #4645 |
What does this PR do?
Enhances kafkajs instrumentation to cover
eachBatch
the same way the current instrumentation coverseachMessage
just so that both chosen paths would get same kind of span representation show up in APM flamegraphs (and the spans to be searchable).It also provides a experimental environment flag
DD_EXPERIMENTAL_KAFKAJS_PLUGIN_TRACE_FIRST_BATCH_MESSAGE
which allows developers to have at least some level of automatic traceId propagation for batch processing where the context from first message of each batch is used as the trace parent.Motivation
Have both consumption strategies chosen when using
kafkajs
show up in APM flamegraphs and span search rather than half of the usage cases (imagining a situation where half of the consumptions are done in batches) being skipped where the traces would then start from whatever comes after the consume span (db interaction, http calls) which can leave user confused.Plugin Checklist
Additional Notes
There was a test added in dd-trace that was using
eachBatch
in a test that was supposed to indicate that user has not provided any callback at all which leaves me thinking that whoever authored the original instrumentation was at least aware thateachBatch
was a thing, but then decided to skip implementing it. Would be good to know why that choice was made.Am aware that when it comes to traceId propagation - then all kinds of batch processing solutions leave a lot to wish for which makes me wonder: was it left without implementation as the authors expect dd-trace users that decide to go with
eachMessage
to implement tracing within the batchProcessing themselves?If that is so, then feel free to decline this PR.