
Feature: create trace span for kafkajs eachBatch #3400

Conversation


@allanpaiste allanpaiste commented Jul 17, 2023

What does this PR do?

Enhances the kafkajs instrumentation to cover eachBatch the same way the current instrumentation covers eachMessage, so that both consumption paths get the same kind of span representation in APM flamegraphs (and the spans become searchable).

It also provides an experimental environment flag, DD_EXPERIMENTAL_KAFKAJS_PLUGIN_TRACE_FIRST_BATCH_MESSAGE, which gives developers at least some level of automatic trace ID propagation for batch processing: the context from the first message of each batch is used as the trace parent.
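The idea behind the flag can be sketched roughly as follows. This is illustrative plain JavaScript, not the actual plugin code, and the helper name is made up; kafkajs delivers header values as Buffers, so they need to be decoded into a plain text map before a propagator can extract the parent context from the batch's first message:

```javascript
// Illustrative helper (not the actual plugin code): build a text-map
// carrier from the first message of a kafkajs batch. kafkajs header
// values arrive as Buffers (or strings), so normalize them to strings.
function firstMessageCarrier (batch) {
  const first = batch && batch.messages && batch.messages[0]
  if (!first || !first.headers) return {}
  const carrier = {}
  for (const [key, value] of Object.entries(first.headers)) {
    carrier[key] = Buffer.isBuffer(value) ? value.toString('utf8') : String(value)
  }
  return carrier
}
```

A text-map propagator could then extract the parent span context from this carrier and use it as the parent of the eachBatch span.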

Motivation

Have both consumption strategies offered by kafkajs show up in APM flamegraphs and span search, rather than half of the usage cases being skipped (imagine a setup where half of the consumption is done in batches). In the skipped cases, traces start from whatever comes after the consume step (a db interaction, http calls), which can leave the user confused.

Plugin Checklist

  • Unit tests.

Additional Notes

There was a test added in dd-trace that used eachBatch in a scenario that was supposed to indicate that the user has not provided any callback at all, which leaves me thinking that whoever authored the original instrumentation was at least aware that eachBatch was a thing, but then decided to skip implementing it. It would be good to know why that choice was made.

I am aware that when it comes to trace ID propagation, all kinds of batch processing solutions leave a lot to be desired, which makes me wonder: was it left without implementation because the authors expect dd-trace users who decide to go with eachBatch to implement tracing within the batch processing themselves?

If that is so, then feel free to decline this PR.

@allanpaiste allanpaiste changed the title make sure that all kafkajs usage is properly traced Make sure that all kafkajs usage is properly traced Jul 17, 2023
allanpaiste and others added 4 commits July 17, 2023 14:43
@allanpaiste allanpaiste marked this pull request as ready for review July 17, 2023 12:06
@allanpaiste allanpaiste requested a review from a team as a code owner July 17, 2023 12:06
@allanpaiste allanpaiste changed the title Make sure that all kafkajs usage is properly traced kafkajs eachBatch tracing (with optional experimental auto-propagation) Jul 17, 2023
@allanpaiste allanpaiste changed the title kafkajs eachBatch tracing (with optional experimental auto-propagation) Feature: tracing for kafkajs eachBatch (with optional experimental auto-propagation) Jul 17, 2023

@rochdev rochdev left a comment


was it left without implementation because the authors expect dd-trace users who decide to go with eachBatch to implement tracing within the batch processing themselves?

Yes, unfortunately this is the case. We cannot support batches with our current data model, since it's not possible for a single handler to follow from multiple messages from potentially different traces/services. This means that when looping through the batch messages, it's up to the user to create individual spans that follow from each corresponding message. This is unfortunately a limitation of the product and not something we can address at the library level. This PR would only mitigate the issue for the first message in the batch; all other messages would still have the same issue, which IMO is even more confusing than the current behaviour of not instrumenting batches at all by default.
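For readers wondering what that user-side instrumentation might look like, here is a sketch against dd-trace's public tracer API (tracer.extract, tracer.startSpan). The handler shape, span name, and tag keys are illustrative, and header Buffer decoding is omitted for brevity:

```javascript
// Illustrative user-level eachBatch handler (not the plugin): create one
// span per message, each following from that message's own upstream
// context. `tracer` is assumed to expose dd-trace's public
// extract/startSpan API.
function makeEachBatch (tracer, processMessage) {
  return async function eachBatch ({ batch }) {
    for (const message of batch.messages) {
      // Producer-injected trace context travels in the message headers.
      const childOf = tracer.extract('text_map', message.headers || {})
      const span = tracer.startSpan('kafka.consume', {
        childOf,
        tags: { 'kafka.topic': batch.topic, 'kafka.partition': batch.partition }
      })
      try {
        await processMessage(message)
      } finally {
        span.finish()
      }
    }
  }
}
```

This way each message's span follows from its own upstream trace, which is exactly what a single batch-level span cannot express.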

@tlhunter tlhunter requested a review from a team as a code owner August 15, 2023 22:07
@tlhunter tlhunter requested review from jbertran and removed request for a team August 15, 2023 22:07

allanpaiste commented Aug 29, 2023

Thank you for the comment @rochdev

Would it still be reasonable to keep the instrumentation, though, where it would just start a new trace (together with a specific operation span), so that the flamegraph would not start with some lower-level operation (like database access), which might leave the user wondering what preceded that database access?

It would still serve the purpose of allowing people to list all the Kafka operations, while, for obvious reasons, not guaranteeing an unbroken trace.

Currently, because there is no good way to retain traces per Kafka message in batch processing, we are also sacrificing visibility into the origin of some operations (some Kafka operations produce no spans and start no trace), so you cannot determine which Kafka message produced the operations that followed.

Custom instrumentation could also still be accommodated by allowing (and documenting) the batch instrumentation (with its completely new trace) to be disabled if need be.

CC: @jbertran

@allanpaiste allanpaiste reopened this Aug 29, 2023
@allanpaiste allanpaiste changed the title Feature: tracing for kafkajs eachBatch (with optional experimental auto-propagation) Feature: tracing for kafkajs eachBatch Aug 29, 2023
@allanpaiste allanpaiste changed the title Feature: tracing for kafkajs eachBatch Feature: create trace span for kafkajs eachBatch Aug 29, 2023
@allanpaiste allanpaiste requested a review from rochdev August 29, 2023 13:11

rochdev commented Sep 8, 2023

@allanpaiste So basically you would keep the span, but when there are multiple messages it would simply not be connected to the upstream services? Generally speaking that would probably make sense. Could the topic still be used as the resource name, or do batches allow messages from different topics?


allanpaiste commented Sep 8, 2023

@rochdev

The topic can still be used as the resource name, yes. The batch will always hold just one partition of one topic, so that should not be a problem (a consumer could be handling multiple partitions of a topic, but they'd all end up being passed to eachBatch as separate objects).
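Since a batch is always scoped to a single topic-partition, deriving the span's resource name and tags is straightforward. A quick sketch (the function name and tag keys are illustrative):

```javascript
// Illustrative: a kafkajs batch is always one partition of one topic,
// so the topic alone is a stable resource name for the batch span.
function batchSpanMetadata (batch) {
  return {
    resource: batch.topic,
    tags: {
      'kafka.topic': batch.topic,
      'kafka.partition': batch.partition,
      'kafka.batch_size': batch.messages.length
    }
  }
}
```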

Just for reference, here are the relevant kafkajs typings as well.

eachBatchPayload:

https://github.com/tulios/kafkajs/blob/ff3b1117f316d527ae170b550bc0f772614338e9/types/index.d.ts#L990-L999

... and the Batch itself:

https://github.com/tulios/kafkajs/blob/ff3b1117f316d527ae170b550bc0f772614338e9/types/index.d.ts#L876-L886

@Qard Qard requested a review from a team as a code owner June 17, 2024 23:42
@tlhunter

Blessed by #4645
