184207213 - Improve observability tools #892
Merged
This PR adds an opt-in feature for collecting trace and log data in the OpenTelemetry (OT) standard through a Jaeger exporter. Jaeger accepts multiple backend databases (e.g. Elastic, Prometheus), so we can build on top of the platform to explore telemetry data flexibly. Building on top of OT will also allow us to add metrics in the future if we need to, add opt-in remote telemetry, and enable this in simulations and the test network, collecting data remotely in a seamless way.
We collect logs per span per node. In network mode each peer acts as a single service, and in local mode (for simple in-memory tests) each simulated node has its own root span, so it is easy to inspect its information.
Additionally, spans are created per operation type, so the life of a transaction can be followed and all logs are properly classified per transaction. This is done through a custom event register which creates OT traces and logs by leveraging our own event registry system, and it is independent of the per-node traces.
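To illustrate the two independent groupings described above (per-node spans and per-transaction spans), here is a minimal standard-library sketch. It is not the code in this PR and does not use the OpenTelemetry SDK: `Span`, `EventRegister`, `record`, the node and transaction identifiers, and the span naming scheme are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a trace span: a name plus the logs attached to it."""
    name: str
    logs: list = field(default_factory=list)


class EventRegister:
    """Hypothetical register that files each event under two independent spans."""

    def __init__(self):
        self.node_spans = {}  # node id -> root span for that node
        self.tx_spans = {}    # transaction id -> span for that operation

    def record(self, node_id, tx_id, op_type, message):
        # Per-node view: every log emitted by a node lands in its root span.
        if node_id not in self.node_spans:
            self.node_spans[node_id] = Span(f"node:{node_id}")
        self.node_spans[node_id].logs.append(message)

        # Per-transaction view: logs from any node are grouped by transaction,
        # and the span is named after the operation type that opened it.
        if tx_id not in self.tx_spans:
            self.tx_spans[tx_id] = Span(f"{op_type}:{tx_id}")
        self.tx_spans[tx_id].logs.append((node_id, message))


register = EventRegister()
register.record("node-a", "tx-1", "put", "request sent")
register.record("node-b", "tx-1", "put", "request received")
register.record("node-a", "tx-2", "get", "request sent")

# Follow the life of one transaction across nodes.
for node_id, message in register.tx_spans["tx-1"].logs:
    print(node_id, message)
```

In this sketch the same event is visible from either side: looking up `node-a` shows everything that node did, while looking up `tx-1` shows the transaction as it moved between nodes.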
This is not enabled by default, since it requires a working Jaeger collector.
TODO: add some documentation on how to use this.