Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry OTLP setup for tracing #670

Closed
wants to merge 10 commits into from
Closed

Conversation

mzabaluev
Copy link
Collaborator

@mzabaluev mzabaluev commented Oct 8, 2024

Resolves: #555 #556

Summary

  • Categories: networks, util.

Integrate an OTLP exporter for tracing events matching the "movement_tracing" target.
The OTLP gRPC endpoint is configured with the MOVEMENT_OTLP environment variable.

This replaces the earlier "movement_timing" tracing layer, as analysis of OpenTelemetry should be more versatile.

Changelog

  • Added OpenTelemetry OTLP exporter, with the endpoint URL configured with the MOVEMENT_OTLP environment variable.
  • Removed support for timing JSON logs configured with MOVEMENT_TIMING.

Testing

TODO: add docker-compose setup for jaeger or similar.

Local process-compose testing hasn't been conclusive.

Replace the "movement_timing" tracing target and the logging layer
it targeted with an optionally installed OpenTelemetry OTLP exporter.
The name of the tracing target matched to send OpenTelemetry events is
"movement_telemetry".
Provide telemetry API as separate from tracing rather than
a globally installed layer. Installing an OpenTelemetry layer into the
global tracing subscriber raises nasty reentrancy issues because
the OTLP exporter stack also uses tracing under the hood.
Replace the "movement_telemetry" traces with
tracing with OpenTelemetry API, using the tracer
provided by movement-tracing.
In the telemetry overlay for suzuka-full-node, add
an OTLP collector start job running a docker container.
Converted "movement_telemetry" tracing to OpenTelemetry
in the transaction_ingress task module.
The integer values are better represented as such in telemetry data
rather than converted to strings.
Comment on lines +136 to +138
otel_span.add_event(
"transactions_in_flight",
vec![KeyValue::new("in_flight", in_flight as i64)],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This event looks like it wants to be a proper metric.

Comment on lines 48 to 52
info!(
target: "movement_timing",
target: "movement_telemetry",
count,
current,
"decrementing_transactions_in_flight",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this one looks like a metric would be better (I haven't converted it to OpenTelemetry yet).

/// This function should be called at the start of the program before any
/// threads are able to use OpenTelemetry tracers. The function will panic
/// if not called within a Tokio runtime.
pub fn init_tracer_provider(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer these providers are fungible. Ideally, the content of a particularly (set) of environment variables, args, or other would tell an instance of the application which to use. How difficult is this to set up from where we are currently?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I'm getting at with Config, which has the from_env method to set up using well-known environment variable(s). As there typically needs to be a single instance of the OTLP collector in an orchestrated setup within a (virtual) network, all instances can use the same variable with a single source of truth in a declarative environment config.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for fungibility in the API, tracing-opentelemetry was supposed to make OpenTelemetry exports just another sort of tracing macro call sites, but it does not work due to tokio-rs/tracing-opentelemetry#159.
This is why I made telemetry a separate module with its own API.

@@ -146,7 +146,7 @@ impl LightNodeV1 {
Ok(verified_blobs)
}

#[tracing::instrument(target = "movement_timing", level = "debug")]
#[tracing::instrument(target = "movement_telemetry", level = "debug")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have comments on each of these explaining why they are needed to achieve a certain metric as outlined here:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need all of these for telemetry. Originally they were added for perf research.
Do you think we can just remove those that don't contribute to a particular metric goal, or convert them to a plain spans/events with the default target?

@l-monninger
Copy link
Collaborator

Will try to test this e2e shortly.

@mzabaluev
Copy link
Collaborator Author

Superseded by #697.

@mzabaluev mzabaluev closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emit OpenTelepathy events for ingress transactions
2 participants