-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenTelemetry OTLP setup for tracing #670
Conversation
Replace the "movement_timing" tracing target and the logging layer it targeted with an optionally installed OpenTelemetry OTLP exporter. The name of the tracing target matched to send OpenTelemetry events is "movement_telemetry".
Provide telemetry API as separate from tracing rather than a globally installed layer. Installing an OpenTelemetry layer into the global tracing subscriber raises nasty reentrancy issues because the OTLP exporter stack also uses tracing under the hood.
Replace the "movement_telemetry" traces with tracing with OpenTelemetry API, using the tracer provided by movement-tracing.
74c9b2a
to
72a369b
Compare
In the telemetry overlay for suzuka-full-node, add an OTLP collector start job running a docker container.
72a369b
to
3baea3e
Compare
Converted "movement_telemetry" tracing to OpenTelemetry in the transaction_ingress task module.
The integer values are better represented as such in telemetry data rather than converted to strings.
otel_span.add_event( | ||
"transactions_in_flight", | ||
vec![KeyValue::new("in_flight", in_flight as i64)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This event looks like it wants to be a proper metric.
info!( | ||
target: "movement_timing", | ||
target: "movement_telemetry", | ||
count, | ||
current, | ||
"decrementing_transactions_in_flight", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this one looks like a metric would be better (I haven't converted it to OpenTelemetry yet).
/// This function should be called at the start of the program before any | ||
/// threads are able to use OpenTelemetry tracers. The function will panic | ||
/// if not called within a Tokio runtime. | ||
pub fn init_tracer_provider( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer these providers are fungible. Ideally, the content of a particularly (set) of environment variables, args, or other would tell an instance of the application which to use. How difficult is this to set up from where we are currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm getting at with Config
, which has the from_env
method to set up using well-known environment variable(s). As there typically needs to be a single instance of the OTLP collector in an orchestrated setup within a (virtual) network, all instances can use the same variable with a single source of truth in a declarative environment config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for fungibility in the API, tracing-opentelemetry was supposed to make OpenTelemetry exports just another sort of tracing macro call sites, but it does not work due to tokio-rs/tracing-opentelemetry#159.
This is why I made telemetry
a separate module with its own API.
@@ -146,7 +146,7 @@ impl LightNodeV1 { | |||
Ok(verified_blobs) | |||
} | |||
|
|||
#[tracing::instrument(target = "movement_timing", level = "debug")] | |||
#[tracing::instrument(target = "movement_telemetry", level = "debug")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have comments on each of these explaining why they are needed to achieve a certain metric as outlined here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we need all of these for telemetry. Originally they were added for perf research.
Do you think we can just remove those that don't contribute to a particular metric goal, or convert them to a plain spans/events with the default target?
Will try to test this e2e shortly. |
Superseded by #697. |
Resolves: #555 #556
Summary
networks
,util
.Integrate an OTLP exporter for tracing events matching the "movement_tracing" target.
The OTLP gRPC endpoint is configured with the
MOVEMENT_OTLP
environment variable.This replaces the earlier "movement_timing" tracing layer, as analysis of OpenTelemetry should be more versatile.
Changelog
MOVEMENT_OTLP
environment variable.MOVEMENT_TIMING
.Testing
TODO: add docker-compose setup for jaeger or similar.
Local process-compose testing hasn't been conclusive.