
Support OpenTelemetry tracestate header for consistent head-based sampling #827

Open
StephanErb opened this issue Oct 3, 2023 · 1 comment

Comments


StephanErb commented Oct 3, 2023

From the Elastic documentation:

Head-based sampling is implemented in the APM agents and SDKs, and requires the sample rate to be propagated between services and the APM Server. This functionality is not currently supported by OpenTelemetry, which results in inaccurate APM throughput, latency, and error metrics. OpenTelemetry users should consider using tail-based sampling instead.

This is now outdated, as OpenTelemetry has tracestate support, albeit in a slightly different form than Elastic's:

This document specifies an approach based on an “r-value” and a “p-value”. At a very high level, r-value is a source of randomness and p-value encodes the sampling probability. A context is sampled when p <= r.

Both fields are propagated via the OpenTelemetry tracestate under the ot vendor tag using the rules for tracestate handling. Both fields are represented as unsigned decimal integers requiring at most 6 bits of information.

This allows Trace consumers to correctly count spans simply by interpreting the p-value on a given span.
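To make the counting rule concrete, here is a minimal sketch of how a consumer might read the two fields and weight a span. It assumes the `ot` entry uses `;`-separated `name:value` pairs (for example `ot=p:2;r:5`), that a p-value of `p` stands for a sampling probability of 2^-p, and that the value 63 is reserved for zero probability. That is my reading of the OpenTelemetry probability sampling specification; the function names are hypothetical and not taken from any SDK.

```python
def parse_ot_tracestate(tracestate: str) -> dict:
    """Return the integer sub-fields (e.g. p and r) of the `ot` tracestate entry."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            # Assumed layout: name:value pairs separated by ';'
            for pair in value.split(";"):
                name, colon, raw = pair.partition(":")
                if colon and raw.isdigit():
                    fields[name] = int(raw)
            return fields
    return {}  # no `ot` entry present


def is_sampled(p: int, r: int) -> bool:
    """A context is sampled when p <= r."""
    return p <= r


def adjusted_count(p: int) -> int:
    """Number of spans that one sampled span represents.

    Assumption: p encodes probability 2**-p, and p == 63 means probability zero.
    """
    if p == 63:
        return 0
    return 2 ** p


fields = parse_ot_tracestate("es=s:0.25,ot=p:2;r:5")
if "p" in fields and "r" in fields and is_sampled(fields["p"], fields["r"]):
    weight = adjusted_count(fields["p"])  # this span stands in for `weight` spans
```

Summing `adjusted_count` over the received spans then gives an estimate of the unsampled throughput.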

Asks

  • Elastic client libraries should populate both the Elastic and the OpenTelemetry tracestate entries. This will ensure consistent tracing if microservices with either Elastic or OpenTelemetry instrumentation are in the same call chain.
  • Elastic APM Server should use the OpenTelemetry tracestate header to estimate the full throughput metrics if available.
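As a rough illustration of the first ask, the sketch below builds a tracestate value that carries both vendor entries for a sampled span. It is not how any Elastic agent works today. It assumes the Elastic entry has the form `es=s:<sample rate>` and the OpenTelemetry entry the form `ot=p:<p>;r:<r>`, and that p-values can only express power-of-two probabilities, so the `ot` entry is emitted only when the Elastic sample rate maps onto one exactly. `build_tracestate` and its parameters are hypothetical names.

```python
import math


def build_tracestate(sample_rate: float, r_value: int) -> str:
    """Build a tracestate value with an Elastic entry and, where possible, an OTel one.

    Assumptions: the span is sampled, `sample_rate` is the Elastic head-based
    sample rate in (0, 1], and `r_value` is the trace's r-value (0..62).
    """
    entries = [f"es=s:{sample_rate:g}"]
    if 0 < sample_rate <= 1:
        p = -math.log2(sample_rate)
        # Emit `ot` only for exact powers of two, and only if the result is
        # consistent with the rule "sampled when p <= r".
        if p.is_integer() and p <= r_value <= 62:
            entries.append(f"ot=p:{int(p)};r:{r_value}")
    return ",".join(entries)


print(build_tracestate(0.25, 5))  # es=s:0.25,ot=p:2;r:5
print(build_tracestate(0.3, 5))   # es=s:0.3
```

Rates such as 0.3 have no exact p-value, so a real implementation would need to decide whether to round the rate, restrict the configurable rates, or omit the `ot` entry as this sketch does.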

Context

Member

axw commented Oct 4, 2023

@StephanErb thanks for opening this! This has been on my mind, but I hadn't gotten around to opening the issue yet.

This is partially done. The main missing part is your first point, about the client libraries populating both tracestate keys. I think it would also be useful to have OTel Sampler implementations that produce/handle both tracestate keys.

Elastic APM Server should use the OpenTelemetry tracestate header to estimate the full throughput metrics if available.

FYI, this was implemented in v8.8.0: elastic/apm-server#10309. It seems to be missing from the release notes.
