diff --git a/docs/otel-limitations.asciidoc b/docs/otel-limitations.asciidoc
index 82f29237e26..dd2e0ee9280 100644
--- a/docs/otel-limitations.asciidoc
+++ b/docs/otel-limitations.asciidoc
@@ -37,3 +37,20 @@ The https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/
 was deprecated in 7.13 and replaced by the native support of the OpenTelemetry Line Protocol in
 Elastic {observability} (OTLP). To learn more, see
 https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticexporter#migration[migration].
+
+[float]
+[[open-telemetry-tbs]]
+==== OpenTelemetry's tail-based sampling
+
+Tail-based sampling makes sampling decisions after all spans of a trace have completed.
+This allows for more powerful and informed sampling rules.
+
+When using OpenTelemetry with Elastic APM, there are two implementations available for tail-based sampling:
+
+* Tail-based sampling using the https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor[tailsamplingprocessor] in the OpenTelemetry Collector
+* Native <>
+
+Using the https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor[tailsamplingprocessor] in the OpenTelemetry Collector comes with an important limitation. Elastic's APM backend calculates span and transaction metrics based on the incoming span events.
+These metrics are accurate for 100% sampling scenarios. In scenarios with probabilistic sampling, Elastic's APM backend is informed of the sampling rate of spans and can extrapolate throughput metrics from the incoming, partial data. However, with tail-based sampling there's no single sampling probability, because the rules can be more complex, and the OpenTelemetry Collector does not provide sampling probability information that the Elastic backend could use for extrapolation. As a result, Elastic APM cannot properly extrapolate throughput and count metrics that are derived from span events that have been tail-based sampled in the OpenTelemetry Collector. In these scenarios, derived throughput and count metrics are likely to be inaccurate.
+
+Therefore, we recommend using Elastic's native tail-based sampling when integrating with OpenTelemetry.
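
Review note: the extrapolation argument in the new section can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Elastic's APM backend or the OpenTelemetry Collector is implemented. The function names, the 10% probabilistic rate, the 5% error ratio, and the "keep all errors plus 1% of the rest" rule are all hypothetical values chosen for illustration.

```python
import random


def extrapolate_count(sampled_count, sampling_rate):
    """Scale a sampled count up to an estimate of the true count.

    Only meaningful when every event was kept with the same, known probability.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    return sampled_count / sampling_rate


def make_traces(n, error_ratio, rng):
    """Build n hypothetical traces, each flagged as an error with probability error_ratio."""
    return [{"error": rng.random() < error_ratio} for _ in range(n)]


def probabilistic_sample(traces, rate, rng):
    """Keep each trace independently with the same known probability."""
    return [t for t in traces if rng.random() < rate]


def tail_sample(traces, baseline_rate, rng):
    """Hypothetical tail-based rule: keep every error trace plus a small share of the rest."""
    return [t for t in traces if t["error"] or rng.random() < baseline_rate]


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
traces = make_traces(10_000, error_ratio=0.05, rng=rng)

# Probabilistic sampling: the rate is known, so the count can be scaled back up.
head = probabilistic_sample(traces, rate=0.10, rng=rng)
head_estimate = extrapolate_count(len(head), 0.10)

# Tail-based sampling: error traces are kept at 100%, the rest at 1%,
# so no single rate describes the sampled set.
tail = tail_sample(traces, baseline_rate=0.01, rng=rng)
tail_count = len(tail)

print(f"true trace count:        {len(traces)}")
print(f"probabilistic estimate:  {head_estimate:.0f}")
print(f"tail-sampled count:      {tail_count} (no single rate to scale by)")
```

In this sketch the probabilistic estimate lands close to the true count because the backend knows the rate to divide by. The tail-sampled set mixes traces kept at different effective rates, so a backend that receives only the surviving spans, without per-span probability information, has nothing correct to scale by.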