OTel SpanLink results in ClassCastException #3662

Closed
NicklasWallgren opened this issue Jun 2, 2024 · 7 comments
Labels
agent-java community Issues and PRs created by the community

Comments

@NicklasWallgren
Contributor

NicklasWallgren commented Jun 2, 2024

Describe the bug

The OTelSpanBuilder is unable to handle linked AutoValue_ImmutableSpanContext.

Should io.opentelemetry.api.trace.SpanContext.create() be instrumented so that it returns an OTelSpanContext instead?

java.lang.ClassCastException: class io.opentelemetry.api.internal.AutoValue_ImmutableSpanContext cannot be cast to class co.elastic.apm.agent.opentelemetry.tracing.OTelSpanContext (io.opentelemetry.api.internal.AutoValue_ImmutableSpanContext is in unnamed module of loader 'app'; co.elastic.apm.agent.opentelemetry.tracing.OTelSpanContext is in unnamed module of loader co.elastic.apm.agent.bci.classloading.IndyPluginClassLoader @7853756f)
	at co.elastic.apm.agent.opentelemetry.tracing.OTelSpanBuilder.startSpan(OTelSpanBuilder.java:198)

span.addSpanLink(TraceContext.fromParentContext(), ((OTelSpanContext) links.get(i)).getElasticTraceContext());

Steps to reproduce

final Tracer tracer = GlobalOpenTelemetry.get()
    .tracerBuilder("")
    .setInstrumentationVersion("0.10.0")
    .build();

final SpanContext parentSpanContext = SpanContext.create(
  "f1f6ae85a6cbd46ef0fc12ca201a5954", "e9effc2a1b309797", TraceFlags.getDefault(), TraceState.getDefault());

final Span span = tracer
    .spanBuilder("queue.task")
    .setSpanKind(SpanKind.CONSUMER)
    .setParent(Context.current())
    .addLink(parentSpanContext)
    .startSpan();

Expected behavior

The linked span context should work as expected.

Solution

I began implementing a SpanContextOpenTelemetryInstrumentation, but was unable to set the traceId in TraceContext without using reflection.

public class SpanContextOpenTelemetryInstrumentation extends AbstractOpenTelemetryInstrumentation {

    @Override
    public ElementMatcher<? super TypeDescription> getTypeMatcher() {
        return named("io.opentelemetry.api.trace.SpanContext");
    }

    @Override
    public ElementMatcher<? super MethodDescription> getMethodMatcher() {
        return named("create");
    }

    @Override
    public String getAdviceClassName() {
        return "co.elastic.apm.agent.opentelemetry.SpanContextOpenTelemetryInstrumentation$SpanContextOpenTelemetryAdvice";
    }

    public static class SpanContextOpenTelemetryAdvice {
        @Advice.AssignReturned.ToReturned
        @Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class, inline = false)
        public static OTelSpanContext onExit(@Advice.Argument(0) String traceHexId, @Advice.Argument(1) String spanHexId) {
            Id traceId = Id.new128BitId();
            traceId.fromHexString(traceHexId, 0);

            Id spanId = Id.new64BitId();
            spanId.fromHexString(spanHexId, 0);

            // TODO handle traceFlags and traceState

            // Relies on TraceContext.ofId(...), a proposed new static factory method.
            TraceContext traceContext = TraceContext.ofId(spanId, GlobalTracer.get().require(ElasticApmTracer.class));

            return new OTelSpanContext(traceContext);
        }
    }
}
@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Jun 2, 2024
@SylvainJuge
Member

Hi,

I've managed to reproduce the error you are facing. The agent should not throw a ClassCastException here; it should instead either silently ignore the link or issue a warning.

The problem is that you are manually creating a SpanContext with the OpenTelemetry API and using it as a span link. Our OpenTelemetry implementation currently does not support arbitrary SpanContext instances like the one you created.

Can you elaborate a bit on the use case you are trying to achieve here? Maybe there is a simpler option, such as using a span attribute as a work-around.

@SylvainJuge
Member

I have just implemented a "work-around" that properly warns that using arbitrary span contexts is not supported with the OTel bridge. #3672

A snapshot is available here if you'd like to test it.
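
For illustration only, a guard of this kind might look roughly like the sketch below. It is not the code from #3672: SpanContextLike, ForeignSpanContext, BridgeSpanContext and LinkGuard are hypothetical stand-ins for the OpenTelemetry SpanContext, an application-created context, the agent's OTelSpanContext and the span builder logic. The idea is simply to check the runtime type before casting, then warn and skip anything the bridge did not create.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for io.opentelemetry.api.trace.SpanContext.
interface SpanContextLike {
    String traceId();
    String spanId();
}

// Hypothetical stand-in for a context built by application code,
// for example through SpanContext.create().
final class ForeignSpanContext implements SpanContextLike {
    private final String traceId;
    private final String spanId;

    ForeignSpanContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    public String traceId() { return traceId; }
    public String spanId() { return spanId; }
}

// Hypothetical stand-in for the bridge's own context type.
final class BridgeSpanContext implements SpanContextLike {
    private final String traceId;
    private final String spanId;

    BridgeSpanContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    public String traceId() { return traceId; }
    public String spanId() { return spanId; }
}

final class LinkGuard {
    // Keeps only the links the bridge created itself. Any other
    // implementation is reported and skipped instead of being cast blindly.
    static List<BridgeSpanContext> usableLinks(List<SpanContextLike> links) {
        List<BridgeSpanContext> usable = new ArrayList<>();
        for (SpanContextLike link : links) {
            if (link instanceof BridgeSpanContext) {
                usable.add((BridgeSpanContext) link);
            } else {
                System.err.println("Ignoring unsupported span link of type "
                    + link.getClass().getName());
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<SpanContextLike> links = new ArrayList<>();
        links.add(new BridgeSpanContext("trace-a", "span-a"));
        links.add(new ForeignSpanContext("trace-b", "span-b"));
        System.out.println(usableLinks(links).size() + " usable link(s)");
    }
}
```

With a guard like this, an application-created link is dropped with a warning rather than failing span creation.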

@NicklasWallgren
Contributor Author

NicklasWallgren commented Jun 7, 2024

> Hi,
>
> I've managed to reproduce the error you are facing. The agent should not throw a ClassCastException here; it should instead either silently ignore the link or issue a warning.
>
> The problem is that you are manually creating a SpanContext with the OpenTelemetry API and using it as a span link. Our OpenTelemetry implementation currently does not support arbitrary SpanContext instances like the one you created.
>
> Can you elaborate a bit on the use case you are trying to achieve here? Maybe there is a simpler option, such as using a span attribute as a work-around.

Thanks for the reply.

Our goal is to spin up a new trace per asynchronous job and associate the newly created trace with the one that enqueued the job, similar to what is described here: https://opentelemetry.io/docs/concepts/signals/traces/#span-links

Request (Trace 1) --> Enqueue Job (metadata: T1TraceId, T1SpanId) --> Asynchronous Job Executed (spin up Trace 2 and associate it with Trace 1).

Is this achievable using the OTel bridge or the Elastic APM API?
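
To make the "metadata" step of that flow concrete, here is a minimal sketch of how the enqueuing side could carry the two IDs with the job and how the consumer could check them before building a link. JobTraceMetadata and its "traceId:spanId" encoding are hypothetical names and choices made up for this example. The validity rules follow the W3C Trace Context format: a trace ID is 32 lowercase hex characters, a span ID is 16, and neither may be all zeros.

```java
// Hypothetical carrier for the IDs stored alongside an enqueued job.
final class JobTraceMetadata {
    private final String traceId;
    private final String spanId;

    JobTraceMetadata(String traceId, String spanId) {
        if (!isValidHex(traceId, 32)) {
            throw new IllegalArgumentException("invalid trace id: " + traceId);
        }
        if (!isValidHex(spanId, 16)) {
            throw new IllegalArgumentException("invalid span id: " + spanId);
        }
        this.traceId = traceId;
        this.spanId = spanId;
    }

    String traceId() { return traceId; }
    String spanId() { return spanId; }

    // True if value has exactly `length` lowercase hex characters
    // and at least one of them is not '0'.
    static boolean isValidHex(String value, int length) {
        if (value == null || value.length() != length) {
            return false;
        }
        boolean nonZero = false;
        for (char c : value.toCharArray()) {
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
            if (!hex) {
                return false;
            }
            if (c != '0') {
                nonZero = true;
            }
        }
        return nonZero;
    }

    // Assumed wire format for this example: "<traceId>:<spanId>".
    String encode() {
        return traceId + ":" + spanId;
    }

    static JobTraceMetadata decode(String encoded) {
        String[] parts = encoded.split(":", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <traceId>:<spanId>");
        }
        return new JobTraceMetadata(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        JobTraceMetadata enqueued = new JobTraceMetadata(
            "f1f6ae85a6cbd46ef0fc12ca201a5954", "e9effc2a1b309797");
        JobTraceMetadata received = decode(enqueued.encode());
        System.out.println(received.traceId() + " / " + received.spanId());
    }
}
```

On the consumer side, the decoded IDs would then be passed to SpanContext.create(...) and addLink(...) as in the reproduction steps.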

@SylvainJuge
Member

Do you have a strong requirement to use the Elastic APM agent here?

As you are already using the OpenTelemetry API, it might be more natural to use an OpenTelemetry agent rather than the OpenTelemetry bridge in our APM agent.

I would suggest using our new OpenTelemetry Java distribution or the upstream OpenTelemetry instrumentation agent; the data captured by those can be ingested as-is in APM.

@NicklasWallgren
Contributor Author

NicklasWallgren commented Jun 11, 2024

> Do you have a strong requirement to use the Elastic APM agent here?
>
> As you are already using the OpenTelemetry API, it might be more natural to use an OpenTelemetry agent rather than the OpenTelemetry bridge in our APM agent.
>
> I would suggest using our new OpenTelemetry Java distribution or the upstream OpenTelemetry instrumentation agent; the data captured by those can be ingested as-is in APM.

Thanks for the reply. The opentelemetry-agent might be the right way to go.

Will https://github.com/elastic/elastic-otel-java/ replace apm-agent-java in the long run?

@jackshirazi
Contributor

> Will https://github.com/elastic/elastic-otel-java/ replace apm-agent-java in the long run?

The classic Elastic APM Java agent has many features not yet available in the OpenTelemetry distribution. We are implementing these in our distribution and gradually contributing them upstream to OpenTelemetry. This is likely to take some time, so the long run could be quite long, and ultimately it will come down to customer demand. Which is a long-winded way of saying maybe.

@NicklasWallgren
Contributor Author

We migrated to the opentelemetry-agent in the end, and it works fine with span links. Thanks for the feedback!
