Spring boot runtime metrics #13078

Open
wants to merge 5 commits into base: main

zeitlinger
Member

Fixes #12812

zeitlinger requested a review from a team as a code owner January 21, 2025 13:21
zeitlinger self-assigned this Jan 21, 2025
github-actions bot added the "test native" label (this label can be applied to PRs to trigger them to run native tests) Jan 21, 2025
@zeitlinger
Member Author

@jeanbisutti can you help me with the native failures:

  1. not sure if this is transient:
Failures (1):
  JUnit Jupiter:OtelSpringStarterSmokeTest:shouldSendTelemetry()
    MethodSource [className = 'io.opentelemetry.spring.smoketest.OtelSpringStarterSmokeTest', methodName = 'shouldSendTelemetry', methodParameterTypes = '']
    => org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a Lambda expression in io.opentelemetry.instrumentation.testing.InstrumentationTestRunner
Expecting actual not to be empty within 10 seconds.
       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
       org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
       org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:790)
       [...]
     Caused by: org.awaitility.core.DeadlockException: Deadlocked threads detected:


       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:159)
       [...]
  2. numLogsCapturedBeforeOtelInstall value of the OpenTelemetry appender is too small. - should we increase the buffer?

  3. thread started: this is for JFR - I'll try @PreDestroy for this

The web application [ROOT] appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.headers.Pthread.pthread_cond_timedwait(Pthread.java)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park0(PosixPlatformThreads.java:379)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park(PosixPlatformThreads.java:354)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.parkCurrentPlatformOrCarrierThread(PlatformThreads.java:1001)
 java.base/jdk.internal.misc.Unsafe.park(Unsafe.java:56)
 java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
 java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1763)
 java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435)
 io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor$Worker.run(BatchLogRecordProcessor.java:246)
 java.base/java.lang.Thread.runWith(Thread.java:1596)
 java.base/java.lang.Thread.run(Thread.java:1583)
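
For illustration, the @PreDestroy idea mentioned above for the thread warning could look roughly like the sketch below. It is not code from this PR: `StoppableWorker` and all of its members are hypothetical names, and the class only mimics the pattern visible in the stack trace (a worker thread parked in `ArrayBlockingQueue.poll` with a timeout). It uses only the JDK so that it runs on its own; in a Spring application the `close()` method is the one that would presumably carry `@PreDestroy`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical worker (an assumption for illustration, not part of OpenTelemetry or this PR). */
final class StoppableWorker implements AutoCloseable {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
  private final AtomicInteger processed = new AtomicInteger();
  private volatile boolean running = true;
  private final Thread thread;

  StoppableWorker(String threadName) {
    thread = new Thread(this::drain, threadName);
    thread.setDaemon(true);
    thread.start();
  }

  /** Queues an item; returns false if the bounded queue is full. */
  boolean submit(String item) {
    return queue.offer(item);
  }

  int processedCount() {
    return processed.get();
  }

  boolean isAlive() {
    return thread.isAlive();
  }

  // Worker loop: polls with a timeout so it notices the stop flag promptly,
  // and keeps going until the queue has been drained.
  private void drain() {
    try {
      while (running || !queue.isEmpty()) {
        String item = queue.poll(100, TimeUnit.MILLISECONDS);
        if (item != null) {
          processed.incrementAndGet();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Assumption: in a Spring bean, this is the method that would be annotated
  // with @PreDestroy so the container stops the thread on shutdown.
  @Override
  public void close() {
    running = false;
    try {
      thread.join(5_000); // wait for the worker to drain and exit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With a shutdown hook of this shape the thread has exited by the time the web application is undeployed, which is the condition Tomcat's warning checks for. Whether the same is achievable for the thread in the trace above is a separate question.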

@jeanbisutti
Member

@zeitlinger About 1., it seems to be an Awaitility issue. Does the problem appear only with the new changes? Perhaps it would be worth doing something like

But I am not sure that it would be a good thing to do today. It would require further investigation.

About 2., the default value of numLogsCapturedBeforeOtelInstall is already high: 1,000 logs. I suspect that the warning is related to something specific to the test.

About 3., it seems related to Tomcat's memory leak detection. With the full log we could tell whether it really comes from Tomcat. It does not seem possible to stop the BatchLogRecordProcessor thread. I am surprised it could be JFR-related. @jack-berg, would you know if some users have already reported the following log?

appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it.

Native tests of this PR are failing during the native compilation step:

[native-image-plugin] Native Image written to: /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/smoke-tests-otel-starter/spring-boot-3.2/build/native/nativeTestCompile

[Incubating] Problems report is available at: file:///home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/build/reports/problems/problems-report.html

I would try to focus on the JMX or JFR metrics for a GraalVM native execution. GraalVM supports some JFR events, but not all of them, so I am not sure that all the JFR metrics can work in native mode today.
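
One way to act on this uncertainty, sketched here purely as an assumption and not as what the PR does, is to probe the runtime before choosing a metrics source. `JfrSupportCheck` and its `select` policy (prefer JFR on a regular JVM, fall back to JMX in a native image) are hypothetical. `FlightRecorder.isAvailable()` is the JDK API for checking JFR availability, and `org.graalvm.nativeimage.imagecode` is the system property GraalVM sets in a native image.

```java
import jdk.jfr.FlightRecorder;

/** Hypothetical helper (an assumption for illustration, not part of this PR). */
final class JfrSupportCheck {

  /** True when running inside a GraalVM native image (build time or run time). */
  static boolean isNativeImage() {
    return System.getProperty("org.graalvm.nativeimage.imagecode") != null;
  }

  /** True when the JDK reports that Flight Recorder can be used. */
  static boolean jfrAvailable() {
    try {
      return FlightRecorder.isAvailable();
    } catch (LinkageError e) {
      // The jdk.jfr module may be missing from a trimmed runtime image.
      return false;
    }
  }

  /**
   * Assumed policy: use JFR only on a regular JVM where it is available,
   * otherwise fall back to JMX because native mode supports only some JFR events.
   */
  static String select(boolean nativeImage, boolean jfrAvailable) {
    return (jfrAvailable && !nativeImage) ? "jfr" : "jmx";
  }

  static String selectForCurrentRuntime() {
    return select(isNativeImage(), jfrAvailable());
  }
}
```

A coarse check like this does not say which individual JFR events work in native mode; that would still need to be verified per metric.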

@jack-berg
Member

@jack-berg, would you know if some users have already reported the following log?

I haven't seen that log before.

Successfully merging this pull request may close these issues.

Add runtime-telemetry to spring starter
3 participants