Potential memory leak #1940
Comments
Hey @martinvisser thanks for this report, and I'm sorry that you had a somewhat painful experience with that upgrade. I would not be surprised to hear that a misconfigured agent uses more memory, as it does take some resources to buffer telemetry, potentially do retries, and log the exceptions. Can you clarify what you meant by "eventually leading to crashes", though? I'm not sure what we should do about a speculative "there might be a memory leak somewhere" issue. Having a reproduction or any more specifics sure would help.
@martinvisser Have a look at the breaking changes in https://github.com/signalfx/splunk-otel-java/releases/tag/v2.0.0-alpha By default metrics and logs are now also exported; perhaps the export fails because you don't have the pipelines for these signals configured in the collector? If you don't need these, then you could disable those exporters (see the sketch below).
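For reference, a minimal sketch of how those exporters could be disabled, assuming the standard OpenTelemetry autoconfiguration environment variables and a Cloud Foundry manifest.yml; the application name and everything outside the OTEL_* variables are placeholders, not taken from the issue:

```yaml
# Hypothetical excerpt of a Cloud Foundry manifest.yml.
applications:
  - name: my-spring-boot-app        # placeholder application name
    env:
      # Standard OpenTelemetry autoconfiguration variables: turn off the
      # metrics and logs exporters that became enabled by default in 2.x,
      # while leaving trace export unchanged.
      OTEL_METRICS_EXPORTER: none
      OTEL_LOGS_EXPORTER: none
```

The same properties can also be passed as system properties (otel.metrics.exporter / otel.logs.exporter); which mechanism fits best depends on how the buildpack attaches the agent.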
@laurit That's actually what I meant by using the "old values" to make it work again. @breedx-splk We're running on Cloud Foundry, which has a feature to kill an application if, for example, the health check fails. This is considered a crash. Memory usage increased, which caused GC to become slower and slower, which in the end made the health check respond too slowly, which then led to a crash.
We're using Cloud Foundry's Java Buildpack to deploy our applications. During the latest platform upgrade we got the new version of splunk-otel-java, version 2.5.0.
According to the release notes, version 2.5.0 contains some breaking changes which on their own are resolvable. However, because we did not know we had gotten this new version, all we saw was a huge increase in errors being logged about the exporter, like this:
io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. Server responded with HTTP status code 404. Error message: Unable to parse response body, HTTP status message:
I'd expect that on its own this would not cause trouble, but it seems that this now wrongly configured service might lead to a memory leak: around the time we deployed a new Spring Boot application we started to see a rise in memory and CPU usage, eventually leading to crashes. We reverted to an older deployment, but the results stayed the same: increasing memory and CPU usage.
After we reconfigured the service binding to use the old values instead of the new defaults, the issues were gone.
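The alternative laurit mentions above, configuring collector pipelines for the new signals instead of disabling their export, could look roughly like the sketch below. This is a generic OpenTelemetry Collector configuration, not taken from the issue; the debug exporter is only a stand-in for whatever backend exporter is actually in use:

```yaml
# Minimal OpenTelemetry Collector configuration sketch: accept OTLP traces,
# metrics, and logs so the agent's metric and log exports no longer fail.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  # Stand-in exporter that just logs received telemetry; a real setup would
  # export to Splunk Observability Cloud or another backend instead.
  debug:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
```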
We did not manage to reliably reproduce it and were not able to get heap dumps, because when a container crashes there's no more access to it.