Performance issue / holding reference to binaries #459
Comments
Thanks, I haven't had time to do a detailed dive into your post, but from a quick look: are you seeing the issue in …?
Unfortunately it is not possible to move to …
Ok, this issue actually sounds familiar; there may already be a fix for it I can merge to chatterbox and release a new version. Will get back to you.
@elvanja I've released grpcbox 0.16.0 and chatterbox 0.13.0. Are you able to bump to those versions and see if anything improves?
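For readers following along, here is a minimal sketch of how those transitive dependencies might be bumped in a Mix project; the project layout and the opentelemetry version constraints are assumptions, not taken from the thread:

```elixir
# mix.exs (hypothetical excerpt): override: true forces the newer grpcbox and
# chatterbox even though they are normally pulled in transitively by the
# OTLP exporter.
defp deps do
  [
    {:opentelemetry, "~> 1.0"},
    {:opentelemetry_exporter, "~> 1.0"},
    {:grpcbox, "~> 0.16.0", override: true},
    {:chatterbox, "~> 0.13.0", override: true}
  ]
end
```

After editing, `mix deps.unlock grpcbox chatterbox` followed by `mix deps.get` refreshes the lockfile to the new versions.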
Will do, stay tuned for results. The process will take at least a few days to determine if this helps.
Update: the app version with grpcbox 0.16.0 and chatterbox 0.13.0 has been in production for 5 days now, and I am still seeing the same pattern of binaries being held. I upgraded only those two dependencies; all the other related dependencies stayed the same. Do you feel it would make sense to try with all related dependencies upgraded before giving …? Here's the list of those related dependencies which use telemetry (removed the others to reduce the noise): …
If it helps, here are the details for the largest one:
%{
binaries_size: 1226523189,
memory: 239552,
pid: #PID<0.5407.0>,
process_info: [
current_function: {:gen_statem, :loop_receive, 3},
initial_call: {:proc_lib, :init_p, 5},
status: :waiting,
message_queue_len: 0,
links: [#PID<0.16047.24>, #PID<0.24620.2>, #PID<0.29652.25>,
#PID<0.31253.1>, #PID<0.32115.25>, #PID<0.32508.24>, #Port<0.121>,
#PID<0.32245.48>, #PID<0.31690.48>, #PID<0.31829.70>, #PID<0.31763.49>,
#PID<0.31675.24>, #PID<0.30646.48>, #PID<0.30675.69>, #PID<0.30813.69>,
#PID<0.30898.70>, #PID<0.30721.49>, #PID<0.30649.24>, #PID<0.29999.24>,
#PID<0.30020.91>, #PID<0.30083.25>, #PID<0.30000.49>, #PID<0.29777.25>,
#PID<0.29987.70>, #PID<0.29750.92>, #PID<0.27522.24>, #PID<0.28244.24>,
#PID<0.29183.24>, #PID<0.29457.25>, #PID<0.29525.92>, #PID<0.29493.48>,
#PID<0.29415.49>, #PID<0.29246.91>, #PID<0.28777.49>, #PID<0.29171.25>,
#PID<0.29067.69>, #PID<0.28591.49>, #PID<0.28598.49>, #PID<0.28079.1>,
#PID<0.28124.24>, #PID<0.27628.70>, ...],
dictionary: [
"$ancestors": [#PID<0.4553.0>, #PID<0.4547.0>, :otel_batch_processor,
:opentelemetry_sup, #PID<0.4498.0>],
"$initial_call": {:h2_connection, :init, 1}
],
trap_exit: false,
error_handler: :error_handler,
priority: :normal,
group_leader: #PID<0.4497.0>,
total_heap_size: 28690,
heap_size: 28690,
stack_size: 11,
reductions: 568170927,
garbage_collection: [
max_heap_size: %{error_logger: true, kill: true, size: 0},
min_bin_vheap_size: 46422,
min_heap_size: 233,
fullsweep_after: 65535,
minor_gcs: 0
],
suspending: []
]
}
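Output like the map above can be gathered by scanning processes and summing the sizes of the refc binaries each one references. A minimal sketch using plain Process.info/2; the thread does not say exactly how these numbers were collected, so treat this as an illustration rather than the original script:

```elixir
# Rank processes by the total size of refc binaries they currently reference.
# Process.info(pid, :binary) returns {:binary, [{address, byte_size, ref_count}, ...]}.
Process.list()
|> Enum.map(fn pid ->
  binaries_size =
    case Process.info(pid, :binary) do
      {:binary, bins} -> bins |> Enum.map(fn {_addr, size, _refs} -> size end) |> Enum.sum()
      nil -> 0
    end

  {pid, binaries_size}
end)
|> Enum.sort_by(fn {_pid, size} -> size end, :desc)
|> Enum.take(5)
```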
I'd go with using … There should be a simple fix here; it is just wrong that those stream processes are still alive, something isn't properly closing.
I was hoping I'd find it as simple as checking the number of …
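One way to watch for the stream leak described above is to track how many processes remain linked to the h2_connection process over time, since the earlier dump shows an ever-growing links list. A hedged sketch; how you locate the connection pid depends on your own setup:

```elixir
# Count the processes linked to a given HTTP/2 connection process; a count
# that only ever grows suggests stream processes are never being shut down.
count_links = fn conn_pid ->
  {:links, links} = Process.info(conn_pid, :links)
  length(links)
end

# Example with the (hypothetical) pid from the dump above, using the IEx pid/3 helper:
# count_links.(pid(0, 5407, 0))
```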
Are you sending to the OTel collector? If so, what version? I'm now wondering if it could be that the server is not closing its end and that's why those processes are alive. I'm testing against 0.60.0 of the collector-contrib docker image.
We are actually sending to Honeycomb, via Refinery as a proxy. Related links: https://github.com/honeycombio/refinery/tree/v1.17.0 and https://docs.honeycomb.io/manage-data-volume/refinery/. And I have started the switch to protobuf, but again it will take a bit of time to actually release and see if it solved the problem. Stay tuned.
Update: changing the protocol solved the problem. Apologies for the somewhat late reply, but it took a bit of time to make sure we are in the clear. Not sure why …
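For reference, switching the exporter protocol in an Elixir app is usually a small config change. A sketch using the opentelemetry_exporter application environment keys; the endpoint shown is a placeholder, not the Refinery/Honeycomb endpoint from the thread:

```elixir
# config/runtime.exs: export OTLP over HTTP/protobuf instead of gRPC.
import Config

config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  # placeholder endpoint; point this at your collector or proxy
  otlp_endpoint: "http://localhost:4318"
```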
@elvanja good to hear. We should keep the issue open; I know it is a bug in the grpc client and I need to get it fixed.
Hey, we are also seeing the same issue. Unfortunately setting the protocol to …
It is most likely not the same issue if it happens even with …
Sorry for this question, but is this the correct way to configure the exporter: …
For some reason we are seeing …
@davan-nl technically it can be configured that way, but since it is so deeply nested and hard to be sure it's correct, we have a simpler way to configure the processor and exporter:
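The snippet from the original comment is not preserved in this transcript. Below is a hedged reconstruction of the kind of flat configuration meant here, based on the documented opentelemetry and opentelemetry_exporter config keys; the endpoint value is an assumption:

```elixir
import Config

# Pick the batch span processor and the OTLP exporter at the SDK level...
config :opentelemetry,
  span_processor: :batch,
  traces_exporter: :otlp

# ...and configure the exporter itself to use HTTP/protobuf.
config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  otlp_endpoint: "http://localhost:4318"
```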
These are also the defaults.
And no, grpc isn't used for anything else, so you should not see any use of grpcbox_client when the processor is configured to use the http exporter.
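A rough way to double-check that no gRPC/HTTP2 connections are alive after switching to the http exporter is to look for processes whose initial call is h2_connection, the value visible in the process dump earlier in the thread. This is a sketch, not an official API of grpcbox or chatterbox:

```elixir
# Find live HTTP/2 connection processes by their "$initial_call" dictionary entry.
Process.list()
|> Enum.filter(fn pid ->
  case Process.info(pid, :dictionary) do
    {:dictionary, dict} ->
      List.keyfind(dict, :"$initial_call", 0) == {:"$initial_call", {:h2_connection, :init, 1}}

    nil ->
      false
  end
end)
```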
Thanks for the prompt responses @tsloughter. I am wondering what the priority is to get grpc working. As you mention, using …
@davan-nl are you sending to the otel collector or a vendor directly?
I tried a simple example against the collector and I'm not seeing the ever-increasing list of links like you show in the process info for the grpc connection process (which would mean a stream leak).
We use Instana, which has a local agent running. I can try investigating whether using the otel collector in between helps. There was a clear difference in memory usage when we switched from grpc to http_protobuf.
Ok, yea, that'd be great if you can reproduce with the otel collector. I'm running an app locally publishing to the collector and not reproducing. It could be something with how the vendor agents close, or don't close, streams. It's still likely a bug on our end (not ending the stream when we should), but it would help narrow down where the bug is.
Just wanted to add, this was the output from recon: …
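The recon output itself did not survive in this transcript. For anyone who wants to collect the same kind of data, the calls commonly used to spot processes holding or churning refc binaries are shown below; this assumes the recon library is included in the release:

```elixir
# Top 10 processes by binary memory reclaimed after recon forces a GC;
# processes that free a lot here were holding many refc binaries.
:recon.bin_leak(10)

# Complementary view: top 10 processes by current binary memory usage.
:recon.proc_count(:binary_memory, 10)
```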
Thanks, …
Just wanted to give one last update: …
@davan-nl good to know. And yes, it is most likely an issue in grpcbox, but not 100%. It could be that both Instana and the Collector use Go's grpc lib and so have the same quirk, but more likely is that something about talking to a grpc server over the internet (an intermediary box could be causing this) results in grpcbox not closing the stream. But now I know I should be able to reproduce by signing up for a service and giving it a try.
I've got an example exporting to Honeycomb; it's creating a span every 500 ms (it's just a free account so I'm trying not to be cut off :)) and I'm still unable to see this happening. Going to keep it running and hopefully it eventually reproduces.
Oh, oops, I forgot that for Honeycomb it was first going through Refinery when @elvanja was seeing this... I guess I need to try running with that to be able to reproduce.
Hi, any update on the issue above? We also use Instana and are seeing the memory leak issue.
@pprogrammingg afraid not, I still need to reproduce so I can debug. I guess I can look into Instana to see if they have a free trial I can utilize.
I'm finding it not so easy to get going with Instana. Are you using the docker image to run the agent?
We do run it with docker. This should be enough to help bootstrap Instana, assuming you are using the kubernetes agent:

configuration_yaml: |
  ---
  com.instana.plugin.opentelemetry:
    enabled: true
  com.instana.tracing:
    extra-http-headers:
      - "X-Request-Id"
      - "X-Amzn-Trace-Id"
downloadKey: <redacted>
endpointHost: <your-instana-endpoint>
endpointPort: 443
key: <redacted>
mode: APM
cluster:
  name: <your-cluster-name>
fullnameOverride: instana-agent
opentelemetry:
  enabled: true
service:
  create: true

Install the Helm chart:

helm repo add instana https://agents.instana.io/helm
helm repo update
helm install instana -n instana-agent --create-namespace -f values.yaml

Use …
Thanks. I'm just trying to run with …
Be sure to set …
@tsloughter Did you get a chance to test?
Sorry, I missed that you provided exactly what I needed to do :(. I have to get a new Instana account and then I'll be able to test it. I'll definitely get to it this week, if not today.
I contacted Instana on Wednesday and thought I'd have an extension on my trial by now, but I don't. I guess I have to make a new account if the …
Sorry, I forgot to note that I did try again with a new account but did not see the leak. I wish I could address this sooner, since I have little doubt it is an issue on our end, but it's slow going being able to give it attention, and when giving it attention, actually getting anywhere :(
Oh, I did end up testing this but was unable to reproduce. Sorry for the late notice; I was hoping to keep trying until I was able to, and then respond, but now that it has been so long I figured I'd better give an update.
Hi,
A Phoenix app I am working on is experiencing a binaries-related memory leak.
It is related to jobs triggered by https://github.com/quantum-elixir, but after some investigation I found that it is actually an OpenTelemetry-related process that holds on to those binaries. Not sure if this is a Quantum or Telemetry issue though (or our usage!).
Binaries memory leak with quantum and (open)telemetry contains all the details.
Any ideas on what might be the cause?
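A quick check that often helps when attributing a binaries leak is to force a garbage collection on the suspect process and see whether total binary memory drops; if it does not, the process genuinely still holds references to the binaries. A minimal sketch; suspect_pid is a hypothetical variable standing in for the process found during investigation:

```elixir
# Measure total refc binary memory before and after forcing a GC on one process.
before = :erlang.memory(:binary)
# suspect_pid is hypothetical: the pid you identified as holding the binaries.
:erlang.garbage_collect(suspect_pid)
after_gc = :erlang.memory(:binary)
IO.puts("binary memory freed: #{before - after_gc} bytes")
```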