Message retry performance implications and architectural issues #613
Comments
I dived a little deeper into the syslog writer code recently and I think that we were incorrect in some of our previous assertions about the synchronized nature of the agent. If you check out the syslog connector, which the manager uses to create new drains, each drain is provided with an egress diode. Since writing to the diode should be non-blocking, I think that the envelope writing loop is in fact asynchronous to some degree. At least, a problematic syslog drain shouldn't directly prevent other drains from continuing to receive messages.
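To illustrate the point about per-drain diodes, here is a minimal sketch of the idea, not the agent's actual code. `Diode`, `NewDiode`, `Set`, `TryNext`, `Dropped`, and `fanOut` are hypothetical names, and the mutex-guarded ring buffer is a simplification chosen for readability. The only property it is meant to show is that a write never waits for the reader, so a stalled drain loses its own oldest messages instead of holding up the write loop.

```go
package main

import (
	"fmt"
	"sync"
)

// Diode is a hypothetical, simplified stand-in for an egress diode: a
// fixed-size ring buffer whose Set never waits for the reader. When the
// buffer is full, the oldest unread entry is overwritten and counted as dropped.
type Diode struct {
	mu      sync.Mutex
	buf     []string
	head    int // index of the oldest unread entry
	size    int // number of unread entries
	dropped int // number of entries overwritten before being read
}

// NewDiode returns a diode that holds at most capacity unread entries.
func NewDiode(capacity int) *Diode {
	return &Diode{buf: make([]string, capacity)}
}

// Set stores msg without waiting for the reader to catch up.
func (d *Diode) Set(msg string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.size == len(d.buf) {
		// Full: discard the oldest entry to make room.
		d.head = (d.head + 1) % len(d.buf)
		d.size--
		d.dropped++
	}
	d.buf[(d.head+d.size)%len(d.buf)] = msg
	d.size++
}

// TryNext returns the oldest unread entry, or false if the diode is empty.
func (d *Diode) TryNext() (string, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.size == 0 {
		return "", false
	}
	msg := d.buf[d.head]
	d.head = (d.head + 1) % len(d.buf)
	d.size--
	return msg, true
}

// Dropped reports how many entries were overwritten before being read.
func (d *Diode) Dropped() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.dropped
}

// fanOut writes one envelope to every drain's diode (assumed write loop).
func fanOut(drains map[string]*Diode, msg string) {
	for _, d := range drains {
		d.Set(msg)
	}
}

func main() {
	drains := map[string]*Diode{
		"healthy": NewDiode(2),
		"stalled": NewDiode(2),
	}
	for i := 0; i < 5; i++ {
		fanOut(drains, fmt.Sprintf("envelope-%d", i))
		// Only the healthy drain's reader keeps up; the stalled one never reads.
		drains["healthy"].TryNext()
	}
	fmt.Println("healthy dropped:", drains["healthy"].Dropped())
	fmt.Println("stalled dropped:", drains["stalled"].Dropped())
}
```

In this run the stalled drain overwrites three of the five envelopes while the healthy drain drops none, which matches the expectation that a problematic drain degrades only its own delivery.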
High CPU usage of the agent is a known problem. Unfortunately, none of the logging and metrics agents currently have any kind of memory or CPU limit placed upon them. They will expand as necessary to meet demand. We took a pprof dump a while ago and saw that marshalling and unmarshalling envelopes was the primary performance cost in most of our agents. Part of what I hope to accomplish by merging every agent into the OTel collector is to reduce the number of marshal/unmarshal steps required to egress an individual envelope from a VM.
I did some testing as well, and your assumption that every drain gets its own diode matches my understanding of why there is some degree of concurrency.
@nicklas-dohrn to confirm the state of this issue, the current concerns are:
Is that correct? If so, I'd move to ignore the first concern in this issue, as I consider it to be a general known issue with CF-D components. What we really want is some BPM-specific way to indicate CPU shares.
This issue is meant to discuss the current state of the retry logic for syslog messages, as it has some problematic implications.
They are listed here briefly as an overview:
This will also push the CPU consumption of the syslog agent above one CPU; I am not sure why.
I will add details and my test results here later, in a better-formatted way.