fix: do not stop sampling processor when failing to delete trace events #12509
The sampling processor should never stop while apm-server is running. Instead, log an error at Warn level and skip the current event.
This pull request does not have a backport label. Could you fix it @kruskall? 🙏
From the issue description:
This PR addresses the part of the GitHub issue about not stopping the TBS processor, but it does not address the problem that events exceeding the max size fill up storage without ever getting deleted. Events that are too large to be deleted should already be rejected when they are received.
There is the same …
Did the deletion txn get too large because writes are batched? Should we handle the returned ErrTxnTooBig by flushing and retrying?
The code ensures that an event is added to a new transaction if it would otherwise push the write transaction over its max size. I overlooked that an event cannot be stored in the first place if the event itself exceeds the max transaction size.
Yes, that sounds like the only reason why this might fail. +1 on the suggested solution.
Yep, we're keeping track of buffered writes, so unlike the previous approach the storage shouldn't become too big.
I'm not sure what this means. Can you elaborate?
To fix it, handle …
Ah right, deleting adds a new entry 🤦
lgtm
- Can you update the PR description?
- Do we want to backport this to 8.12, i.e. 8.12.2?
Done
I guess we could, since we did the same for the other TBS fix. cc @simitt, who opened the issue, for confirmation.
It's a small enough fix, so no objections to backporting to …
@mergify backport 8.12
✅ Backports have been created
…ts (#12509)
* fix: do not stop sampling processor when failing to delete trace events
  The sampling processor should never stop while apm-server is running. Instead, log an error at Warn level and skip the current event.
* fix: handle ErrTxnTooBig when deleting trace events
(cherry picked from commit 6f0be72)

…ts (#12509) (#12664)
* fix: do not stop sampling processor when failing to delete trace events
  The sampling processor should never stop while apm-server is running. Instead, log an error at Warn level and skip the current event.
* fix: handle ErrTxnTooBig when deleting trace events
(cherry picked from commit 6f0be72)
Co-authored-by: kruskall
Motivation/summary
The sampling processor should never stop while apm-server is running. Instead, log an error at Warn level and skip the current event.
Handle the ErrTxnTooBig error when deleting trace events: because a delete adds a new entry to the write transaction, it can push the transaction size above the limit. Flush the pending events before deleting.
Checklist
For functional changes, consider:
How to test these changes
Related issues
Closes #12053