prometheus_remote_write: Fix cutoff logic. #225
base: master
A single stale metric can suppress encoding of all following non-stale metrics if the return code CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR is not treated as success.

Signed-off-by: Lars-Dominik Braun <[email protected]>
Looks good to me. Good catch!
I have reconsidered this PR, and it is not sufficient for tracking whether a cutoff occurred, because the cutoff variable only stores the most recent cutoff.
Instead, we need to use a flag to preserve the type of error, as in fluent/fluent-bit#9236.
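One way to read the flag idea is sketched below. This is only an illustration: `result_code`, `SEEN_CUTOFF`, `SEEN_FAILURE`, `last_result` and `seen_flags` are hypothetical names, not cmetrics or fluent-bit identifiers, and the sketch makes no claim about what fluent/fluent-bit#9236 actually does. It contrasts a variable that keeps only the most recent result with a bit flag that remembers every kind of result seen.

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum result_code {
    RESULT_OK = 0,
    RESULT_CUTOFF = 1,
    RESULT_FAILURE = 2
};

/* Hypothetical bit flags recording which kinds of result occurred. */
#define SEEN_CUTOFF  (1u << 0)
#define SEEN_FAILURE (1u << 1)

/* Keeps only the most recent result: an early cutoff is overwritten
 * by any later success. */
static enum result_code last_result(const enum result_code *results,
                                    size_t count)
{
    enum result_code last = RESULT_OK;
    size_t i;

    for (i = 0; i < count; i++) {
        last = results[i];
    }
    return last;
}

/* Accumulates a flag per kind of result, so an early cutoff stays
 * visible after later successes. */
static unsigned int seen_flags(const enum result_code *results,
                               size_t count)
{
    unsigned int flags = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (results[i] == RESULT_CUTOFF) {
            flags |= SEEN_CUTOFF;
        }
        else if (results[i] == RESULT_FAILURE) {
            flags |= SEEN_FAILURE;
        }
    }
    return flags;
}
```

For a sequence of one cutoff followed by two successes, `last_result` reports success while `seen_flags` still has `SEEN_CUTOFF` set.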
How is this information (whether some values were not transmitted due to the cutoff) used by fluent-bit? As far as I see
Currently, we don't report this error to fluent-bit plugins, for the sake of code simplicity. Having reconsidered this PR once more, I realized that it should be enough to handle the additional cutoff cases.
We noticed that fluent-bit’s prometheus remote_write output plugin was silently dropping some, but not all, process_exporter metrics after about one hour, while the stdout output plugin was still showing metrics being collected. We were also able to reduce the time after which metrics were being dropped by modifying CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_THRESHOLD, which indicates that the problem is the cutoff logic. This merge request treats CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR as success and continues encoding other metrics, so they do not get dropped. It might be worth dropping this “error” code entirely, since it’s not really an error and leads to subtle bugs like this one.

After merging this fix, the bundled copy of cmetrics inside fluent-bit should be updated.
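The behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual cmetrics code: `encode_result`, `sample`, `encode_one` and `encode_all` are hypothetical names, and only `ENCODE_CUTOFF_ERROR` is meant to stand in for CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR. The point is that a stale sample is skipped and the loop continues, instead of the cutoff result aborting the remaining metrics.

```c
#include <stddef.h>

/* Hypothetical result codes; ENCODE_CUTOFF_ERROR stands in for
 * CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR. */
enum encode_result {
    ENCODE_SUCCESS = 0,
    ENCODE_CUTOFF_ERROR = 1,
    ENCODE_FAILURE = 2
};

/* Hypothetical sample type, for illustration only. */
struct sample {
    long timestamp;
    double value;
};

/* Hypothetical per-sample encoder: reports a cutoff for samples older
 * than the threshold, otherwise counts the sample as encoded. */
static enum encode_result encode_one(const struct sample *s, long now,
                                     long threshold, size_t *encoded)
{
    if (now - s->timestamp > threshold) {
        return ENCODE_CUTOFF_ERROR;
    }
    (*encoded)++;
    return ENCODE_SUCCESS;
}

/* Encodes every sample. A cutoff is treated as success so that one
 * stale sample does not suppress the samples that follow it; any
 * other non-success result still stops the loop. */
static enum encode_result encode_all(const struct sample *samples,
                                     size_t count, long now,
                                     long threshold, size_t *encoded)
{
    enum encode_result result = ENCODE_SUCCESS;
    size_t i;

    *encoded = 0;
    for (i = 0; i < count; i++) {
        result = encode_one(&samples[i], now, threshold, encoded);
        if (result == ENCODE_CUTOFF_ERROR) {
            result = ENCODE_SUCCESS; /* skip the stale sample, keep going */
            continue;
        }
        if (result != ENCODE_SUCCESS) {
            break;
        }
    }
    return result;
}
```

With one stale sample followed by two fresh ones, `encode_all` returns `ENCODE_SUCCESS` and reports two encoded samples; if the cutoff result were allowed to break the loop, none would be encoded.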