Kafka sink silently discards events on connection errors #21031

Closed
frankh opened this issue Aug 8, 2024 · 4 comments
Labels
type: bug A code related bug.

Comments

@frankh
Contributor

frankh commented Aug 8, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I've noticed that the vector_component_discarded_events_total metric shows some events being discarded by our Kafka sink, even though we have acknowledgements enabled.

Correlating logs from when the discards happen, it looks like this occurs every time there is an intermittent connection failure with Kafka.

Silent errors on acknowledged sinks are unacceptable and, sadly, a complete blocker for us using Vector. Is there any way to have the sink NACK these events?
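
One way to confirm the discards from outside Vector is to scrape its metrics and sum the counter for the sink. The following is a minimal sketch, assuming the metrics are exposed in Prometheus text format and that the counter carries component_id and intentional labels. The sample text is made up for illustration and is not output from the deployment described here.

```python
import re

# Matches one Prometheus text-format sample line: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

METRIC = "vector_component_discarded_events_total"


def discarded_events(metrics_text, component_id):
    """Sum the discarded-events counter across all label sets for one component."""
    total = 0.0
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("component_id") == component_id:
            total += float(match.group("value"))
    return total


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# TYPE vector_component_discarded_events_total counter
vector_component_discarded_events_total{component_id="kafka",intentional="false"} 12
vector_component_discarded_events_total{component_id="kafka",intentional="true"} 3
vector_component_discarded_events_total{component_id="other",intentional="false"} 7
"""

print(discarded_events(EXAMPLE, "kafka"))  # 15.0
```

A non-zero value for a sink with acknowledgements enabled is the symptom reported in this issue.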

Configuration

kafka:
  type: kafka
  acknowledgements:
    enabled: true
  buffer:
    - type: memory
      max_events: 10000
      when_full: block

Version

0.40.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

Related:

@frankh frankh added the type: bug A code related bug. label Aug 8, 2024
@jszwedko
Member

jszwedko commented Aug 8, 2024

Thanks for this report @frankh ! It does sound like there might be a bug with nacking in the sink for connection issues. Can you share:

  • The full config (or a minimal config that is capable of reproducing the issue)
  • The log output you get

@frankh
Contributor Author

frankh commented Aug 9, 2024

I'm not able to fully reproduce it... I've set it up locally and killing Kafka seems to consistently result in a 400 error returned to the HTTP client

however, this is still a bug as it should be a 500 (server error) or 503 (service unavailable) not 400 (bad request)

that means the sink is setting the batch status to Rejected not Errored

edit: It looks like vector does report that it's returning 400s for these requests, so not silently discarding, which is good news

weirdly our load balancer metrics at the time don't show a 400 error but that may be a bug on our end, vector_http_server_responses_sent_total{status="400"} does show it
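
The status-code behavior discussed in this comment can be sketched as a mapping from batch status to HTTP response code. This is a simplified Python model, not Vector's actual Rust code: the BatchStatus enum and http_status_for function are hypothetical names, the 400 for Rejected comes from the behavior observed above, and the 200 and 500 values are assumptions.

```python
from enum import Enum


class BatchStatus(Enum):
    """Hypothetical stand-in for the status a sink reports for a batch."""
    DELIVERED = "delivered"
    ERRORED = "errored"
    REJECTED = "rejected"


def http_status_for(status):
    """Map a batch status to the code an acknowledging HTTP source might return."""
    if status is BatchStatus.DELIVERED:
        return 200  # assumed success code
    if status is BatchStatus.REJECTED:
        return 400  # the request content itself is considered bad
    return 500  # ERRORED: delivery failed for reasons unrelated to the content


for status in BatchStatus:
    print(status.name, http_status_for(status))
```

Under this model, a Kafka connection failure that is marked Rejected surfaces to the client as a 400, which matches what was observed locally.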

@frankh
Contributor Author

frankh commented Aug 9, 2024

I've made a fix to the Kafka sink so these will get correctly reported as 500 errors: #21036

I'm not 100% sure on when exactly Rejected vs Errored should be sent, but based on the HTTP source's response codes I assume Rejected should mean the event itself is bad, and Errored means the sink failed for reasons unrelated to the event content
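
The distinction matters to clients because many of them retry on 5xx responses but treat 4xx responses as permanent failures. The sketch below illustrates that with a fake sender. The function names are hypothetical, the retry policy is an assumption about typical HTTP clients rather than anything Vector mandates, and backoff is omitted for brevity.

```python
def is_retryable(status_code):
    """Treat 5xx as transient (retry) and everything else as final."""
    return 500 <= status_code <= 599


def send_with_retry(send, payload, max_attempts=3):
    """Call send(payload) until the result is final or attempts run out.

    `send` is a hypothetical callable that returns an HTTP status code.
    Returns a (final_status, attempts_used) tuple.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send(payload)
        if not is_retryable(status):
            return status, attempt
    return status, max_attempts


def make_fake_sender(statuses):
    """Build a fake sender that returns the given status codes in order."""
    remaining = list(statuses)
    return lambda payload: remaining.pop(0)


# Outage reported as 500: the client retries and eventually succeeds.
print(send_with_retry(make_fake_sender([500, 500, 200]), b"event"))  # (200, 3)
# Same outage reported as 400: the client gives up after one attempt.
print(send_with_retry(make_fake_sender([400, 200]), b"event"))  # (400, 1)
```

With a 400, a well-behaved client drops the event even though the event was fine and a retry would have succeeded once Kafka recovered.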

@frankh
Contributor Author

frankh commented Aug 14, 2024

fixed in #21036

@frankh frankh closed this as completed Aug 14, 2024
2 participants