Consumer does not rejoin group after heartbeat timeout #336
Comments
Looks like this is your SO question, too: https://stackoverflow.com/questions/76015312/how-to-properly-deal-with-zombie-kafka-consumers-in-reactive-spring-boot-appli
This was originally asked on SO here: https://stackoverflow.com/questions/76015312/how-to-properly-deal-with-zombie-kafka-consumers-in-reactive-spring-boot-appli but as I read the code afterwards, it feels more and more like a bug :(
I see. Any thoughts about the possible fix?
Well, I didn't design the library, so it's hard to tell whether this is a bug in reactor-kafka, a bug in kafka-clients, or completely valid behaviour.
I've skimmed through the ConsumerCoordinator and it looks like it's *bound* to rejoin if it's ever polled again.
But whether it is polled or not is, I guess, reactor-kafka's responsibility. So I am posting it here hoping that the community will set me straight if I am wrong 🙏🙏
On Friday, 14 April 2023, 06:52pm +03:00, Artem Bilan wrote:
"I see. Any thoughts about the possible fix? Or share, please, with us what part of the project code you think is producing such a bug?"
You need to add retry (and possibly repeat) to the pipeline: https://projectreactor.io/docs/kafka/release/reference/#_error_handling_2
@garyrussell I see. Thanks so much for your help! I will give it a try and close the ticket when I can confirm it's no longer manifesting. May I also ask about a different thing: how useful would it be, to have a partition revoke handler commit offsets?
It seems that the documentation says that downstream consumer should not be concerned with this:
So is the quoted code in the handler, presented above, legacy code that could or should be removed? Or might it still be needed? We have
I don't know what reactor-kafka/src/main/java/reactor/kafka/receiver/internals/ConsumerEventLoop.java Lines 161 to 166 in d1aa41b
Expected Behavior
Actual Behavior
We have a Reactive Spring Boot application that employs "reactor-kafka" for Kafka consumers and producers.
We use one KafkaReceiver per topic, which is subscribed to and kept in a Spring bean field. I observe that sometimes some or all of the underlying Consumers just stop with an error message as follows:
(This is the last message in the log thus far. The application has been living happily for a day already, after all 11 consumers got stuck in this limbo; the topic is consumed by other pods.)
Regardless of what the error says, shouldn't the consumer still be restarted by the library/Kafka internals? Or is it the application author's responsibility to track this state and react accordingly (for example, by implementing a liveness health check around it)?
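To make the liveness-check idea concrete, here is a minimal sketch in plain Java of what such tracking could look like. It is not part of reactor-kafka: the class name `ConsumerLivenessTracker`, its methods, and the "max silence" threshold are all hypothetical. It also assumes that every topic sees reasonably steady traffic, since a genuinely quiet topic would be reported as stale too.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a reactor-kafka API): remembers when each consumer last showed activity. */
public class ConsumerLivenessTracker {
    private final Map<String, Long> lastActivityNanos = new ConcurrentHashMap<>();
    private final long maxSilenceNanos;
    private final LongSupplier clock;

    public ConsumerLivenessTracker(Duration maxSilence) {
        this(maxSilence, System::nanoTime);
    }

    // The clock is injectable so the staleness logic can be checked without sleeping.
    public ConsumerLivenessTracker(Duration maxSilence, LongSupplier clock) {
        this.maxSilenceNanos = maxSilence.toNanos();
        this.clock = clock;
    }

    /** Call when the consumer for a topic is (re)subscribed or delivers a record. */
    public void recordActivity(String topic) {
        lastActivityNanos.put(topic, clock.getAsLong());
    }

    /** Topics whose consumer has been silent for longer than the allowed window. */
    public Set<String> staleTopics() {
        long now = clock.getAsLong();
        Set<String> stale = new TreeSet<>();
        lastActivityNanos.forEach((topic, last) -> {
            if (now - last > maxSilenceNanos) {
                stale.add(topic);
            }
        });
        return stale;
    }

    /** True only if no tracked topic has exceeded the silence window. */
    public boolean isHealthy() {
        return staleTopics().isEmpty();
    }
}
```

One way to use it, under the same assumptions: call `recordActivity(topic)` from each receiver pipeline whenever a record arrives, and have the application's liveness probe report failure when `isHealthy()` returns false, so that the orchestrator restarts a pod whose consumers are stuck.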
Steps to Reproduce
Possible Solution
Your Environment
reactor-kafka: 1.3.17
java -version: 11.0.16.1
uname -a: Linux x64