Consumer backoff halts all processing even if one target collection is in bad state. #89

Open
sigram opened this issue Nov 17, 2023 · 1 comment

Comments

@sigram
Contributor

sigram commented Nov 17, 2023

Here's the scenario:

  • create a number of 1:1 collections in source / target Solr
  • start indexing (in my tests I was simply sending a constant stream of random documents in a round-robin fashion to each collection)
  • simulate a problem in ONE of the target collections. I simply deleted it, but in a real-life scenario it could have been any other kind of breakdown, or the following hypothetical scenario:
    • with high enough traffic there will be a number of messages in flight between source and target. Assume ops decide to remove one of the collections simultaneously at both source and target, without waiting for all messages for that collection to drain. The Consumer will then pick up queued in-flight messages intended for the target collection that no longer exists.
  • the Consumer will attempt to send the picked-up requests to a collection that no longer exists (or no longer functions), and Solr will respond with errors.
  • this will trigger back-offs, which will eventually halt ALL processing, including for the remaining healthy collections.

Is this behavior the best we can do? I'm not sure. I would expect the Consumer to continue processing requests for healthy collections. At the very least, we should offer some protection against the hypothetical scenario I mentioned above.
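
To make the expectation concrete, here is a minimal sketch of back-off state kept per target collection instead of globally, so that a failing collection delays only its own requests. It is written in Python purely for illustration and is not the Consumer's actual code; the class name `PerCollectionBackoff`, its methods, and the delay parameters are all hypothetical.

```python
import time


class PerCollectionBackoff:
    """Hypothetical helper: exponential back-off tracked per collection."""

    def __init__(self, base_delay=1.0, max_delay=60.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock          # injectable so the logic can be tested
        self._failures = {}         # collection -> consecutive failure count
        self._retry_at = {}         # collection -> earliest next attempt time

    def can_send(self, collection):
        """True if this collection is not currently backing off."""
        retry_at = self._retry_at.get(collection)
        return retry_at is None or self.clock() >= retry_at

    def record_failure(self, collection):
        """Double the delay for this collection only; return the new delay."""
        count = self._failures.get(collection, 0) + 1
        self._failures[collection] = count
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        self._retry_at[collection] = self.clock() + delay
        return delay

    def record_success(self, collection):
        """Clear the back-off state once the collection responds again."""
        self._failures.pop(collection, None)
        self._retry_at.pop(collection, None)
```

With something like this, the Consumer would check `can_send(collection)` before each request and could keep processing healthy collections while the broken one waits. What to do with the messages held back for the broken collection (re-queue, park, or drop) is a separate decision that this sketch does not cover.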

@anshumg
Contributor

anshumg commented Nov 28, 2023

With the dead letter queue, I think this issue shouldn't exist. The failed messages would get sent to it and processed in parallel. Temporarily adding a 'no-op' for specific collections would also be a possible solution, as I guess that's what we really want for the updates that are in flight.
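
Both ideas can be sketched together in a few lines. This is an illustrative Python sketch under stated assumptions, not the project's implementation: `process_batch`, the `send` callable, the `dead_letters` list, and the `skipped_collections` set are hypothetical names, and messages are assumed to be `(collection, doc)` pairs.

```python
def process_batch(messages, send, dead_letters, skipped_collections=frozenset()):
    """Send each (collection, doc) message without letting one failure
    block the rest. Returns the number of messages delivered."""
    delivered = 0
    for collection, doc in messages:
        if collection in skipped_collections:
            # No-op: discard in-flight updates for a collection that
            # was deliberately removed.
            continue
        try:
            send(collection, doc)
            delivered += 1
        except Exception as exc:
            # Park the failed message for separate handling instead of
            # backing off the whole consumer.
            dead_letters.append((collection, doc, str(exc)))
    return delivered
```

The trade-off is that a dead letter queue keeps the failed updates around for later replay, while the no-op drops them, which is only appropriate when the collection was removed on purpose.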
