Avoid reordering events due to split partition queues #2437
Conversation
Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe, so reordering leads to dropped Kafka messages.
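A minimal sketch (not restate's actual code) of the ordering hazard this PR addresses, using rust-rdkafka's `split_partition_queue`. The crux: a partition's queue must be split before the main consumer fetches messages for that partition; otherwise some messages are delivered on the main queue and later ones on the split queue, and offset order (which dedupe depends on) is lost. Topic and group names are placeholders.

```rust
use std::sync::Arc;

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let consumer: Arc<StreamConsumer> = Arc::new(
        ClientConfig::new()
            .set("group.id", "example-group") // placeholder
            .set("bootstrap.servers", "localhost:9092") // placeholder
            .create()?,
    );
    consumer.subscribe(&["example-topic"])?;

    // Split before polling the main consumer, so every message for this
    // partition is delivered, in offset order, on the split queue only.
    let partition_queue = consumer
        .split_partition_queue("example-topic", 0)
        .expect("consumer is still alive");

    tokio::spawn(async move {
        // Messages arrive here in offset order; processing them on a single
        // task preserves that order end to end.
        while let Ok(msg) = partition_queue.recv().await {
            let _ = msg; // process in order
        }
    });

    // The main consumer must still be polled to drive rebalances and other
    // callbacks; with all partitions split it should yield no messages.
    loop {
        let _ = consumer.recv().await;
    }
}
```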
We also need this on main, I guess.
I'll base it on main and we can cherry-pick later.
Overall looks good to me, just minor comments
};
for task_id in topic_partition_tasks.into_values() {
    self.task_center.cancel_task(task_id);

Rebalance::Error(_) => {}
This gets propagated in the main loop, I suppose?
No, it doesn't get propagated, but from my reading of librdkafka it can't happen anyway; their code generally assumes it doesn't and behaves oddly if it does. However, I can see that rust-rdkafka treats this scenario as equivalent to a revoke, so I guess I can do the same.
Actually, I can't treat it as equivalent to a revoke, as they don't give me a handle on what the provided partitions are. My feeling is we can either ignore it or panic.
Let's panic only if there's a panic handler somewhere that makes sure the panic won't get propagated and tear down the whole node. I think that should be the case with the task center/subscription controller?
IMO it's better to just ignore it.
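A minimal sketch (not the PR's actual code) of where this thread landed: `Rebalance::Error` carries no partition list to act on, and librdkafka in practice doesn't produce it, so it is logged and ignored rather than treated as a revoke or a panic. Signatures follow the pre-0.37 rust-rdkafka `ConsumerContext` API and may differ in other versions.

```rust
use rdkafka::consumer::{ConsumerContext, Rebalance};
use rdkafka::ClientContext;

struct RebalanceContext;

impl ClientContext for RebalanceContext {}

impl ConsumerContext for RebalanceContext {
    fn pre_rebalance(&self, rebalance: &Rebalance<'_>) {
        match rebalance {
            // Spawn one ordered consumer task per newly assigned partition.
            Rebalance::Assign(_partitions) => { /* split queues, spawn tasks */ }
            // Cancel the per-partition tasks for revoked partitions.
            Rebalance::Revoke(_partitions) => { /* cancel tasks */ }
            // No partitions are provided here, so there is nothing to clean
            // up; log and carry on instead of panicking.
            Rebalance::Error(e) => eprintln!("rebalance error (ignored): {e}"),
        }
    }
}
```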
@@ -418,13 +412,11 @@ impl ConsumerContext for RebalanceContext {
    }
}

-struct AbortOnDrop(TaskCenter, TaskId);
+struct AbortOnDrop(TaskHandle<()>);
Optional: this is a handy type; you may want to move it into task_center so others can also use it.
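A generic sketch of the `AbortOnDrop` guard pattern discussed above, using tokio's `JoinHandle` in place of restate's `TaskHandle` (assumption: `TaskHandle` exposes a similar abort/cancel operation). Dropping the guard aborts the task, so a per-partition task can never outlive its owner.

```rust
struct AbortOnDrop(tokio::task::JoinHandle<()>);

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        // Abort the task when the guard goes out of scope; a task that has
        // already finished is unaffected.
        self.0.abort();
    }
}

// Usage: keep the guard alive for as long as the task should run.
fn spawn_guarded() -> AbortOnDrop {
    AbortOnDrop(tokio::spawn(async {
        // per-partition work would go here
    }))
}
```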
Changes look good to me. I have no experience with librdkafka, but the rest of the parts make sense.
* Avoid reordering events due to split partition queues (#2437): Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe, so reordering leads to dropped Kafka messages.
* Commit consumer state on rebalance
* Review comments
* Use spawn_unmanaged
* Install prometheus recorder earlier so kafka metrics work
* Rebase changes
* Handle panic with empty partitions

Co-authored-by: Jack Kleeman <[email protected]>