Avoid reordering events due to split partition queues #2437

jackkleeman · 2024-12-17T16:13:42Z

Great care is required when splitting partition queues to avoid losing ordering. We rely on ordering for deduplication, so reordered events lead to dropped Kafka messages.
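
To illustrate why reordering turns into dropped messages, here is a minimal sketch. It assumes the deduplication keeps the highest offset seen per partition and discards anything at or below it. The `Deduplicator` type, its `accept` method, and the `orders` topic name are hypothetical and only serve the illustration; they are not taken from the Restate codebase.

```rust
use std::collections::HashMap;

/// Hypothetical deduplicator: remembers the highest offset accepted per
/// (topic, partition) and rejects anything that is not strictly newer.
#[derive(Default)]
struct Deduplicator {
    last_seen: HashMap<(String, i32), i64>,
}

impl Deduplicator {
    /// Returns true if the message should be processed, false if it is
    /// treated as a duplicate and dropped.
    fn accept(&mut self, topic: &str, partition: i32, offset: i64) -> bool {
        let key = (topic.to_owned(), partition);
        match self.last_seen.get(&key) {
            // An offset at or below the last accepted one looks like a replay.
            Some(&last) if offset <= last => false,
            _ => {
                self.last_seen.insert(key, offset);
                true
            }
        }
    }
}

/// Feeds the offsets of a single partition through the deduplicator, in the
/// order given, and returns the offsets that survive.
fn surviving_offsets(offsets: &[i64]) -> Vec<i64> {
    let mut dedup = Deduplicator::default();
    offsets
        .iter()
        .copied()
        .filter(|&offset| dedup.accept("orders", 0, offset))
        .collect()
}
```

Under these assumptions, offsets delivered in order all pass, while an offset that arrives after a later one is indistinguishable from a duplicate and is silently dropped.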

slinkydeveloper · 2024-12-18T07:58:08Z

We also need this on main I guess

jackkleeman · 2024-12-18T08:50:30Z

I'll base on main and we can cherry-pick later

slinkydeveloper

Overall looks good to me, just minor comments

crates/ingress-kafka/src/consumer_task.rs

slinkydeveloper · 2024-12-18T08:47:18Z

-        };
-        for task_id in topic_partition_tasks.into_values() {
-            self.task_center.cancel_task(task_id);
+            Rebalance::Error(_) => {}


This gets propagated in the main loop, I suppose?

jackkleeman · 2024-12-18

No, it doesn't get propagated. From my reading of librdkafka it can't happen, and their code generally assumes that it doesn't happen and behaves weirdly if it does. However, I can see that rust-rdkafka treats this scenario as equivalent to revoke, so I guess I can do the same.

jackkleeman · 2024-12-18

Actually, I can't treat it as equivalent to revoke, as they don't give me a handle on what the provided partitions are. My feeling is we can either ignore it or panic.

slinkydeveloper · 2024-12-18

Let's panic only if there's a panic handler somewhere that makes sure the panic won't get propagated and tear down the whole node. I think that should be the case with task center/subscription controller?

jackkleeman · 2024-12-18

IMO it's better to just ignore it.
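
For readers following the thread, a minimal sketch of the "ignore it" option. It uses a local `RebalanceEvent` enum as a stand-in so that the example compiles on its own; this is not the rust-rdkafka `Rebalance` type, and `PartitionQueues` and `on_rebalance` are hypothetical names. The assumption, taken from the discussion, is that only the assign and revoke variants carry a partition list, so the error variant gives nothing to tear down.

```rust
use std::collections::BTreeSet;

/// Local stand-in for a rebalance event (NOT the rust-rdkafka type).
/// Only Assign and Revoke carry the affected (topic, partition) pairs.
enum RebalanceEvent {
    Assign(Vec<(String, i32)>),
    Revoke(Vec<(String, i32)>),
    Error(String),
}

/// Hypothetical bookkeeping of the partitions that currently have a queue.
#[derive(Default)]
struct PartitionQueues {
    active: BTreeSet<(String, i32)>,
}

impl PartitionQueues {
    fn on_rebalance(&mut self, event: RebalanceEvent) {
        match event {
            // Newly assigned partitions get a queue.
            RebalanceEvent::Assign(partitions) => {
                self.active.extend(partitions);
            }
            // Revoked partitions have their queue removed.
            RebalanceEvent::Revoke(partitions) => {
                for partition in &partitions {
                    self.active.remove(partition);
                }
            }
            // The error variant names no partitions, so it cannot be handled
            // like a revoke; this sketch ignores it and keeps the state as is.
            RebalanceEvent::Error(_) => {}
        }
    }
}
```

The trade-off is the one raised above: ignoring keeps the node running on an event that is not expected to occur, whereas panicking would only be safe if something upstream contains the panic.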

crates/ingress-kafka/src/consumer_task.rs

AhmedSoliman · 2024-12-19T12:16:48Z

@@ -418,13 +412,11 @@ impl ConsumerContext for RebalanceContext {
    }
 }

-struct AbortOnDrop(TaskCenter, TaskId);
+struct AbortOnDrop(TaskHandle<()>);


Optional: This is a handy type; you might want to move it to task_center so others can also use it.
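
As a rough illustration of the abort-on-drop pattern this comment refers to, here is a self-contained sketch. `CancelHandle` is a hypothetical stand-in for a task handle (a shared cancellation flag), since the real `TaskHandle<()>` from the diff is not reproduced here; only the `Drop`-based guard shape is the point.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a task handle: a shared cancellation flag that
/// the running task would be expected to poll.
#[derive(Clone, Default)]
struct CancelHandle(Arc<AtomicBool>);

impl CancelHandle {
    /// Requests cancellation of the associated task.
    fn abort(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// Reports whether cancellation has been requested.
    fn is_aborted(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Guard that aborts the wrapped task when it goes out of scope, so the task
/// cannot outlive its owner even on early returns.
struct AbortOnDrop(CancelHandle);

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        self.0.abort();
    }
}
```

Because the guard owns the handle directly, it needs no reference back to a task registry, which matches the direction of the change in the hunk above.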

AhmedSoliman

Changes look good to me. I have no experience with librdkafka, but the rest of the parts make sense.

* Avoid reordering events due to split partition queues
  Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe so reordering -> dropped kafka messages.
* Commit consumer state on rebalance
* Review comments
* Use spawn_unmanaged
* Install prometheus recorder earlier so kafka metrics work

* Avoid reordering events due to split partition queues (#2437)
* Avoid reordering events due to split partition queues
  Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe so reordering -> dropped kafka messages.
* Commit consumer state on rebalance
* Review comments
* Use spawn_unmanaged
* Install prometheus recorder earlier so kafka metrics work
* Rebase changes
* Handle panic with empty partitions

---------

Co-authored-by: Jack Kleeman

* Avoid reordering events due to split partition queues
  Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe so reordering -> dropped kafka messages.
* Commit consumer state on rebalance
* Review comments
* Use spawn_unmanaged
* Install prometheus recorder earlier so kafka metrics work
* Handle panic with empty partitions

jackkleeman added 2 commits December 17, 2024 16:11

Avoid reordering events due to split partition queues

c50d360

Great care is required when splitting queues to avoid losing ordering. We rely on ordering for dedupe so reordering -> dropped kafka messages.

Commit consumer state on rebalance

699aad2

jackkleeman mentioned this pull request Dec 17, 2024

[ingress_kafka] Eagerly Launch Partition Consumers #2430

Closed

jackkleeman requested a review from slinkydeveloper December 17, 2024 19:46

slinkydeveloper approved these changes Dec 18, 2024

jackkleeman changed the base branch from 1.1.5-pprof to release/1.1.5 December 18, 2024 09:00

AhmedSoliman reviewed Dec 18, 2024

Review comments

49e1f5c

jackkleeman requested review from slinkydeveloper and AhmedSoliman December 18, 2024 15:00

AhmedSoliman reviewed Dec 19, 2024

crates/ingress-kafka/src/consumer_task.rs (outdated, resolved)

Use spawn_unmanaged

5381ef5

AhmedSoliman reviewed Dec 19, 2024

Install prometheus recorder earlier so kafka metrics work

e1d1926

AhmedSoliman approved these changes Dec 19, 2024

jackkleeman merged commit aaecdc2 into restatedev:release/1.1.5 Dec 19, 2024
9 of 10 checks passed

jackkleeman deleted the kafka-ordering branch December 19, 2024 16:11

slinkydeveloper mentioned this pull request Dec 19, 2024

Fix Kafka ordering issue #2447

Merged
