Feature request: Backoff the retry when gateway reports RESOURCE_EXHAUSTED #70

Open · jwulf opened this issue Jan 11, 2022 · 7 comments · Fixed by #71
Comments

jwulf (Member) commented Jan 11, 2022

When the broker signals backpressure, the Kafka sink responds with an immediate retry.

See: https://github.com/camunda-community-hub/kafka-connect-zeebe/blob/master/src/main/java/io/zeebe/kafka/connect/sink/ZeebeSinkFuture.java#L76

It should back off the retry to give the broker a chance to recover.
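
For illustration, here is a minimal sketch of the intended behaviour, assuming the error surfaces as a gRPC StatusRuntimeException with code RESOURCE_EXHAUSTED (the actual exception type in the connector may differ); the helper name and signature are hypothetical:

```java
// Hypothetical sketch, not the actual ZeebeSinkFuture code: retry after a delay
// instead of immediately when the gateway reports RESOURCE_EXHAUSTED.
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BackpressureAwareRetry {

  private static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor();

  /** Retries immediately for other errors, but waits before retrying on RESOURCE_EXHAUSTED. */
  static void retryOnError(final Throwable error, final Runnable retry, final long delayMs) {
    if (error instanceof StatusRuntimeException
        && ((StatusRuntimeException) error).getStatus().getCode() == Status.Code.RESOURCE_EXHAUSTED) {
      // Back off: give the broker a chance to recover before re-sending the command.
      SCHEDULER.schedule(retry, delayMs, TimeUnit.MILLISECONDS);
    } else {
      // Keep the existing immediate-retry behaviour for other retriable errors.
      retry.run();
    }
  }
}
```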

berndruecker (Collaborator) commented Jan 11, 2022

Great finding (based on https://forum.camunda.io/t/kafka-sink-connector-high-cpu-utilisation/2667/5)!

"I am performing load test with simple BPMN. BPMN have two task one is source and another one is SINK task. I am using the Zeebe kafka connector open source repository for async communication model. Whenever zeebe client started throwing error “RESOURCE_EXHUSTED” then SINK connector CPU utilisation reached to 100%. I dig into the code and found that recursive call to retry until it receive the successful code or failure code. Load test service is creates instances with the rate of 7K RPM. I also observe that backpressure limit went down to 5."

The biggest question is what strategy to use in case of backpressure 🤔

jwulf (Member, Author) commented Jan 11, 2022

The easy and obvious one is t = 2 * t. So, 1s, 2s, 4s, 8s, 16s, 32s, 64s ... Maybe with a maximum bound?

This is how I do it in the Node client.
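
For example, a capped-doubling helper could look like this (a sketch; the class name and the 1s/64s bounds are just illustrative):

```java
// Hypothetical sketch: capped exponential backoff, reset after a success.
public class ExponentialBackoff {
  private final long initialMs;
  private final long maxMs;
  private long currentMs;

  public ExponentialBackoff(final long initialMs, final long maxMs) {
    this.initialMs = initialMs;
    this.maxMs = maxMs;
    this.currentMs = initialMs;
  }

  /** Returns the delay to wait now and doubles it for the next failure, up to the cap. */
  public long nextDelayMs() {
    final long delay = currentMs;
    currentMs = Math.min(currentMs * 2, maxMs);
    return delay;
  }

  /** Call after a successful request so the next backpressure episode starts small again. */
  public void reset() {
    currentMs = initialMs;
  }
}
```

With new ExponentialBackoff(1_000, 64_000) this yields delays of 1s, 2s, 4s, ... capped at 64s, and reset() restarts the sequence after a success.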

berndruecker (Collaborator) commented

This is about the backoff timing, but we would also need to throttle/stop consuming records from Kafka; otherwise we build up a huge pile of unprocessed tasks in memory, which will lead to problems sooner or later.

Let's table it for the moment. I find this a fascinating problem and would love to come back to it when we have a bit of time.
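
One option (a sketch under assumptions, not the connector's current code) would be to let Kafka Connect itself do the throttling: set a retry timeout on the SinkTaskContext and throw a RetriableException from put(), so the framework re-delivers the same batch later instead of handing the task new records. Here, sendToZeebe and BackpressureException are hypothetical placeholders:

```java
import java.util.Collection;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sketch of a sink task that lets Kafka Connect do the throttling:
// on backpressure it asks Connect to re-deliver the same batch later instead of
// buffering more records in memory.
public abstract class BackpressureAwareSinkTask extends SinkTask {

  private static final long MAX_BACKOFF_MS = 64_000;
  private long backoffMs = 1_000;

  @Override
  public void put(final Collection<SinkRecord> records) {
    try {
      sendToZeebe(records);              // placeholder for the actual forwarding logic
      backoffMs = 1_000;                 // broker recovered, start small again
    } catch (final BackpressureException e) {
      final long delayMs = backoffMs;
      backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
      context.timeout(delayMs);          // ask Connect to wait before retrying put()
      throw new RetriableException(
          "Broker reported RESOURCE_EXHAUSTED, retrying in " + delayMs + " ms", e);
    }
  }

  protected abstract void sendToZeebe(Collection<SinkRecord> records) throws BackpressureException;

  /** Illustrative exception used here to signal broker backpressure. */
  public static class BackpressureException extends Exception {
    public BackpressureException(final String message) { super(message); }
  }
}
```

Because put() throws before returning, the task never holds more than one in-flight batch in memory while the broker is backpressuring.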

berndruecker (Collaborator) commented

Quick update: @falko is in contact with a customer for whom this might become more urgent. We will look at it together sometime within the next few weeks.

berndruecker (Collaborator) commented

Reminder for me: the pile of tasks in memory is not a problem, because at the moment we do not retrieve new tasks before the previous ones are completed.

berndruecker (Collaborator) commented

Waiting for the customer to confirm that this solves the issue.
