[MM2] Consumer Group offsets not updating after upgrading to 3.5.1+ #10461

thartyhp · 2024-08-16T19:44:51Z


We've been running MM2 for years now in an active/passive mode, and it's generally been pretty stable. When we moved from 3.4.0 to 3.5.1, we started getting lag alerts on the passive cluster. We moved back to 3.4.0 and MM2 started working again as before. We then tried 3.6.1 and hit the same issue. It didn't seem to matter whether it was the low-traffic dev clusters or the high-traffic prod clusters.

Here's the snippet of config I think is most applicable.

    mirrors:
    - sourceCluster: mm2-${SOURCE_CLUSTER}
      targetCluster: mm2-${CLUSTER_ENV}
      sourceConnector:
        tasksMax: 5
        config:
          producer.override.batch.size: 327680
          replication.factor: 3
          offset-syncs.topic.replication.factor: 3
          sync.topic.acls.enabled: true
          replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
      heartbeatConnector:
        config:
          heartbeats.topic.replication.factor: 3
      checkpointConnector:
        tasksMax: 5
        config:
          checkpoints.topic.replication.factor: 3
          sync.group.offsets.enabled: "true"
          replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy

We tried setting sync.group.offsets.interval.seconds: 30 and refresh.groups.interval.seconds: 90, but that didn't seem to move the needle. We had considered offset.lag.max, but we already had groups with more than 100 messages of lag on the passive side. Is there something else we could be missing config-wise? Something that was deprecated or needed to be changed when moving from 3.4.0? Is this something that's been fixed in 3.6.2 or 3.7.1?
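
For reference, a sketch of where those two interval settings would sit, assuming the properties in question are the checkpoint connector's `sync.group.offsets.interval.seconds` and `refresh.groups.interval.seconds`. The values are the ones mentioned above, and the rest of the mirror definition is unchanged from the snippet:

```yaml
mirrors:
- sourceCluster: mm2-${SOURCE_CLUSTER}
  targetCluster: mm2-${CLUSTER_ENV}
  checkpointConnector:
    tasksMax: 5
    config:
      checkpoints.topic.replication.factor: 3
      sync.group.offsets.enabled: "true"
      # Assumed property names; values as tried above
      sync.group.offsets.interval.seconds: 30
      refresh.groups.interval.seconds: 90
      replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
```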

scholzj (Maintainer) · 2024-08-16T19:52:59Z

At some point, Kafka changed how offset synchronization works. By default, it now syncs offsets only every 100 or so records. I think that makes it behave better in some situations, but it gives you a bigger offset latency. I do not remember which version introduced this change, but maybe that is what you are seeing? I think there is an option to configure it: https://github.com/strimzi/strimzi-kafka-operator/blob/main/examples/mirror-maker/kafka-mirror-maker-2-sync-groups.yaml#L30 ... but I'm not sure whether changing it is recommended.
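
As a rough sketch of what that could look like in the config from the question, assuming the option meant here is the source connector's `offset.lag.max` (Kafka's default is 100). The value 10 is purely illustrative, not a recommendation:

```yaml
mirrors:
- sourceCluster: mm2-${SOURCE_CLUSTER}
  targetCluster: mm2-${CLUSTER_ENV}
  sourceConnector:
    tasksMax: 5
    config:
      # Assumption: this is the option referred to above.
      # Lower values emit offset syncs more often; 10 is an illustrative value.
      offset.lag.max: 10
      replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
```

A lower value should tighten the translated offsets at the cost of more records written to the offset-syncs topic.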


thartyhp (Author) · 2024-08-23T17:44:23Z

For others who might run into this: I asked about it on the Kafka mailing list and got this reply.

I would definitely suggest trying 3.8.0 to see if it is suitable for your
use-case, as that last fix KAFKA-15905 is very impactful for Mirror Maker
instances that undergo restarts or rebalances.

From the description "Some topics are dozen or so behind, others are
hundreds of messages behind", is it possible that the translation is already
working to the best of its ability, or may benefit from a lower
offset.lag.max? Without more information I can't be sure. I do know that
running versions 3.5-3.7 with offset.lag.max=0 is not sufficient to get
good translation; the latest patches are quite important.
There are open issues [KAFKA-16364, KAFKA-16641] for future improvements
to the algorithm, but there hasn't been much movement on those recently.

With the current implementation, I would expect the target consumer lag to
be approximately double the source consumer lag. The only time you should
expect "perfect" translation is for a consumer group that has committed at
the very end of a stable topic.
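
For a Strimzi-managed MM2, trying 3.8.0 would roughly mean bumping `spec.version` on the `KafkaMirrorMaker2` resource. A minimal sketch, assuming an operator release that supports Kafka 3.8.0. The resource name `my-mm2` is hypothetical and the rest of the spec is elided:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2  # hypothetical name
spec:
  # Assumes the installed Strimzi operator supports this Kafka version
  version: 3.8.0
  # connectCluster, clusters, and mirrors stay as in the existing resource
```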
