Add concurrent-between-partitions kafka subscriber #2017

Open: wants to merge 1 commit into base: main

@Arseniy-Popov Arseniy-Popov commented Jan 2, 2025

Description

This adds an at-least-once (manual commit) Kafka subscriber that consumes concurrently across partitions and sequentially within each partition. It improves throughput through concurrency while preserving per-partition message ordering and at-least-once delivery semantics.
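
To illustrate the consumption model described above, here is a minimal asyncio sketch. It is not the implementation in this PR: `partition_worker`, `consume`, and `run_demo` are hypothetical names, and in-memory queues stand in for real Kafka partitions. Each partition gets its own worker, so partitions progress concurrently while each partition's messages are handled one at a time, in order.

```python
import asyncio


async def partition_worker(partition, queue, log):
    """Process one partition's messages strictly in order."""
    while True:
        offset = await queue.get()
        if offset is None:  # sentinel: no more messages for this partition
            break
        await asyncio.sleep(0.01)  # stand-in for real handler work
        log.append((partition, offset))


async def consume(messages, num_partitions):
    """Dispatch (partition, offset) pairs to one worker per partition."""
    log = []
    queues = {p: asyncio.Queue() for p in range(num_partitions)}
    workers = [
        asyncio.create_task(partition_worker(p, q, log))
        for p, q in queues.items()
    ]
    for partition, offset in messages:
        queues[partition].put_nowait(offset)
    for q in queues.values():
        q.put_nowait(None)  # tell every worker to stop once drained
    await asyncio.gather(*workers)
    return log


def run_demo():
    # 8 messages spread round-robin over 4 partitions (hypothetical data)
    messages = [(i % 4, i // 4) for i in range(8)]
    return asyncio.run(consume(messages, num_partitions=4))


if __name__ == "__main__":
    for partition, offset in run_demo():
        print(f"partition={partition} offset={offset}")
```

Within each partition the offsets always complete in order; how completions from different partitions interleave depends on scheduling.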

Background

Concurrent at-least-once consumption from a single partition is non-trivial in Kafka because Kafka does not support out-of-order commits: a commit moves the offset forward and thereby commits every earlier message in the partition, including ones that might not yet have been processed. The problem goes away if consumption is sequential within each partition and concurrent across partitions. A partition is the unit of parallelism in Kafka.
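
The commit problem can be shown with a small pure-Python model. This is a sketch with no Kafka client involved, and `committed_position` and `skipped_after_restart` are hypothetical helper names. It relies on one fact about Kafka: the committed offset is the position of the next message to read, so everything below it is treated as done.

```python
def committed_position(acked_offset):
    # Committing a message stores the *next* offset to read.
    return acked_offset + 1


def skipped_after_restart(finished_offsets, committed):
    """Offsets below the committed position that were never processed.

    After a crash the consumer resumes at `committed`, so these offsets
    are never redelivered.
    """
    return [o for o in range(committed) if o not in finished_offsets]


# Concurrent within one partition: offset 1 finishes first and is committed
# while offset 0 is still in flight, then the process crashes.
out_of_order = skipped_after_restart({1}, committed_position(1))

# Sequential within the partition: offset 1 is committed only after 0 is done.
in_order = skipped_after_restart({0, 1}, committed_position(1))

if __name__ == "__main__":
    print("lost with out-of-order commit:", out_of_order)
    print("lost with in-order commit:", in_order)
```

In the out-of-order case offset 0 is silently skipped after the restart, which breaks at-least-once delivery; processing sequentially within a partition avoids this.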

Type of change

  • New feature (a non-breaking change that adds functionality)
  • This change requires a documentation update

Checklist

  • My code adheres to the style guidelines of this project (scripts/lint.sh shows no errors)
  • I have conducted a self-review of my own code
  • I have made the necessary changes to the documentation
  • My changes do not generate any new warnings
  • I have added tests to validate the effectiveness of my fix or the functionality of my new feature
  • Both new and existing unit tests pass successfully on my local environment by running scripts/test-cov.sh
  • I have ensured that static analysis tests are passing by running scripts/static-analysis.sh
  • I have included code examples to illustrate the modifications

Example

Assuming a topic is populated with 8 messages spread across 4 partitions in a round-robin manner, the following example:

import asyncio
import time

from faststream import FastStream, Logger
from faststream.kafka import KafkaBroker
from faststream.kafka.annotations import KafkaMessage

broker = KafkaBroker("localhost:9092")
app = FastStream(broker)


@broker.subscriber(
    "topic1",
    group_id="microservice-1",
    auto_commit=False,  # manual commit (at-least-once)
    max_workers=4,  # up to 4 concurrent consumers; here, one per partition
)
async def base_handler(message: dict, msg: KafkaMessage, logger: Logger):
    await asyncio.sleep(3)  # simulate 3 seconds of work
    elapsed = time.time() - float(message["time"])
    logger.debug(
        f"Finished message: {message['id']:>3}, time from publish: {elapsed:4.2f}"
    )
    await msg.ack()  # commit the offset only after processing has finished

produces the following output:

2025-01-03 14:11:14,821 INFO     - FastStream app starting...
2025-01-03 14:11:14,852 INFO     - topic1 | microservice-1 |            - `BaseHandler` waiting for messages
2025-01-03 14:11:20,796 INFO     -        |                |            - Consumer faststream-0.5.33-7cc1b452-9b78-47ac-bdf2-e5aa6d6263b7 assigned to partitions: frozenset({TopicPartition(topic='topic1', partition=2)})
2025-01-03 14:11:20,797 INFO     -        |                |            - Consumer faststream-0.5.33-9d39e639-e9a0-41d9-9b94-0fad0f68c9bc assigned to partitions: frozenset({TopicPartition(topic='topic1', partition=3)})
2025-01-03 14:11:20,797 INFO     -        |                |            - Consumer faststream-0.5.33-4080e0f3-69d6-42bd-8d12-cfe4af011866 assigned to partitions: frozenset({TopicPartition(topic='topic1', partition=1)})
2025-01-03 14:11:20,797 INFO     -        |                |            - Consumer faststream-0.5.33-10392508-9d8f-4b73-8eca-8d223896a02d assigned to partitions: frozenset({TopicPartition(topic='topic1', partition=0)})
2025-01-03 14:11:20,797 INFO     - FastStream app started successfully! To exit, press CTRL+C
2025-01-03 14:11:27,880 INFO     - topic1 | microservice-1 | 12-1735902 - Received
2025-01-03 14:11:27,884 INFO     - topic1 | microservice-1 | 12-1735902 - Received
2025-01-03 14:11:27,886 INFO     - topic1 | microservice-1 | 12-1735902 - Received
2025-01-03 14:11:27,891 INFO     - topic1 | microservice-1 | 12-1735902 - Received
2025-01-03 14:11:30,883 DEBUG    - topic1 | microservice-1 | 12-1735902 - Finished message:   1, time from publish: 3.01
2025-01-03 14:11:30,891 DEBUG    - topic1 | microservice-1 | 12-1735902 - Finished message:   2, time from publish: 3.01
2025-01-03 14:11:30,895 DEBUG    - topic1 | microservice-1 | 12-1735902 - Finished message:   3, time from publish: 3.01
2025-01-03 14:11:30,895 DEBUG    - topic1 | microservice-1 | 12-1735902 - Finished message:   4, time from publish: 3.01
2025-01-03 14:11:30,905 INFO     - topic1 | microservice-1 | 12-1735902 - Processed
2025-01-03 14:11:30,905 INFO     - topic1 | microservice-1 | 12-1735902 - Processed
2025-01-03 14:11:30,905 INFO     - topic1 | microservice-1 | 12-1735902 - Processed
2025-01-03 14:11:30,905 INFO     - topic1 | microservice-1 | 12-1735902 - Processed
2025-01-03 14:11:30,908 INFO     - topic1 | microservice-1 | 13-1735902 - Received
2025-01-03 14:11:30,908 INFO     - topic1 | microservice-1 | 13-1735902 - Received
2025-01-03 14:11:30,909 INFO     - topic1 | microservice-1 | 13-1735902 - Received
2025-01-03 14:11:30,909 INFO     - topic1 | microservice-1 | 13-1735902 - Received
2025-01-03 14:11:33,913 DEBUG    - topic1 | microservice-1 | 13-1735902 - Finished message:   7, time from publish: 6.02
2025-01-03 14:11:33,914 DEBUG    - topic1 | microservice-1 | 13-1735902 - Finished message:   5, time from publish: 6.03
2025-01-03 14:11:33,914 DEBUG    - topic1 | microservice-1 | 13-1735902 - Finished message:   6, time from publish: 6.02
2025-01-03 14:11:33,915 DEBUG    - topic1 | microservice-1 | 13-1735902 - Finished message:   8, time from publish: 6.02
2025-01-03 14:11:33,920 INFO     - topic1 | microservice-1 | 13-1735902 - Processed
2025-01-03 14:11:33,922 INFO     - topic1 | microservice-1 | 13-1735902 - Processed
2025-01-03 14:11:33,923 INFO     - topic1 | microservice-1 | 13-1735902 - Processed
2025-01-03 14:11:33,923 INFO     - topic1 | microservice-1 | 13-1735902 - Processed
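
The timings in the log can be sanity-checked with a small model. This is a sketch under stated assumptions: all 8 messages are published at roughly the same time, message ids 1 to 8 are assigned to partitions round-robin in id order, each partition processes one message at a time, and the handler takes 3 seconds. `expected_finish_times` is a hypothetical helper, not part of FastStream.

```python
def expected_finish_times(num_messages, num_partitions, handler_seconds):
    """Seconds from publish until each message id (1-based) is finished."""
    times = {}
    for i in range(num_messages):
        # How many messages sit ahead of this one in its own partition
        position_in_partition = i // num_partitions
        times[i + 1] = (position_in_partition + 1) * handler_seconds
    return times


times = expected_finish_times(num_messages=8, num_partitions=4, handler_seconds=3)
concurrent_total = max(times.values())  # all partitions run in parallel
sequential_total = 8 * 3  # a single sequential consumer, for comparison

if __name__ == "__main__":
    for message_id, seconds in times.items():
        print(f"message {message_id}: ~{seconds}s from publish")
    print(f"concurrent: ~{concurrent_total}s, sequential: ~{sequential_total}s")
```

Under these assumptions the model gives about 3 seconds for messages 1 to 4 and about 6 seconds for messages 5 to 8, in line with the "time from publish" values logged above.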

@Arseniy-Popov (Author)

This is still a work in progress; I will mark it as ready for review once it is.

@Lancetnik Lancetnik added the AioKafka Issues related to `faststream.kafka` module label Jan 2, 2025
@Arseniy-Popov Arseniy-Popov force-pushed the feat/add-kafka-subscriber-concurrent-among-partitions branch from 5b0f54d to e7b31dd January 3, 2025 10:59
@Arseniy-Popov Arseniy-Popov marked this pull request as ready for review January 3, 2025 11:12
@Arseniy-Popov (Author)

@Lancetnik Ready for review.

@Lancetnik (Member)

@Arseniy-Popov Thank you very much! I'll only be able to take a look in the next few days, sorry.
