Kafka New Consumer
The new consumer API, KafkaConsumer, is available since Kafka 0.9.0.0. Keep the following in mind when using it:
* The consumer is not thread-safe.
* The consumer maintains TCP connections to the necessary brokers to fetch data.
* Failure to close the consumer after use will leak these connections.
To use it, configure the new receiver pool like this:
<!-- Kafka New Receiver Pool -->
<bean id="messageReceiverPool" class="org.darkphoenixs.kafka.pool.KafkaMessageNewReceiverPool"
      init-method="init" destroy-method="destroy">
    <property name="messageAdapter" ref="messageAdapter"/>
    <property name="config" value="kafka/newconsumer.properties"/>
    <property name="poolSize" value="10"/>
    <property name="model" value="MODEL_1"/>
    <property name="batch" value="NON_BATCH"/>
    <property name="commit" value="AUTO_COMMIT"/>
</bean>
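The bean above points config at kafka/newconsumer.properties, but the contents of that file are not shown here. As a rough sketch, assuming the file carries standard Kafka consumer settings, it might look like the text embedded below. The class name, the sample values, and the choice of keys are illustrative assumptions; the library may expect additional keys that are not listed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class NewConsumerConfigSketch {

    // Hypothetical contents of kafka/newconsumer.properties. The keys are
    // standard Kafka consumer settings; the values are placeholders.
    static final String SAMPLE_FILE = String.join("\n",
            "bootstrap.servers=localhost:9092",
            "group.id=example-group",
            "key.deserializer=org.apache.kafka.common.serialization.StringDeserializer",
            "value.deserializer=org.apache.kafka.common.serialization.StringDeserializer",
            "auto.offset.reset=earliest");

    // Parse properties text the same way a .properties file would be read.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE_FILE);
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```

Replace the broker address and group id with the values for your own cluster.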
messageAdapter
: org.darkphoenixs.kafka.core.KafkaMessageAdapter.
config
: the consumer config file.
props
: the consumer config properties (same settings as config).
model
: MODEL_1 runs one consumer per thread; MODEL_2 decouples consumption from processing.
batch
: BATCH or NON_BATCH message processing.
commit
: AUTO_COMMIT or SYNC_COMMIT or ASYNC_COMMIT.
poolSize
: the size of the consumer thread pool.
handleMultiple
: a multiplier applied to the consumer thread pool size; takes effect only in MODEL_2.
retryCount
: the retry count for fault tolerance; takes effect only when model is MODEL_1, batch is NON_BATCH, and commit is SYNC_COMMIT or ASYNC_COMMIT.
Note: init-method and destroy-method are required.
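The retryCount setting suggests that a failed message is handed to the handler again before the pool gives up on it. The library's actual retry loop is not shown here; the following is only a minimal sketch of the general idea, assuming that retryCount means "additional attempts after the first failure". The class, the method handleWithRetry, and the flaky handler are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class RetrySketch {

    // Try to handle one message; on failure, retry up to retryCount more times.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static <T> int handleWithRetry(T message, Consumer<T> handler, int retryCount) {
        for (int attempt = 1; attempt <= retryCount + 1; attempt++) {
            try {
                handler.accept(message);
                // Success: with manual commit, the offset would be committed here.
                return attempt;
            } catch (RuntimeException e) {
                // Failure: fall through and try again while attempts remain.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical handler that fails on its first two calls.
        Consumer<String> flaky = msg -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
        };
        System.out.println("attempts used: " + handleWithRetry("hello", flaky, 3));
    }
}
```

What should happen once the retries are exhausted (skip, log, or stop) is a separate design decision that this sketch leaves open.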
MODEL_1 (one consumer per thread):
- PRO : It is the easiest to implement.
- PRO : It is often the fastest as no inter-thread co-ordination is needed
- PRO : It makes in-order processing on a per-partition basis very easy to implement (each thread just processes messages in the order it receives them).
- CON : More consumers means more TCP connections to the cluster (one per thread). Kafka handles connections very efficiently, so this is generally a small cost.
- CON : Multiple consumers means more requests being sent to the server and slightly less batching of data which can cause some drop in I/O throughput.
- CON : The number of total threads across all processes will be limited by the total number of partitions.
MODEL_2 (decouple consumption and processing):
- PRO : This option allows independently scaling the number of consumers and processors. This makes it possible to have a single consumer that feeds many processor threads, avoiding any limitation on partitions.
- CON : Guaranteeing order across the processors requires particular care because the threads execute independently: an earlier chunk of data may actually be processed after a later chunk, purely due to thread execution timing. For processing that has no ordering requirements this is not a problem.
- CON : Manually committing the position becomes harder as it requires that all threads co-ordinate to ensure that processing is complete for that partition.
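The trade-offs of decoupling consumption from processing (MODEL_2) can be illustrated without a running broker. The sketch below is not the library's implementation: a plain list stands in for one polled batch, and the class and method names are hypothetical. It shows the two points made above: records handed to a processor pool may finish in a different order than they arrived, and a manual commit has to wait until every processor is done with the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DecoupledProcessingSketch {

    // Hand each record of one batch to the processor pool, then wait for all
    // of them. Returns the records in the order their processing completed.
    static List<String> processBatch(List<String> batch, ExecutorService processors) {
        Queue<String> completionOrder = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (String record : batch) {
            // "Processing" here just records that the work finished.
            tasks.add(CompletableFuture.runAsync(() -> completionOrder.add(record), processors));
        }
        // Coordination point: block until every processor has finished.
        // Only after this would it be safe to commit the batch's offsets.
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(completionOrder);
    }

    public static void main(String[] args) {
        ExecutorService processors = Executors.newFixedThreadPool(4);
        try {
            List<String> batch = Arrays.asList("r0", "r1", "r2", "r3", "r4", "r5");
            List<String> done = processBatch(batch, processors);
            // Every record is processed, but the order may differ from the batch order.
            System.out.println("processed " + done.size() + " records: " + done);
        } finally {
            processors.shutdown();
        }
    }
}
```

If per-partition ordering matters, one common approach is to route all records of a partition to the same single-threaded processor, at the cost of some parallelism.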