业余布道师 edited this page Nov 17, 2016 · 6 revisions

The new consumer API, KafkaConsumer, is available since Kafka 0.9.0.0. If you want to use it, keep the following in mind:

 * The consumer is not thread-safe. 
 * The consumer maintains TCP connections to the necessary brokers to fetch data. 
 * Failure to close the consumer after use will leak these connections.

Use the new receiver pool, configured like this:

<!-- Kafka New Receiver Pool -->
<bean id="messageReceiverPool" class="org.darkphoenixs.kafka.pool.KafkaMessageNewReceiverPool"
      init-method="init" destroy-method="destroy">
    <!-- Kafka New Receiver Pool -->
    <property name="messageAdapter" ref="messageAdapter"/>
    <property name="config" value="kafka/newconsumer.properties"/>
    <property name="poolSize" value="10"/>
    <property name="model" value="MODEL_1"/>
    <property name="batch" value="NON_BATCH"/>
    <property name="commit" value="AUTO_COMMIT"/>
</bean>

messageAdapter : an instance of org.darkphoenixs.kafka.core.KafkaMessageAdapter.

config : the consumer config file.

pros : the consumer config properties (an alternative to config, carrying the same settings).

model : MODEL_1 runs one consumer per thread; MODEL_2 decouples consumption from processing.

batch : BATCH or NON_BATCH message processing.

commit : AUTO_COMMIT, SYNC_COMMIT, or ASYNC_COMMIT.

poolSize : the size of the consumer thread pool.

handleMultiple : a multiple of the consumer thread pool size; takes effect only with MODEL_2.

retryCount : the retry count for fault tolerance; takes effect only with MODEL_1, NON_BATCH, and SYNC_COMMIT or ASYNC_COMMIT.

Note: init-method and destroy-method are required.
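
The file referenced by config is expected to hold ordinary Kafka consumer settings. As a rough sketch, the Java below builds the same kind of key/value pairs in code. The class name NewConsumerConfigSketch, the broker address, the group name, and the choice of deserializers are all assumptions made for illustration; only the property keys are standard Kafka consumer settings. How the pool's commit property interacts with enable.auto.commit is not covered here.

```java
import java.util.Properties;

// Hypothetical helper, not part of kafka-spring: builds in code the kind of
// settings a file such as kafka/newconsumer.properties might contain.
class NewConsumerConfigSketch {

    static Properties consumerProperties() {
        Properties props = new Properties();
        // Placeholder broker address and consumer group (assumptions).
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");
        // Placeholder deserializers; the right choice depends on your message format.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Standard Kafka auto-commit settings, chosen here to mirror the
        // AUTO_COMMIT sample above (an assumption, not documented behavior).
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form, as they would appear in a properties file.
        consumerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Writing the printed pairs into a properties file gives one possible starting point for the file named in the config property.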

Model Pros and Cons

One Consumer Per Thread (MODEL_1)

  • PRO : It is the easiest to implement.
  • PRO : It is often the fastest, as no inter-thread co-ordination is needed.
  • PRO : It makes in-order processing on a per-partition basis very easy to implement (each thread just processes messages in the order it receives them).
  • CON : More consumers means more TCP connections to the cluster (one per thread). In general Kafka handles connections very efficiently so this is generally a small cost.
  • CON : Multiple consumers mean more requests being sent to the server and slightly less batching of data, which can cause some drop in I/O throughput.
  • CON : The number of total threads across all processes will be limited by the total number of partitions.
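
The shape of MODEL_1 can be illustrated without Kafka at all. The sketch below is a simulation under stated assumptions: each inner list stands in for one partition, each thread plays the role of a dedicated consumer, and the class name OneConsumerPerThreadSketch is hypothetical. It is not the pool's actual implementation. It shows why per-partition ordering comes for free: a partition's records are only ever touched by one thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simulation only: lists stand in for partitions, threads stand in for consumers.
class OneConsumerPerThreadSketch {

    // Starts one thread per simulated partition and returns what each thread processed.
    static Map<Integer, List<String>> run(List<List<String>> partitions) {
        Map<Integer, List<String>> processed = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            final int partition = p;
            final List<String> records = partitions.get(p);
            Thread t = new Thread(() -> {
                // A single thread handles the whole partition, so records
                // are processed in exactly the order they were received.
                List<String> out = new ArrayList<>();
                for (String record : records) {
                    out.add(record);
                }
                processed.put(partition, out);
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> result = run(Arrays.asList(
                Arrays.asList("a0", "a1", "a2"),
                Arrays.asList("b0", "b1")));
        System.out.println(result);
    }
}
```

Adding a third thread here would give it nothing to do, which mirrors the last CON above: threads beyond the partition count sit idle.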

Decouple Consumption and Processing (MODEL_2)

  • PRO : This option allows independently scaling the number of consumers and processors. This makes it possible to have a single consumer that feeds many processor threads, avoiding any limitation on partitions.
  • CON : Guaranteeing order across the processors requires particular care. Because the threads execute independently, an earlier chunk of data may actually be processed after a later chunk, purely due to thread execution timing. For processing that has no ordering requirements this is not a problem.
  • CON : Manually committing the position becomes harder as it requires that all threads co-ordinate to ensure that processing is complete for that partition.
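
The co-ordination cost of MODEL_2 can be sketched in plain Java as well. In the simulation below, a list stands in for one polled batch, a fixed thread pool plays the processors, and the class and method names (DecoupledProcessingSketch, processBatch) are hypothetical; none of this is the pool's actual code. The point it illustrates: the position is only safe to commit once every task for the batch has finished, and the order in which results arrive is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simulation only: one "consumer" hands a batch to a pool of processor threads.
class DecoupledProcessingSketch {

    // Processes the batch on `processors` threads and returns the offset that
    // would be safe to commit: the offset of the next record to consume.
    // `sink` must be thread-safe because several processors write to it.
    static long processBatch(List<String> batch, long firstOffset,
                             int processors, Queue<String> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(processors);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String record : batch) {
                // Tasks run independently, so completion order may differ from batch order.
                pending.add(pool.submit(() -> { sink.add(record.toUpperCase()); }));
            }
            // Co-ordination step: wait for every task before computing the commit position.
            for (Future<?> f : pending) {
                f.get();
            }
            return firstOffset + batch.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a processor task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Queue<String> sink = new ConcurrentLinkedQueue<>();
        long next = processBatch(Arrays.asList("a", "b", "c", "d"), 100L, 3, sink);
        System.out.println("processed=" + sink + " nextOffsetToCommit=" + next);
    }
}
```

Running main several times may print the processed records in different orders, while the offset to commit stays the same because it is computed only after all tasks complete.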