Skip to content
This repository has been archived by the owner on Nov 20, 2020. It is now read-only.

Application Scenarios

Lev Gorodinski edited this page Dec 13, 2017 · 2 revisions

It can be helpful to classify uses of Kafka into a few application scenarios as follows.

Stream Processor

In this scenario, a service consumes a stream as input and produces a stream as output. The input and output streams may be Kafka topics. The input stream can be any stream and the output can also be a sink, such as a database or another API. A defining characteristic of this scenario is that the stream processor controls its consumption of the input stream, and therefore, controls fault tolerance, retries and concurrency. In case of an error, the processor can crash and resume from the last committed checkpoint. It may retry an operation on an input a configurable number of times. Concurrency is controlled by ensuring a single instance of the application owns individual partitions.

Example:

let consumer : Consumer = ...
let producer : Producer = ...

Consumer.consume consumer (fun _ ms -> async {
  let! results = handle ms
  do! Producer.produceBatched producer results })

Request-Reply Service

In this scenario, a service handles requests sent by client awaiting an immediate response. For example, an HTTP API or an RPC service. As part of handling the request, the service needs to produce message to Kafka. A failure to produce to Kafka results in the failure of the request - they share their fate. If a request does fail, the error is returned to the client, and the client may choose to retry or not. Unlike in the stream processing scenario, in this scenario, crashing the service is usually not an option. Instead, the service catches errors, logs them and continues operating.

Example:

let producer : Producer = ...

let handle (req:HttpReq) : Async<HttpRes> = async {
  let message = ...
  do! Producer.produce producer message
  let res = ...
  return res }

Telemetry

In this scenario, a service produces messages to Kafka similar to the stream processor and request-reply scenarios, however semantically, the messages are telemetry data rather than domain data. This means that publishing telemetry messages is not critical for the correct functioning of the application. Instead, it is acceptable for messages to be delivered on a best effort basis. This scenario has a few implications for how messages are published. Publishing a message should not be in the application's critical path. Instead, messages are placed into a buffer and flushed asynchronously. This means that publishing failures are decoupled from application request failures. However, it is still important to know when failures in occur in the background flush process. Moreover, buffer capacity must be controlled - it may be desirable to discard messages in case of an overflow, or to block incoming requests.

Example:

let producer : BufferingProducer = ...

producer
|> BufferingProducer.errors
|> Event.add (fun (exn,ms) -> ...) 

producer
|> BufferingProducer.discarding
|> Event.add (fun _ -> ...) 

producer
|> BufferingProducer.blocking
|> Event.add (fun _ -> ...) 

let handle (req:HttpReq) : Async<HttpRes> = async {
  let message = ...
  BufferingProducer.produce producer message
  let res = ...
  return res }
Clone this wiki locally