This repository has been archived by the owner on Apr 22, 2022. It is now read-only.

divolte-collector-0.4.0

@asnare asnare released this 13 Apr 12:18
· 914 commits to master since this release

The main changes in this release relative to 0.3.0 are:

  • Divolte can now be configured with multiple endpoints for collecting events, as well as multiple mappings and destinations. More on this below.
  • JSON-based event collection is now supported. This is intended to support mobile and server applications.
  • We're now using Kafka's new producer API.

This release also introduces a new configuration format:

  • There are now 4 main sections:

    • global: for settings that affect the entire server instance. This includes server binding settings, ip2geo configuration, HDFS and Kafka configuration, and thread settings for the various phases of event processing.
    • sources: the browser and JSON endpoints that events can be received on.
    • sinks: which HDFS directories and Kafka topics Avro data should be written to.
    • mappings: which sources should be connected to which sinks, and how received events should be converted to Avro records.
  • Sources are now more configurable. The endpoint paths can now be customised.

  • Kafka now requires different settings because we're using the new producer instead of the old one. The biggest change is that bootstrap.servers should be used instead of metadata.broker.list; see the Kafka documentation for more details.

  • The HDFS session-binning strategy for writing files has been removed.

  • The maximum 'pause' time that an internal thread would wait when queuing an event for the next stage of processing has been removed. (This used to be the max_enqueue_delay setting.) Messages are now dropped immediately when the queue is full. In practice queues are either full or empty, and a full queue indicates a problem that waiting will not fix. In fact, waiting on a full queue turned out to cause cascading failures and problems such as thread starvation in the HTTP server.
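
To illustrate the drop-immediately behaviour described in the last item, here is a minimal Python sketch. It is not Divolte's actual implementation: the `offer` helper, the queue size and the event values are all hypothetical, chosen only to show a producer that never waits on a full queue.

```python
import queue


def offer(q, event):
    """Try to enqueue without waiting (hypothetical helper, not Divolte code).

    Returns True if the event was queued, False if it was dropped
    because the queue was already full.
    """
    try:
        q.put_nowait(event)  # never blocks the calling thread
        return True
    except queue.Full:
        return False  # drop immediately instead of pausing


# A tiny bounded queue standing in for the hand-off between two processing stages.
stage = queue.Queue(maxsize=2)
results = [offer(stage, e) for e in ("a", "b", "c")]
```

Because the caller never blocks, a stalled downstream stage costs dropped events rather than tying up the threads that feed it, which is the trade-off the note above describes.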