Tool To Aggregate Events from Kafka Topics into Elasticsearch
The event aggregator is a simple infrastructure component designed to consume events written into Kafka in AVRO format and write them into Elasticsearch. Event schemas are maintained into a Schema Registry. The one currently supported is the Confluent Schema Registry
The tool is essentially a rewriting for open source purposed of a similar tool I wrote and extensively used when working on a microservice event driven system. It proven very useful to me so I think it would have been good to share with the community.
Feel free to use it and extend it.
One of the major infrastructural requirements when implementing a distributed system, and mostly when working on a microservice architecutre, is log aggregation. Developers and administrators need to be able to have a centralised view of how a user requrest propagates through the system and to trace when an error occurs and how it propagates.
An event based system imposes that the main interactions across the different services are based on events published into an event bus. Different (micro)services are then able to subscribe to different event topics and react to the different events.
In such an architecture is therefore highly beneficial to be able to browse, search and inspect events in the same place where the services logs are. This will allow to completely track the overall system reaction to every event.
If you think about a REST based microservice system, the event aggregation plays a similar role of having the HTTP trafic in your log aggregator.
Well designed events would include tracing facilities such as Correlation IDs that is used to track the whole flow of events originated by the same source. This ID is then usually used as an enricher of the application logs (see also Log4J MDC).
Implementing the above mentioned solutions and having logs and events in your log aggregation facility will allow you to have an incredibly powerful view of your system behavior.
The project is ready to be used but I still need to complete the publishing to DockerHUB. Feel free to start using it and contribute!!
- Simple command line application deployed as a Docker Container
- Designed to support workload sharing through the deployment of multiple instances
- Aggregates events from multiple topics
- Supports writing to Elasticsearch with both native Elasticsearch transport or REST over HTTP (the last one is required if you are trying to integrate with the AWS Elastisearch Service)
- Events are assumed to be written in AVRO format, you could extend the tool to support different format and I will be more than happy to merge your PR
- AVRO Schemas are assumed to be managed via Confluent's Schema Registry
- Connection to Elasticsearch is protected via a Circuit Breaker (connection to the Schema Registry is not since the current client doesn't suport)
- The target Elasticsearch index where the events are stored to is changed every day
- Configurable path for event timestamp and ID extraction
- Container status can be monitored via an HTTP Health Check Endpoint
A series of possible extension I might work on and I will be happy to accept PRs for are:
- Supporting different event serialization formats (Protbuf, Thrift, ...)
- Supporting different log aggregation targets (SolR, ...)
- Supporting different event bus (Kinesis, ...) (this might be more effort considering current implementation)
The tool is based on
- Scala 2.11
- Elasticsearch 2.3 - Chosing this version in order to support AWS Elastisearch Service
- Akka 2.4
- Reactive KAFKA and AKKA Streams
- Elastic4s
- Confluent Schema Registry