Highly Parallel and Distributed Stream Processing: Spark Streaming

The following presentation describes how to build a linked-data pipeline for DLSS based on Spark and Kafka. Its section on Spark is fairly detailed, explaining why we want to use Spark, how it works, and the advantages it provides. Presentation: Spark/Kafka for a Linked-Data Pipeline (Note: the presentation contains animations, so we recommend downloading it and playing it in PowerPoint.)

Documentation on Spark Streaming can be found here: Spark Streaming

Our Spark applications are in the SparkStreamingConvertors project, which aggregates our convertor component projects. Note that these applications are meant to be submitted to a Spark cluster.

Furthermore, the Demo project aggregates, among others, several Spark demo projects to learn from and experiment with: EstimatorStreamingApp (Spark Streaming), EstimatorApp (Spark Batch), and MarcXMLtoBibFrame (Spark Batch).