JsonRelay
is a SparkListener
that converts SparkListenerEvent
s to JSON and forwards them to an external service via RPC.
It is designed to be used with slim
, which consumes JsonRelay's emitted events and writes useful statistics about them to Mongo, from whence Spree serves up live-updating web pages.
With Spark >= 1.5.0 you can simply pass the following flags to your spark-shell
and spark-submit
commands:
--packages org.hammerlab:spark-json-relay:2.0.1
--conf spark.extraListeners=org.apache.spark.JsonRelay
If using earlier versions of Spark, you'll need to first download the JAR:
$ wget https://repo1.maven.org/maven2/org/hammerlab/spark-json-relay/2.0.1/spark-json-relay-2.0.1.jar
Then, pass these flags to your spark-submit
or spark-shell
commands:
--driver-class-path spark-json-relay-2.0.1.jar
--conf spark.extraListeners=org.apache.spark.JsonRelay
That's it!
Two additional flags, --conf spark.slim.{host,port}
, specify the location JsonRelay
will attempt to connect and send events to.
JsonRelay
just piggybacks on Spark's JsonProtocol
for JSON serialization, with two differences:
- It adds an
appId
field to all events; this allows downstream consumers to process events from multiple Spark applications simultaneously / more easily over time. - It rolls its own serialization of
SparkListenerExecutorMetricsUpdate
events, which is omitted from Spark'sJsonProtocol
in Spark prior to1.5.0
(cf. SPARK-9036).
Please file an issue if you have any questions about or problems using JsonRelay
!