Thesis project: continuous streaming #5

amosr · 2019-02-05T19:50:40Z

Icicle is a streaming query language for generating machine-learning features. Currently, Icicle must be run in batch mode over a day's or a week's worth of data. We would like to be able to run Icicle on real-time streaming data instead. Ideally, we could point Icicle at an input stream, and Icicle would run online, continuously consuming and processing data from the stream.

One possibility is to write a Haskell program that consumes an input stream and, for each new input, passes it to the C code generated by Icicle. However, the generated C code currently executes in batches and performs a potentially expensive aggregation step at the end of each batch. It may be beneficial to modify the code generator to split the aggregation step out into a separate function, so that it runs only when necessary. Apache Kafka [1] may be a suitable platform for providing the input stream.
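To make the proposed split concrete, here is a minimal sketch of a query compiled into two functions: a cheap per-event `update` and a separate `aggregate` that computes outputs only when asked. It is written in Python purely for illustration. It is not Icicle's generated C code or its API. The mean-per-entity query, the `(entity, amount)` event shape, and the names `StreamingQuery`, `update`, `aggregate` and `run_batch` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (entity, amount). Not Icicle's real input format.
Event = Tuple[str, float]


@dataclass
class MeanState:
    """Per-entity running state for a hypothetical 'mean amount' feature."""
    count: int = 0
    total: float = 0.0


@dataclass
class StreamingQuery:
    """Sketch of a query split into a per-event step and a separate aggregation."""
    states: Dict[str, MeanState] = field(default_factory=dict)

    def update(self, event: Event) -> None:
        # Per-event step: constant work per input, no aggregation here.
        entity, amount = event
        st = self.states.setdefault(entity, MeanState())
        st.count += 1
        st.total += amount

    def aggregate(self) -> Dict[str, float]:
        # Aggregation step, split out so it runs only when output is needed.
        return {e: st.total / st.count for e, st in self.states.items()}


def run_batch(events: Iterable[Event]) -> Dict[str, float]:
    # Batch mode: consume the whole batch, then aggregate once at the end.
    q = StreamingQuery()
    for ev in events:
        q.update(ev)
    return q.aggregate()
```

In a streaming driver, the consumer loop (for example, one reading messages from a Kafka topic) would call `update` for every message and call `aggregate` only when a result is requested, say on a timer or on demand. The current batch behaviour corresponds to `run_batch`, which aggregates exactly once at the end.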

This project would involve some low-level compiler engineering and code generation. A talk by Jacob Stanley [2] covers some of the code-generation internals.

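[1] Apache Kafka: https://kafka.apache.org/ ; milena (Haskell Kafka client): https://hackage.haskell.org/package/milena

[2] Jacob Stanley, talk on Icicle code generation (video): https://www.youtube.com/watch?v=ZuCRgghVR1Q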

amosr changed the title from "Thesis project: Apache Kafka integration" to "Thesis project: continuous streaming" on Feb 5, 2019.