Thesis project: continuous streaming #5

Open
amosr opened this issue Feb 5, 2019 · 0 comments


amosr commented Feb 5, 2019

Icicle is a streaming query language for machine-learning feature generation. Currently, Icicle must be run in batch mode over a day's or a week's worth of data. We would like to be able to run Icicle on real-time streaming data. Ideally, we could point Icicle at an input stream, and Icicle would run online, continuously consuming and processing data from that stream.

One possibility is to write a Haskell program that consumes an input stream and passes each new input to the C code generated by Icicle. However, our generated C code currently executes in batches and performs a potentially expensive 'aggregation' step at the end of each batch. It may be beneficial to modify the code generation to split the aggregation step out into a separate function, so that it is applied only when necessary. For the implementation of streaming, Apache Kafka [1] may be a suitable streaming platform.
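To make the proposed split concrete, here is a minimal sketch in C of what separating the per-input step from the aggregation step might look like. This is not Icicle's actual generated code: the running-mean feature and the names `mean_state_t`, `mean_init`, `mean_update` and `mean_aggregate` are all hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical running state for one feature: the mean of a value. */
typedef struct {
    double  sum;    /* sum of all values consumed so far */
    int64_t count;  /* number of values consumed so far  */
} mean_state_t;

/* Reset the state before any input has been consumed. */
static void mean_init(mean_state_t *s)
{
    s->sum   = 0.0;
    s->count = 0;
}

/* Cheap per-input step: called once for every new value on the stream. */
static void mean_update(mean_state_t *s, double value)
{
    s->sum   += value;
    s->count += 1;
}

/* Aggregation step, split out into its own function so that it runs only
 * when a result is actually needed. It does not modify the state, so the
 * stream can keep being consumed afterwards.
 * Returns 0 and writes the mean to *out, or returns -1 if no input has
 * been consumed yet. */
static int mean_aggregate(const mean_state_t *s, double *out)
{
    if (s->count == 0) {
        return -1;
    }
    *out = s->sum / (double)s->count;
    return 0;
}
```

Under this assumed layout, a stream consumer would call `mean_update` for each incoming value and call `mean_aggregate` only when a feature value is requested, rather than at the end of every batch. In a real query the aggregation would likely be much more expensive than a single division, which is the motivation for keeping it out of the per-input path.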

This project would involve some low-level compiler engineering and code generation. There is a video of a talk by Jacob Stanley [2] about some of the code generation internals.

[1] https://kafka.apache.org/, https://hackage.haskell.org/package/milena

[2] https://www.youtube.com/watch?v=ZuCRgghVR1Q

amosr changed the title from "Thesis project: Apache Kafka integration" to "Thesis project: continuous streaming" on Feb 5, 2019