Icicle is a streaming query language for machine-learning feature generation. Icicle must currently be run in batch mode over a day's or a week's data set. We would like to be able to run Icicle on real-time streaming data. Ideally, we could point Icicle at an input stream, and Icicle would run on-line, continuously consuming and processing data from the stream.
One possibility is to write a Haskell program that consumes an input stream and passes each new input to the C code generated by Icicle. However, our generated C code currently executes in batches and performs a potentially expensive 'aggregation' step at the end of each batch. It may be beneficial to modify the code generation to split the aggregation step out into a separate function, so that it is applied only when necessary. For the implementation of streaming, Apache Kafka [1] may be a suitable streaming platform.
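To make the proposed split concrete, here is a minimal sketch in C of what separating the per-input step from the aggregation step could look like for a single feature (a running mean). Everything here is an assumption for illustration: `mean_state_t`, `mean_init`, `mean_step` and `mean_aggregate` are hypothetical names, and this is not the layout or the code that Icicle actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for one feature (a mean).
   Illustrative only; not Icicle's real generated state layout. */
typedef struct {
    double   sum;
    uint64_t count;
} mean_state_t;

/* Reset the accumulator before any input has been seen. */
void mean_init(mean_state_t *s)
{
    assert(s != NULL);
    s->sum = 0.0;
    s->count = 0;
}

/* Per-input step: cheap, called once for every element read from the stream. */
void mean_step(mean_state_t *s, double value)
{
    assert(s != NULL);
    s->sum += value;
    s->count += 1;
}

/* Aggregation step, split out into its own function so the caller can
   run it only when a result is needed. It reads the state without
   modifying it, so streaming can continue afterwards. */
double mean_aggregate(const mean_state_t *s)
{
    assert(s != NULL);
    return s->count == 0 ? 0.0 : s->sum / (double)s->count;
}
```

With this shape, a driver program (for example the Haskell consumer described above, calling through the FFI) would call `mean_step` for each incoming record and `mean_aggregate` only when a feature value is requested. For a mean the aggregation is trivial; the split matters more when the final step is expensive.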
This project would involve some low-level compiler engineering and code generation. There is a video of a talk by Jacob Stanley [2] about some of the code generation internals.
[1] https://kafka.apache.org/, https://hackage.haskell.org/package/milena
[2] https://www.youtube.com/watch?v=ZuCRgghVR1Q