How does it compare to other Python streaming libraries? #22

Closed
snth opened this issue Sep 30, 2024 · 1 comment
Labels
question Further information is requested

snth commented Sep 30, 2024

Hi,

First off, this project looks really cool! I'm sorry to start with an issue asking for comparisons with similar projects, but while I've long been a fan of stream-processing patterns, I also want to limit how many projects I keep track of. It would be great to know how you see streamable vis-à-vis similar projects, and where you want streamable to go in future?

Things in my current repertoire that look similar on the surface are:

@ebonnal ebonnal added the question Further information is requested label Sep 30, 2024
ebonnal (Owner) commented Sep 30, 2024

Hi @snth, thanks a lot, happy to see you here, and great question!
(I'm not a power user of those libraries so correct me if I'm wrong!)

In my opinion, the choice of a library ultimately depends on how naturally and elegantly your use case is implemented with it.

Let me outline the fundamental design of each library, because it shapes how you structure your logic when working with it:

  • A streamz Stream instance is an acyclic tree of operations, with the source at the root and the sinks at the leaves. Stream.emit pushes an element through this tree down to every leaf.
  • RxPY implements the Observer pattern: you compose operators into an observable, and callback functions subscribe to it to be called for each processed element as it becomes ready for consumption.
  • streamable's Stream is a decorator for iterables: a Stream[T] is instantiated from an Iterable[T] and is itself an Iterable[T]. It exposes chainable lazy operations (the stream's methods), each returning a new child stream. This makes it easy to integrate: you can create a stream from any iterable in your codebase and pass it to any function that accepts an iterable. To keep the learning curve as flat as possible, I wanted the interface to feel both Pythonic and familiar to anyone who manipulates collections in functional languages.
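To make the push-versus-pull distinction concrete, here is a minimal sketch in plain Python. PushNode, IterStream and _Derived are hypothetical toy classes written only for this comparison; they are not the actual implementations of streamz, RxPY, or streamable. PushNode mimics the tree-of-operations design described for streamz (RxPY's observers are likewise push-based), while IterStream mimics the "decorator for iterables" design: chained operations are lazy, and nothing runs until a consumer iterates.

```python
from itertools import islice
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class PushNode:
    """Toy push-based node (streamz-like tree): emit() pushes an element to every leaf."""

    def __init__(self, fn: Callable = lambda x: x) -> None:
        self.fn = fn
        self.children: List["PushNode"] = []
        self.sink: List = []  # leaves collect whatever reaches them

    def then(self, fn: Callable) -> "PushNode":
        child = PushNode(fn)
        self.children.append(child)
        return child

    def emit(self, x) -> None:
        y = self.fn(x)
        if not self.children:
            self.sink.append(y)
        for child in self.children:
            child.emit(y)


class _Derived(Generic[U]):
    """Re-runs `op` over the parent on every iteration, so nothing is computed eagerly."""

    def __init__(self, parent: Iterable, op: Callable[[Iterable], Iterator[U]]) -> None:
        self._parent, self._op = parent, op

    def __iter__(self) -> Iterator[U]:
        return self._op(self._parent)


class IterStream(Generic[T]):
    """Toy pull-based 'decorator for iterables': wraps an Iterable[T] and is one too."""

    def __init__(self, source: Iterable[T]) -> None:
        self._source = source

    def __iter__(self) -> Iterator[T]:
        return iter(self._source)

    def map(self, fn: Callable[[T], U]) -> "IterStream[U]":
        return IterStream(_Derived(self, lambda it: (fn(x) for x in it)))

    def filter(self, pred: Callable[[T], bool]) -> "IterStream[T]":
        return IterStream(_Derived(self, lambda it: (x for x in it if pred(x))))

    def group(self, size: int) -> "IterStream[List[T]]":
        def batches(it: Iterable[T]) -> Iterator[List[T]]:
            iterator = iter(it)
            while batch := list(islice(iterator, size)):
                yield batch

        return IterStream(_Derived(self, batches))


if __name__ == "__main__":
    # Push: a single emit() reaches every leaf of the tree.
    root = PushNode()
    doubled = root.then(lambda x: x * 2)
    squared = root.then(lambda x: x * x)
    for i in range(3):
        root.emit(i)
    print(doubled.sink, squared.sink)  # [0, 2, 4] [0, 1, 4]

    # Pull: building the chain runs nothing; the consumer (sum, list, ...) drives it.
    evens = IterStream(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * 10)
    print(sum(evens))  # 200
    print(list(evens.group(2)))  # [[0, 20], [40, 60], [80]]
```

Because IterStream is itself an iterable, it can be handed straight to sum, list, or any other function that consumes iterables, which is the integration property described above.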

More comparison material: if you didn't come from there, check out the Reddit post (especially the Comparison section at the end) and this comment.

> where you want streamable to go in future?

In terms of my future involvement in the project: I used it at my previous job to implement 30 custom ETL pipelines that are running in production, so I have a responsibility to at least maintain its quality over the years.

I am glad it is now in the feedback phase, gathering input like "this would be a cool choice for my use case, but it is missing that feature" or "how do I implement this use case?". I am grateful that other contributors are starting to come into the loop to extend it; you are more than welcome! 🫡

Let's minimize its responsibilities and keep it as unopinionated as possible. For example, this snippet is NOT what the library should look like in the future:

```python
(
    Stream.from_csv("sales.csv")
    .join(db_type="postgres", db_name="main", schema_name="public", table="user", on_keys=("user_id",))
    .to_bigquery("enriched_sales", partition=datetime.date.today(), batchsize=1024)
)
```

Instead, one can:

  1. instantiate an Iterable[Dict[str, Any]] of input rows using the csv module
  2. join via .map using a psycopg2 client
  3. batch rows via .group(size=1024)
  4. write into BigQuery via .foreach using bigquery.Client.insert_rows_json
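The four steps can be sketched with the standard library alone. This is a hypothetical, self-contained stand-in rather than a production pipeline: an in-memory CSV string (SALES_CSV) replaces sales.csv, a dict lookup (USERS) replaces the psycopg2 query, and write_batch only records batches instead of calling bigquery.Client.insert_rows_json. With streamable, steps 2 to 4 would chain roughly as Stream(rows).map(enrich).group(size=1024).foreach(write_batch); since those operations are lazy, the stream still has to be iterated for the writes to happen.

```python
import csv
import io
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical sample standing in for sales.csv.
SALES_CSV = "user_id,amount\n1,9.5\n2,3.0\n1,4.25\n"

# Hypothetical stand-in for the "user" table that would be queried through psycopg2.
USERS: Dict[str, Dict[str, Any]] = {"1": {"name": "ada"}, "2": {"name": "bob"}}

# Records what would have been sent to BigQuery.
written: List[List[Dict[str, Any]]] = []


def enrich(row: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: join one row with its user (rows with an unknown user_id pass through unchanged)."""
    return {**row, **USERS.get(row["user_id"], {})}


def batches(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Step 3: group consecutive rows into lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def write_batch(batch: List[Dict[str, Any]]) -> None:
    """Step 4: hypothetical stub; a real pipeline would call bigquery.Client.insert_rows_json here."""
    written.append(batch)


def run(csv_text: str, batch_size: int = 1024) -> None:
    rows = csv.DictReader(io.StringIO(csv_text))  # Step 1: lazy Iterable[Dict[str, str]]
    # map() and batches() are lazy, so rows are pulled from the CSV one batch at a time.
    for batch in batches(map(enrich, rows), batch_size):
        write_batch(batch)


if __name__ == "__main__":
    run(SALES_CSV, batch_size=2)
    print(written)
```

Each step stays a plain function over plain iterables, so the database client and the BigQuery client remain the caller's choice, which is the unopinionated split argued for above.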

Thank you for reading and let me know if it makes sense 🙏🏻 !

@ebonnal ebonnal pinned this issue Sep 30, 2024
@ebonnal ebonnal closed this as completed Oct 9, 2024